Spark

The Spark executor starts a Spark application each time it receives an event. You can use the Spark executor with Spark on YARN. The executor is not compatible with Spark on Mesos at this time. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Use the Spark executor to start a Spark application as part of an event stream. You can use the executor in any logical way, such as running Spark applications after the Hadoop FS, MapR FS, or Amazon S3 destination closes files. For example, you might use the executor to start a Spark application that converts Avro files to Parquet each time the Hadoop FS destination closes a file.

Note that the Spark executor starts an application in an external system. It does not monitor the application or wait for it to complete. The executor becomes available for additional processing as soon as it successfully submits an application.
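The "submit and move on" behavior can be pictured with a minimal sketch, assuming the submission is an external command such as spark-submit. This is not the executor's actual implementation. The helper name submit_and_return is hypothetical, and a harmless Python one-liner stands in for a real spark-submit command line.

```python
import subprocess
import sys


def submit_and_return(argv):
    """Launch a submission command and return immediately.

    The caller does not wait for the launched process to finish and does
    not inspect its outcome, matching the fire-and-forget behavior above.
    """
    # Popen starts the child process without blocking; we deliberately
    # never call .wait() or .communicate() here.
    return subprocess.Popen(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


if __name__ == "__main__":
    # Stand-in for a real `spark-submit ...` command line.
    proc = submit_and_return([sys.executable, "-c", "pass"])
    print("submitted, pid", proc.pid, "; caller continues immediately")
```

Any success or failure of the application itself has to be tracked in the external system (for example, in YARN), not by the caller.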

The Spark executor can run the application in client or cluster mode. In client mode, the application driver runs on the Data Collector machine, so use client mode only when resource use on that machine is not a concern.
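For orientation, the two modes correspond to the standard spark-submit options --master yarn and --deploy-mode client|cluster. The sketch below assumes a spark-submit style launch; the helper name yarn_submit_args and the application path are hypothetical, and the executor's own configuration fields may be named differently.

```python
def yarn_submit_args(app_resource, deploy_mode="cluster"):
    """Build a spark-submit argument list that targets YARN.

    deploy_mode is "cluster" (the driver runs inside the YARN cluster) or
    "client" (the driver runs on the submitting machine).
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode, app_resource]


# Hypothetical application path, for illustration only.
print(yarn_submit_args("/apps/avro_to_parquet.jar"))
```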

Before you use the Spark executor, make sure to perform the prerequisite task.

When you configure the Spark executor, you can specify the number of worker nodes Spark should use, or you can enable dynamic allocation and specify the minimum and maximum number of worker nodes. Dynamic allocation allows Spark to use additional worker nodes as needed, within the specified range.
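In plain Spark terms, a fixed size and a dynamic range map onto standard configuration properties. The sketch assumes that the "worker nodes" above correspond to Spark executors (spark.executor.instances for a fixed count, spark.dynamicAllocation.* for a range). The helper allocation_conf is hypothetical, and whether the external shuffle service is required depends on your Spark version and cluster setup.

```python
def allocation_conf(num_executors=None, min_executors=None, max_executors=None):
    """Return Spark properties for either a fixed or a dynamic executor count."""
    if num_executors is not None:
        if min_executors is not None or max_executors is not None:
            raise ValueError("use either a fixed count or a min/max range, not both")
        return {"spark.executor.instances": str(num_executors)}
    if min_executors is None or max_executors is None:
        raise ValueError("dynamic allocation needs both a minimum and a maximum")
    if min_executors > max_executors:
        raise ValueError("the minimum must not exceed the maximum")
    return {
        "spark.dynamicAllocation.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
        # On YARN, dynamic allocation has traditionally relied on the
        # external shuffle service; check the requirement for your version.
        "spark.shuffle.service.enabled": "true",
    }


print(allocation_conf(min_executors=2, max_executors=10))
```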

You can specify additional cluster manager properties to pass to Spark, such as the maximum amount of memory that the application driver and executor can use.
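Memory limits like these are ordinary Spark properties (spark.driver.memory and spark.executor.memory). A minimal sketch, assuming they are passed as repeated --conf key=value pairs on a spark-submit command line; the helper conf_args and the memory sizes are illustrative, not values the executor requires.

```python
def conf_args(props):
    """Turn a property dict into repeated `--conf key=value` arguments."""
    args = []
    # Sorting keeps the generated command line deterministic.
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Illustrative sizes only.
memory_props = {"spark.driver.memory": "2g", "spark.executor.memory": "4g"}
print(conf_args(memory_props))
```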

You can also configure additional Spark arguments and environment variables. Any arguments and variables that you enter override previous definitions, including those in the Spark application, elsewhere in the Spark executor, and on the Data Collector machine.
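The override rule amounts to "the last definition wins." The sketch below models that with dictionary merging, assuming each source is a simple key/value mapping applied from lowest to highest precedence. The helpers merge_overrides and child_environment are hypothetical, and base_env exists only to make the example testable without touching the real environment.

```python
import os


def merge_overrides(*sources):
    """Merge key/value sources so that later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


def child_environment(user_env, base_env=None):
    """Environment for the launched process.

    Starts from the machine's environment (or base_env, if given) and lets
    the user-entered variables override it.
    """
    base = os.environ if base_env is None else base_env
    return merge_overrides(base, user_env)


# Application value, then executor value, then user-entered value: the last wins.
print(merge_overrides({"spark.app.name": "app"}, {"spark.app.name": "executor"}, {"spark.app.name": "user"}))
```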

You can specify custom Spark and Java home directories, and a Hadoop proxy user. You can also enter Kerberos credentials if needed.
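These settings have counterparts in a generic spark-submit launch: SPARK_HOME and JAVA_HOME in the process environment, and the --proxy-user, --principal, and --keytab options. The sketch assumes a spark-submit style launch and is not how the executor is necessarily implemented. The helper launch_spec, the paths, the user name, and the principal are all hypothetical.

```python
import os


def launch_spec(spark_home, java_home, proxy_user=None, principal=None, keytab=None):
    """Return (argv prefix, environment overrides) for a spark-submit launch."""
    if (principal is None) != (keytab is None):
        raise ValueError("Kerberos login needs both a principal and a keytab")
    # Use the spark-submit script from the custom Spark home.
    argv = [os.path.join(spark_home, "bin", "spark-submit")]
    if proxy_user:
        argv += ["--proxy-user", proxy_user]
    if principal:
        argv += ["--principal", principal, "--keytab", keytab]
    # Custom home directories are exposed to the launched process.
    env = {"SPARK_HOME": spark_home, "JAVA_HOME": java_home}
    return argv, env


argv, env = launch_spec("/opt/spark", "/usr/lib/jvm/java-8", proxy_user="etl")
print(argv, env)
```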

When you configure the application details, you specify the language used to write the application and then define language-specific properties.
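Language-specific details differ mainly in how the entry point and dependencies are named. A minimal sketch using standard spark-submit options, assuming JVM applications (Java or Scala) need a main class plus optional --jars, and Python applications use --py-files. The helper application_args, the class name, and the file names are hypothetical; the executor's own property names may differ.

```python
def application_args(language, main_resource, main_class=None, dependencies=(), app_args=()):
    """Build the application-specific tail of a spark-submit command."""
    language = language.lower()
    if language in ("java", "scala"):
        # JVM applications need the fully qualified main class in the JAR.
        if not main_class:
            raise ValueError("a JVM application needs a main class")
        head = ["--class", main_class]
        if dependencies:
            head += ["--jars", ",".join(dependencies)]
    elif language == "python":
        # Python applications ship extra modules with --py-files.
        head = ["--py-files", ",".join(dependencies)] if dependencies else []
    else:
        raise ValueError(f"unsupported language: {language!r}")
    return head + [main_resource, *app_args]


print(application_args("scala", "app.jar", main_class="com.example.Convert", app_args=["in", "out"]))
```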

You can also configure the executor to generate events for another event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.