Scala

The Scala processor runs custom Scala code to transform data. You develop the custom code using the Spark APIs for the version of Spark installed on your cluster. Complete the prerequisite tasks before using the processor in a pipeline.

The processor can have one or more input streams and a single output stream.

The Scala processor receives a Spark DataFrame from each input stream, runs your custom Scala code to transform the DataFrames, and then returns a single DataFrame as output. By default, the processor does not run the code on empty DataFrames.
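For illustration, here is a minimal sketch of the kind of transformation such code might perform. The `transform` function and its `inputs` parameter are hypothetical stand-ins for however the processor passes the input DataFrames to your code; check the processor's configuration reference for the exact variable names it exposes.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Hypothetical transform: combine the DataFrames from all input
// streams into one, then tag each record with a processing flag.
// The returned DataFrame stands in for the processor's single output.
def transform(inputs: Seq[DataFrame]): DataFrame = {
  val combined = inputs.reduce(_ union _)
  combined.withColumn("processed", lit(true))
}
```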

When you configure the Scala processor, you specify the code to run and whether to run the code on empty DataFrames.
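If you configure the processor to run the code on empty DataFrames, guard any logic that assumes at least one row. A minimal sketch, using a hypothetical `name` column for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, upper}

// Hypothetical guard: pass an empty DataFrame through unchanged
// instead of transforming it. Dataset.isEmpty is available in
// Spark 2.4+; on older versions, df.head(1).isEmpty is a common
// substitute.
def transformNonEmpty(df: DataFrame): DataFrame =
  if (df.isEmpty) df
  else df.withColumn("name", upper(col("name")))
```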