Spark SQL Query

The Spark SQL Query processor runs a Spark SQL query to transform batches of data. To perform record-level calculations using Spark SQL expressions, use the Spark SQL Expression processor.

For each batch of data, the processor receives a single Spark DataFrame as input and registers it as a temporary table in Spark. The processor then runs a Spark SQL query against the temporary table and returns the result as a new DataFrame.
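As a sketch of what such a query might look like, assume the input DataFrame is registered under a temporary table name such as `input_table`, with hypothetical columns `order_id`, `amount`, and `status` (the table name and columns here are illustrative assumptions, not values defined by the processor):

```sql
-- Filter the batch and derive a new column from existing ones.
-- input_table, amount, status, and order_id are hypothetical names.
SELECT
  order_id,
  amount,
  amount * 0.1 AS estimated_tax
FROM input_table
WHERE status = 'COMPLETED'
```

The rows returned by the query become the output DataFrame passed downstream.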

When you configure the processor, you define the Spark SQL query that the processor runs. The Spark SQL query can include Spark SQL and a subset of the functions provided with the StreamSets expression language.

Tip: In streaming pipelines, you can use a Window processor upstream from this processor to generate larger batch sizes for evaluation. For example, to include one of the Spark SQL window functions such as rank in the query, place a Window processor upstream of the Spark SQL Query processor.
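A query using rank might look like the following sketch, again assuming a hypothetical temporary table name `input_table` and illustrative columns `region` and `amount`:

```sql
-- Rank rows within each region by amount, highest first.
-- Window functions operate across the rows in the batch, so larger
-- batches from an upstream Window processor give more meaningful ranks.
SELECT
  region,
  amount,
  rank() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM input_table
```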