What is StreamSets Transformer?
StreamSets Transformer™ is an execution engine that runs data processing pipelines on Apache Spark, an open-source cluster-computing framework. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform transformations that require heavy processing on the entire data set in batch or streaming mode.
Pipeline Processing on Spark
Transformer functions as a Spark client that launches distributed Spark applications.
Batch Case Study
Transformer can run pipelines in batch mode. A batch pipeline processes all available data in a single batch, and then stops.
Streaming Case Study
Transformer can run pipelines in streaming mode. A streaming pipeline maintains connections to origin systems and processes data at user-defined intervals. The pipeline runs continuously until you manually stop it.
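
The difference between the two execution modes can be sketched in plain Python. This is a conceptual illustration only, not Transformer or Spark code: `run_batch`, `run_streaming`, `read_available`, `process`, and `should_stop` are hypothetical names chosen for the example, and `should_stop` stands in for a user manually stopping the pipeline.

```python
import time
from typing import Callable, List

Record = dict


def run_batch(read_available: Callable[[], List[Record]],
              process: Callable[[List[Record]], None]) -> int:
    """Process everything that is available right now as one batch, then stop."""
    process(read_available())
    return 1  # a batch run handles exactly one batch


def run_streaming(read_available: Callable[[], List[Record]],
                  process: Callable[[List[Record]], None],
                  interval_seconds: float,
                  should_stop: Callable[[], bool]) -> int:
    """Process a new batch at each interval until should_stop() returns True."""
    batches = 0
    while not should_stop():
        process(read_available())      # may be an empty batch if nothing arrived
        batches += 1
        time.sleep(interval_seconds)   # wait for the user-defined interval
    return batches
```

In this sketch the batch run returns after one pass over the available data, while the streaming run keeps polling the same source until it is told to stop.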
Transformer for Data Collector Users
For users already familiar with StreamSets Data Collector pipelines, this topic describes how Transformer pipelines are similar to Data Collector pipelines and how they differ.
Tutorials and Sample Pipelines
StreamSets provides tutorials and sample pipelines to help you learn about using Transformer.
Overview
You can preview data to help build or fine-tune a pipeline. You can preview complete or incomplete pipelines.
Preview Codes
In Preview mode, Transformer displays different colors for different types of data. Transformer uses other codes and formatting to highlight changed fields.
Processor Output Order
When previewing data for a processor, you can preview both the input and the output data. You can display the output records in the order that matches the input records or in the order produced by the processor.
Input and Output Schema for Stages
After running preview for a pipeline, you can view the input and output schema for each stage on the Schema tab in the pipeline properties panel. The schema includes each field path and data type.
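
To make "field path and data type" concrete, the following minimal sketch lists both for a nested record. It is an illustration under assumptions, not how Transformer derives the Schema tab: `describe_schema` is a hypothetical helper, the dotted path notation is assumed, and the type names are Python's rather than the Spark data types that Transformer reports.

```python
from typing import List, Tuple


def describe_schema(record: dict, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (field path, type name) pairs for a possibly nested record."""
    rows: List[Tuple[str, str]] = []
    for name, value in record.items():
        # Assumed notation: nested fields are joined with a dot.
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            rows.append((path, "struct"))
            rows.extend(describe_schema(value, path))
        else:
            rows.append((path, type(value).__name__))
    return rows


if __name__ == "__main__":
    sample = {"id": 1, "address": {"city": "Oslo"}}
    for path, type_name in describe_schema(sample):
        print(path, type_name)
```

Comparing the pairs produced for a stage's input record with those for its output record shows which fields the stage added, removed, or retyped.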
Editing Properties
When running preview, you can edit stage properties to see how the changes affect preview data. For example, you might edit the condition in a Stream Selector processor to see how the condition alters which records pass to the different output streams.
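
The Stream Selector example can be sketched as follows. This is a plain-Python approximation, not the processor's implementation: `route` is a hypothetical helper, the conditions are Python callables rather than the expressions configured in the stage, and the sketch assumes that a record passes to every stream whose condition it matches and to a default stream when it matches none.

```python
from typing import Callable, Dict, List

Condition = Callable[[dict], bool]


def route(records: List[dict],
          conditions: Dict[str, Condition]) -> Dict[str, List[dict]]:
    """Send each record to every matching stream, or to 'default' if none match."""
    streams: Dict[str, List[dict]] = {name: [] for name in conditions}
    streams["default"] = []
    for record in records:
        matched = False
        for name, condition in conditions.items():
            if condition(record):
                streams[name].append(record)
                matched = True
        if not matched:
            streams["default"].append(record)  # assumed default-stream behavior
    return streams


if __name__ == "__main__":
    orders = [{"total": 40}, {"total": 75}, {"total": 120}]
    # Editing the condition changes which records reach the "large" stream.
    before = route(orders, {"large": lambda r: r["total"] > 100})
    after = route(orders, {"large": lambda r: r["total"] > 50})
    print(len(before["large"]), len(after["large"]))
```

Rerunning the routing with the edited condition mirrors what preview lets you do interactively: change one property and compare which records land in each output stream.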