What is the StreamSets Platform?

The StreamSets platform is a cloud-native platform for building, running, and monitoring data pipelines.

A pipeline describes the flow of data from origin systems to destination systems and defines how to process the data along the way. Pipelines can access many types of external systems, including cloud data lakes, cloud data warehouses, and on-premises storage systems such as relational databases.

As a pipeline runs, you can view real-time statistics and error information about the data as it flows from origin to destination systems.

The StreamSets platform uses the following components to manage your pipelines:
StreamSets Control Hub
StreamSets Control Hub is a public cloud service hosted by StreamSets, which you access using a web browser. Use Control Hub to build, manage, and monitor your pipelines.
StreamSets engines
StreamSets engines run in your corporate network, which can be on-premises or on a protected cloud computing platform. The engines are headless and have no UI of their own. StreamSets provides two data plane engines, Data Collector and Transformer. You can deploy each engine independently and manage both together in Control Hub.
Use a Data Collector engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, change data capture (CDC), or batch mode.
Use a Transformer engine to run data processing pipelines on Apache Spark. Because Transformer pipelines run on Spark deployed on a cluster, the pipelines can perform set-based transformations such as joins, aggregates, and sorts on the entire data set.
When you start a pipeline from Control Hub, the engine that runs it connects to the external origin systems, processes the data, and moves it to the destination systems. Each engine sends status updates and metrics about its running pipelines back to Control Hub so that you can monitor pipeline progress in real time.
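To make the difference between record-based and set-based transformations more concrete, the following sketch uses plain Python rather than any StreamSets API. The sample records and the function names record_based and set_based are hypothetical, and the code runs in a single process rather than on Data Collector or on Spark. It only illustrates why a per-record transformation can handle each record as it arrives, while an aggregate or sort needs the entire data set.

```python
from collections import defaultdict

# Hypothetical sample records; in a real pipeline these would come from an origin.
records = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 45},
]

def record_based(stream):
    """Transform each record independently (loosely analogous to Data Collector)."""
    for record in stream:
        # The output depends only on the current record, so this generator
        # can process an unbounded stream one record at a time.
        yield {**record, "amount_cents": record["amount"] * 100}

def set_based(data):
    """Aggregate and sort over the whole data set (loosely analogous to Transformer)."""
    totals = defaultdict(int)
    for record in data:
        # A per-region total is only final after every record has been read.
        totals[record["region"]] += record["amount"]
    # Sorting by total requires all totals before the first result is known.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(list(record_based(records)))
print(set_based(records))
```

In the sketch, record_based could emit its first result after reading a single record, whereas set_based cannot return anything until it has consumed all three records. That dependency on the full data set is what makes joins, aggregates, and sorts a natural fit for a cluster engine such as Spark.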

The following image provides an overview of the StreamSets platform components: