What is IBM StreamSets?
IBM StreamSets is a cloud-native platform for building, running, and monitoring data pipelines.
A pipeline describes the flow of data from origin to destination systems and defines how to process the data along the way. Pipelines can access multiple types of external systems, including cloud data lakes, cloud data warehouses, and on-premises systems such as relational databases.
As a pipeline runs, you can view real-time statistics and error information about the data as it flows from origin to destination systems.
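The pipeline concept described above can be illustrated with a minimal sketch in plain Python. This is not the StreamSets SDK or the Data Collector engine API. Every name here (`run_pipeline`, `PipelineStats`, `parse_amount`) is hypothetical and chosen only to show an origin, a processing step, a destination, and the record and error counts that a running pipeline might report.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineStats:
    """Hypothetical counters, loosely analogous to pipeline run statistics."""
    records_read: int = 0
    records_written: int = 0
    errors: list = field(default_factory=list)

def run_pipeline(origin, processors, destination):
    """Pass each record from the origin through the processors to the destination."""
    stats = PipelineStats()
    for original in origin:
        stats.records_read += 1
        record = original
        try:
            for process in processors:
                record = process(record)
        except Exception as exc:
            # Keep the failing record and the reason, then continue with the next record.
            stats.errors.append((original, str(exc)))
            continue
        destination.append(record)
        stats.records_written += 1
    return stats

def parse_amount(record):
    """Example processor: convert the 'amount' field from text to a number."""
    return {**record, "amount": float(record["amount"])}

# Illustrative data only: an in-memory list stands in for the origin and destination systems.
rows = [{"id": 1, "amount": "9.50"}, {"id": 2, "amount": "n/a"}]
sink = []
stats = run_pipeline(rows, [parse_amount], sink)
```

In this sketch the second record cannot be converted, so it is counted as an error rather than written to the destination, while the first record flows through unchanged apart from the parsed field.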
IBM StreamSets uses the following components to manage your pipelines:
- Control Hub: A public cloud service that you access using a web browser. Use Control Hub to build, manage, and monitor your pipelines.
- Data Collector: An engine that processes data. Use the engine to run data ingestion pipelines that can read from and write to a large number of heterogeneous origins and destinations. Data Collector pipelines perform record-based data transformations in streaming, change data capture (CDC), or batch modes.
Note: At times, this documentation uses "StreamSets" to refer to "IBM StreamSets".
The following image provides a general overview of the IBM StreamSets components: