Installation Overview

To set up and deploy a Data Collector engine in your corporate network, you create environments and deployments in Control Hub.

When using self-managed environments and deployments, you take full control of procuring the resources needed to run the engine. You must set up the machines and complete the installation prerequisites required by the engine.

When using one of the cloud service provider integrations that StreamSets provides, such as AWS and GCP environments and deployments, Control Hub automatically provisions the resources needed to run the engine in your cloud service provider account. Control Hub then deploys engine instances to those resources. In this case, you do not need to complete the Data Collector installation prerequisites.

A Control Hub deployment defines the stage libraries that are installed on all engine instances belonging to the deployment. When you create a deployment of any type, you select the stage libraries to install on its engine instances.
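
The relationship between environment type, prerequisites, and stage libraries described above can be sketched as a small data model. This is an illustrative sketch only, not Control Hub code or its API: the class name, field names, environment labels, and stage library names are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical labels for illustration; not real Control Hub identifiers.
SELF_MANAGED = "self-managed"
PROVIDER_MANAGED = {"aws", "gcp"}


@dataclass
class Deployment:
    """Toy model of a deployment (assumed shape, not the real schema)."""
    name: str
    environment_type: str
    stage_libraries: List[str] = field(default_factory=list)

    def needs_manual_prerequisites(self) -> bool:
        # Self-managed: you set up the machines and complete the
        # installation prerequisites yourself.
        # Provider-managed: resources are provisioned for you, so the
        # prerequisites do not apply.
        return self.environment_type == SELF_MANAGED

    def libraries_for_instance(self) -> List[str]:
        # Every engine instance in the deployment gets the same libraries.
        return list(self.stage_libraries)


on_prem = Deployment("on-prem", SELF_MANAGED, ["example-lib-a", "example-lib-b"])
cloud = Deployment("cloud", "aws", ["example-lib-a"])

for d in (on_prem, cloud):
    print(d.name, d.needs_manual_prerequisites(), d.libraries_for_instance())
```

The point of the sketch is that the stage library list belongs to the deployment rather than to an individual engine instance, and that only the self-managed case leaves the installation prerequisites to you.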

Data Collector supports working with a wide range of external systems. For the systems that Data Collector supports and tests, and the stages that work with those systems, see Supported Systems and Versions.

Note: When running Data Collector on a MapR cluster, you must perform additional prerequisite steps.