Core Installation

Users with a StreamSets enterprise account can use the Data Collector core installation.

The core installation is a minimal installation that lets Data Collector use less disk space. To develop pipelines, you generally need to install additional stage libraries.

To use the Data Collector core installation, you can download the RPM package or the core tarball.

The core installation includes Data Collector and the following stage libraries:
  • Basic stage library
  • Data formats stage library
  • Development stage library
  • Statistics stage library
  • Windows stage library

After installation, use the Data Collector UI or the command line interface to install additional stage libraries.
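
As a rough sketch of the command line route, the script below prints the commands for listing and installing stage libraries. It assumes the bin/streamsets stagelibs command with its -list and -install options, as recalled from the Data Collector documentation, and an SDC_DIST variable that points at the installation directory. The default path and the library name are illustrative only, so check them against your version before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for installing extra stage libraries.
# Assumption: SDC_DIST points at the Data Collector installation directory.
# The default path below is a placeholder, not a documented location.
SDC_DIST="${SDC_DIST:-/opt/streamsets-datacollector}"

# RUN defaults to "echo", so commands are printed instead of executed.
# Set RUN="" to execute them for real.
RUN="${RUN-echo}"

# Join all arguments into one comma-separated list: a b c -> a,b,c
# The subshell body keeps the IFS change local to this function.
join_libs() (
    IFS=','
    printf '%s\n' "$*"
)

# Show the stage libraries that are available to install.
list_stage_libs() {
    $RUN "$SDC_DIST/bin/streamsets" stagelibs -list
}

# Install one or more stage libraries given as separate arguments.
install_stage_libs() {
    $RUN "$SDC_DIST/bin/streamsets" stagelibs -install="$(join_libs "$@")"
}

list_stage_libs
# Example library name; replace it with IDs reported by the -list command.
install_stage_libs streamsets-datacollector-jdbc-lib
```

With RUN left at its default, the script only echoes the two commands, which makes it easy to review them before setting RUN="" to execute them.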

The core installation includes all development stages and the following stages:
Origins
  • CoAP Server
  • Directory
  • File Tail
  • gRPC Client
  • HTTP Client
  • HTTP Server
  • JavaScript Scripting
  • MQTT Subscriber
  • NiFi HTTP Server
  • OPC UA Client
  • REST Service
  • SDC RPC
  • SFTP/FTP/FTPS Client
  • System Metrics
  • TCP Server
  • UDP Multithreaded Source
  • UDP Source
  • WebSocket Client
  • WebSocket Server
  • Windows Event Log
Processors
  • Base64 Field Decoder
  • Base64 Field Encoder
  • Data Generator
  • Data Parser
  • Delay
  • Expression Evaluator
  • Field Flattener
  • Field Hasher
  • Field Mapper
  • Field Masker
  • Field Merger
  • Field Order
  • Field Pivoter
  • Field Remover
  • Field Renamer
  • Field Replacer
  • Field Splitter
  • Field Type Converter
  • Field Zip
  • Geo IP
  • HTTP Client
  • HTTP Router
  • JavaScript Evaluator
  • JSON Generator
  • JSON Parser
  • Log Parser
  • Record Deduplicator
  • Schema Generator
  • Static Lookup
  • Stream Selector
  • Value Replacer
  • Windowing Aggregator
  • XML Flattener
  • XML Parser
Destinations
  • CoAP Client
  • HTTP Client
  • Local FS
  • MQTT Publisher
  • Named Pipe
  • SDC RPC
  • Send Response to Origin
  • SFTP/FTP/FTPS Client
  • Splunk
  • Syslog
  • To Error
  • Trash
  • WebSocket Client
Executors
  • Databricks Job Launcher
  • Email
  • Pipeline Finisher
  • Shell

Installing the Core RPM Package

Users with a StreamSets enterprise account can install the Data Collector RPM package and start Data Collector as a service on CentOS or Red Hat Enterprise Linux. To install the core version of Data Collector, download the RPM package.

After you complete the core installation and launch Data Collector, install individual stage libraries as needed.

When you install from the RPM package, Data Collector uses the default directories and the default system user and group.

Note: StreamSets does not recommend using NFS or NAS to store Data Collector files.

The default system user and group are named sdc. If an sdc user and an sdc group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.

To use specific IDs for the sdc user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you’re installing Data Collector on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
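
Pre-creating the account with fixed IDs might look like the following sketch. It uses the standard groupadd and useradd commands. The ID value 1500 is a hypothetical example, and the home directory and login shell are left at the system defaults, which is an assumption rather than a documented requirement.

```shell
#!/bin/sh
# Sketch: create the sdc group and user with fixed IDs before installing the RPM.
# SDC_UID and SDC_GID are hypothetical values; choose IDs that are free on
# every machine where Data Collector will be installed.
SDC_UID="${SDC_UID:-1500}"
SDC_GID="${SDC_GID:-1500}"

# RUN defaults to "echo", so commands are printed instead of executed.
# Set RUN="" and run as root to create the account for real.
RUN="${RUN-echo}"

# Succeeds if the named entry exists in the given database (group or passwd).
account_exists() {
    getent "$1" "$2" >/dev/null 2>&1
}

create_sdc_account() {
    # Create the group first so that the user can be assigned to it.
    if ! account_exists group sdc; then
        $RUN groupadd --gid "$SDC_GID" sdc
    fi
    if ! account_exists passwd sdc; then
        $RUN useradd --uid "$SDC_UID" --gid sdc sdc
    fi
}

create_sdc_account
```

Running the same script with the same SDC_UID and SDC_GID values on each machine keeps the IDs consistent. Afterwards, id sdc shows the IDs that were assigned.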
  1. Access the Data Collector RPM package from the StreamSets Support portal.
  2. Download the RPM package for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
  3. Use the following command to extract the file to the desired location:
    tar xf streamsets-datacollector-<version>-<operating_system>-all-rpms.tar
    For example, to extract version 5.10.0 on CentOS 7, use the following command:
    tar xf streamsets-datacollector-5.10.0-el7-all-rpms.tar
  4. Use the following command to install the core RPM package:
    yum localinstall streamsets-datacollector-<version>-1.noarch.rpm
    For example, to install version 5.10.0, use the following command:
    yum localinstall streamsets-datacollector-5.10.0-1.noarch.rpm
  5. To start Data Collector as a service, use the required command for your operating system:
    • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
      service sdc start
    • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
      systemctl start sdc
  6. To access the Data Collector UI, enter the following URL in the address bar of your browser:
    http://<hostname>:18630/
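
Steps 3 through 5 can be sketched as a small dry-run script. The file name patterns and commands are taken from the steps above. The VERSION and OS_TAG variables, the RUN switch, and the function names are hypothetical helpers introduced for illustration. The location of the RPM after extraction is not stated in the steps, so adjust the path if the tarball extracts into a subdirectory.

```shell
#!/bin/sh
# Dry-run sketch of the extract, install, and start steps.
# VERSION and OS_TAG default to the example values used in the steps.
VERSION="${VERSION:-5.10.0}"
OS_TAG="${OS_TAG:-el7}"

# RUN defaults to "echo", so commands are printed instead of executed.
# Set RUN="" and run as root to perform the installation.
RUN="${RUN-echo}"

# Build the name of the downloaded tar file from a version and an OS tag.
tarball_name() {
    printf 'streamsets-datacollector-%s-%s-all-rpms.tar\n' "$1" "$2"
}

# Build the name of the core RPM package from a version.
core_rpm_name() {
    printf 'streamsets-datacollector-%s-1.noarch.rpm\n' "$1"
}

install_core_rpm() {
    # Step 3: extract the downloaded file.
    $RUN tar xf "$(tarball_name "$VERSION" "$OS_TAG")"
    # Step 4: install the core RPM package.
    $RUN yum localinstall "$(core_rpm_name "$VERSION")"
    # Step 5: start the service with the command for the OS generation.
    case "$OS_TAG" in
        el6) $RUN service sdc start ;;
        *)   $RUN systemctl start sdc ;;
    esac
}

install_core_rpm
```

Reviewing the printed commands first helps catch a wrong version or operating system tag before anything is installed.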

Installing the Core Tarball

Users with a StreamSets enterprise account can install the Data Collector core tarball.

To install the core version of Data Collector, download the core tarball. After you complete the core installation and launch Data Collector, install individual stage libraries as needed.

  1. Download the core Data Collector tarball from the StreamSets Support portal.
  2. Use one of the following installation methods to install the core Data Collector: