Full Installation and Launch (Service Start)
Users with a StreamSets enterprise account can install the full Data Collector and run Data Collector as a service.
To install the full Data Collector as a service, you can download the Data Collector RPM package or the Data Collector full tarball.
You can run Data Collector as a service on all supported Linux operating systems.
Installing from the RPM Package
Users with a StreamSets enterprise account can install the Data Collector RPM package and start it as a service on CentOS, Oracle Linux, or Red Hat Enterprise Linux.
The default system user and group are named sdc. If an sdc user and an sdc group do not exist on the machine, the installation creates the user and group for you and assigns them the next available user ID and group ID.
To use specific IDs for the sdc user and group, create the user and group before installation and specify the IDs that you want to use. For example, if you’re installing Data Collector on multiple machines, you might want to create the system user and group before installation to ensure that the user ID and group ID are consistent across the machines.
Installing the full Data Collector as a service requires root privileges.
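If you want the sdc user and group to have the same IDs on every machine, you can create them before the RPM installation. The following is a minimal sketch, not part of the StreamSets packaging. The default ID of 980 is a hypothetical placeholder, and the function only prints the groupadd and useradd commands so that you can review them before running them as root.

```shell
#!/bin/sh
# Sketch: print the commands that pre-create the sdc group and user
# with fixed IDs. The default IDs below are hypothetical; choose values
# that are unused on every machine.
SDC_GID="${SDC_GID:-980}"
SDC_UID="${SDC_UID:-980}"

sdc_account_commands() {
    # -r creates a system account; -g and -u pin the group ID and user ID.
    echo "groupadd -r -g ${SDC_GID} sdc"
    echo "useradd -r -u ${SDC_UID} -g sdc -s /sbin/nologin sdc"
}

sdc_account_commands
```

Run the printed commands as root on each machine before you install the RPM package. The installation then finds the existing user and group instead of creating them.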
- Access the Data Collector RPM package from the StreamSets Support portal.
-
Download the RPM package for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, download the RPM EL6 package.
- For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, download the RPM EL7 package.
- For Oracle Linux 8 or Red Hat Enterprise Linux 8, download the RPM EL8 package.
-
Use the following command to extract the file to the desired location:
tar xf streamsets-datacollector-<version>-<operating_system>-all-rpms.tar
For example, to extract version 5.11.0 on CentOS 7, use the following command:
tar xf streamsets-datacollector-5.11.0-el7-all-rpms.tar
-
Use the following command to install the full Data Collector RPM package:
yum localinstall streamsets*.rpm
-
To start Data Collector as a service, use the required command for your operating system:
- For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
service sdc start
- For later operating systems, use:
systemctl start sdc
-
To access the Data Collector UI, enter the following URL
in the address bar of your browser:
http://<hostname>:18630/
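The extract, install, and start steps above can be sketched as a small helper that prints the commands for a given release. This is an illustration only. It assumes that the operating system token in the archive name is el6, el7, or el8, which matches the el7 example above but is not confirmed for the other releases. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the RPM archive name and the start command from the
# Data Collector version and the Enterprise Linux major release.

rpm_archive_name() {
    # $1 = Data Collector version, $2 = EL major release (6, 7, or 8).
    # Assumption: the archive token is "el" followed by the major release.
    echo "streamsets-datacollector-$1-el$2-all-rpms.tar"
}

sdc_start_command() {
    # Release 6 uses the service command; later releases use systemctl.
    case "$1" in
        6) echo "service sdc start" ;;
        *) echo "systemctl start sdc" ;;
    esac
}

version="5.11.0"
el_major="7"

# Print the commands for review instead of running them.
echo "tar xf $(rpm_archive_name "$version" "$el_major")"
echo "yum localinstall streamsets*.rpm"
sdc_start_command "$el_major"
```

Change version and el_major to match your download, then run the printed commands as root.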
Installing from the Tarball for Systems Using SysV Init
Users with a StreamSets enterprise account can install the Data Collector tarball and start it as a service for supported operating systems that use the SysV init system. Supported operating systems include CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, and Ubuntu 14.04 LTS.
For tarball installation instructions for operating systems that use the systemd init system, see Installing from the Tarball for Systems Using Systemd Init.
This procedure walks through setting the default directories and the default system user and group used to start Data Collector as a service. Before you install, you can alternatively use the $SDC_DIST/libexec/sdcd-env.sh file to modify the environment variables that define directories and the system user and group.
Installing the full Data Collector as a service requires root privileges.
- Download the Data Collector full tarball from the StreamSets Support portal.
-
Use the following command to extract the tarball to the desired location,
typically /opt/local/:
tar xf streamsets-datacollector-all-<version>.tgz -C <extraction directory>
For example, to extract version 5.11.0, use the following command:
tar xf streamsets-datacollector-all-5.11.0.tgz -C /opt/local
-
Create a system user and group named sdc. The sdc user and group are used to start Data Collector as a service.
-
Use the following command from the directory where you extracted the tarball to
copy initd/_sdcinitd_prototype to the
/etc/init.d directory:
cp initd/_sdcinitd_prototype /etc/init.d/sdc
-
Use the following command to change ownership of the file to sdc:
chown sdc:sdc /etc/init.d/sdc
- Edit the /etc/init.d/sdc file and set the SDC_DIST and SDC_HOME environment variables to the location where you extracted the tarball.
-
Use the following command to make the sdc file executable:
chmod 755 /etc/init.d/sdc
-
Use the following command to create the Data Collector configuration directory at /etc/sdc:
mkdir /etc/sdc
-
Use the following command from the directory where you extracted the tarball to copy all files from etc into the Data Collector configuration directory that you just created:
cp -R etc/* /etc/sdc
-
Use the following command to change the owner of the /etc/sdc directory and all files in the directory to sdc:sdc:
chown -R sdc:sdc /etc/sdc
-
Use the following command to set owner-only permissions on the form-realm.properties file in the /etc/sdc directory:
chmod go-rwx /etc/sdc/form-realm.properties
-
Use the following commands to create the Data Collector log directory at /var/log/sdc and change the owner to sdc:sdc:
mkdir /var/log/sdc
chown sdc:sdc /var/log/sdc
-
Use the following commands to create the Data Collector data directory at /var/lib/sdc and change the owner to sdc:sdc:
mkdir /var/lib/sdc
chown sdc:sdc /var/lib/sdc
-
Use the following commands to create the Data Collector resources directory at /var/lib/sdc-resources and change the owner to sdc:sdc:
mkdir /var/lib/sdc-resources
chown sdc:sdc /var/lib/sdc-resources
-
Use the following command to start Data Collector as a service:
service sdc start
-
To add the Data Collector service to the system startup, use the required command for your operating
system.
- For CentOS, use the following command:
chkconfig --add sdc
- For Ubuntu, use the following command:
update-rc.d sdc defaults 97 03
-
To access the Data Collector UI, enter the following URL in the address bar of your browser:
http://<hostname>:18630/
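The log, data, and resources directory steps in this procedure follow the same pattern, so they can be combined into one loop. The following is a minimal sketch under stated assumptions: create_sdc_dirs is a hypothetical helper, and its prefix argument exists only so that you can try the loop in a scratch directory without root privileges. It does not replace the recursive chown on /etc/sdc, which must run after the configuration files are copied.

```shell
#!/bin/sh
# Sketch: create the Data Collector log, data, and resources
# directories in one pass.

create_sdc_dirs() {
    root="$1"    # path prefix for a trial run, "" for the real filesystem
    owner="$2"   # for example sdc:sdc, or "" to skip chown in a trial run
    for dir in /var/log/sdc /var/lib/sdc /var/lib/sdc-resources; do
        mkdir -p "${root}${dir}" || return 1
        if [ -n "$owner" ]; then
            chown "$owner" "${root}${dir}" || return 1
        fi
    done
}

# Trial run in a scratch directory, without changing ownership.
scratch="$(mktemp -d)"
create_sdc_dirs "$scratch" ""
find "$scratch" -type d -name 'sdc*'
```

On the target machine, calling create_sdc_dirs "" sdc:sdc as root creates the directories at their real paths and assigns them to the sdc user and group.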
Installing from the Tarball for Systems Using Systemd Init
Users with a StreamSets enterprise account can install the Data Collector tarball and start it as a service for supported operating systems that use the systemd init system. Supported operating systems include CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, and Ubuntu 16.04 LTS.
For tarball installation instructions for operating systems that use the SysV init system, see Installing from the Tarball for Systems Using SysV Init.
This procedure walks through setting the default directories and the default system user and group used to start Data Collector as a service. Before you install, you can alternatively use the $SDC_DIST/systemd/sdc.service file to modify the environment variables that define directories and the system user and group.
Installing the full Data Collector as a service requires root privileges.
- Download the Data Collector full tarball from the StreamSets Support portal.
-
Use the following command to extract the tarball to the desired location,
typically /opt/streamsets-datacollector/:
tar xf streamsets-datacollector-all-<version>.tgz -C <extraction directory>
For example, to extract version 5.11.0, use the following command:
tar xf streamsets-datacollector-all-5.11.0.tgz -C /opt/streamsets-datacollector
-
Use the following command from the directory where you extracted the tarball to
copy systemd/sdc.service to the
/etc/systemd/system directory:
cp systemd/sdc.service /etc/systemd/system/sdc.service
-
If you did not extract the tarball to the default directory /opt/streamsets-datacollector/, override the /etc/systemd/system/sdc.service file to modify the SDC_HOME and ExecStart values.
Override the default values in the sdc.service file using the same procedure that you use to override unit configuration files on a systemd init system. For an example, see "Example 2. Overriding vendor settings" in the systemd.unit man page.
-
Use the following command from the directory where you extracted the tarball to
copy systemd/sdc.socket to the
/etc/systemd/system directory:
cp systemd/sdc.socket /etc/systemd/system/sdc.socket
- Optionally, edit the /etc/systemd/system/sdc.socket file to modify the Data Collector port number. The port must match the port defined in sdc.properties. The default is 18630.
-
Create a system user and group named sdc.
For example, use the following command to create a system user and group with the next available group ID and user ID:
groupadd -r sdc && useradd -r -d <installation dir> -g sdc -s /sbin/nologin sdc
If you’re installing Data Collector on multiple machines, we recommend explicitly specifying a group ID and user ID to ensure that the IDs are consistent across the machines. Use the -g flag of groupadd to specify the group ID and the -u flag of useradd to specify the user ID.
-
Use the following command to reload the systemd manager configuration:
systemctl daemon-reload
-
Use the following command to create the Data Collector configuration directory at /etc/sdc:
mkdir /etc/sdc
-
Use the following command from the directory where you extracted the tarball to
copy all files from etc into the Data Collector configuration directory that you just created:
cp -R etc/* /etc/sdc
-
Use the following command to change the owner of the
/etc/sdc directory and all files in the directory to
sdc:sdc:
chown -R sdc:sdc /etc/sdc
-
Use the following commands to create the Data Collector log directory at /var/log/sdc and change the owner to sdc:sdc:
mkdir /var/log/sdc
chown sdc:sdc /var/log/sdc
-
Use the following commands to create the Data Collector data directory at /var/lib/sdc and change the owner to sdc:sdc:
mkdir /var/lib/sdc
chown sdc:sdc /var/lib/sdc
-
Use the following commands to create the Data Collector resources directory at /var/lib/sdc-resources and change the owner to sdc:sdc:
mkdir /var/lib/sdc-resources
chown sdc:sdc /var/lib/sdc-resources
-
Use the following command to start Data Collector as a service:
systemctl start sdc
-
To add the Data Collector service to the system startup, use the
following command:
systemctl enable sdc
-
To access the Data Collector UI, enter the following URL
in the address bar of your browser:
http://<hostname>:18630/
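If you extracted the tarball to a non-default directory, the SDC_HOME and ExecStart override described in this procedure can be written as a systemd drop-in file. The following is a rough sketch, not the contents of the shipped unit file. It assumes that sdc.service sets SDC_HOME through an Environment= line and starts Data Collector with a command of the form <SDC_HOME>/bin/streamsets dc. Check both assumptions against your copy of systemd/sdc.service and copy the real ExecStart line before using the output. The installation directory shown is hypothetical.

```shell
#!/bin/sh
# Sketch: print a systemd drop-in that points sdc.service at a
# non-default installation directory.
# ASSUMPTIONS: the unit sets SDC_HOME through Environment= and starts
# Data Collector with "<SDC_HOME>/bin/streamsets dc". Verify both
# against your copy of systemd/sdc.service.

sdc_dropin() {
    home="$1"   # directory where the tarball was extracted
    # The empty ExecStart= line clears the command inherited from the
    # original unit, so the next line replaces it instead of adding to it.
    cat <<EOF
[Service]
Environment=SDC_HOME=${home}
ExecStart=
ExecStart=${home}/bin/streamsets dc
EOF
}

# Hypothetical installation directory, for illustration only.
sdc_dropin /opt/local/streamsets-datacollector
```

Save the output as /etc/systemd/system/sdc.service.d/override.conf, then run systemctl daemon-reload so that systemd picks up the change.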