Enabling HTTPS

To secure communication within Data Collector, enable HTTPS for the following components:
Data Collector
Enable HTTPS for Data Collector to secure the communication to the REST API and to use the Data Collector as an authoring Data Collector in Control Hub.
For Control Hub cloud, an authoring Data Collector must use the HTTPS protocol because Control Hub cloud also uses the HTTPS protocol. For Control Hub on-premises, an authoring Data Collector must use the same protocol as the Control Hub on-premises installation.
Pipeline stages that connect to external systems
During pipeline development, developers can enable specific stages to use SSL/TLS to secure the communication with an external system. For example, when designing a pipeline that writes to a Cassandra cluster secured with SSL/TLS, the developer must configure the Cassandra destination to use SSL/TLS to connect to Cassandra.
For information about enabling HTTPS for pipeline stages, see SSL/TLS Encryption.

By default, Data Collector uses the HTTP protocol. StreamSets recommends using HTTPS in a production environment. HTTPS requires SSL/TLS certificates.

Prerequisites

Before you enable HTTPS for Data Collector, complete the following requirements:

Obtain access to OpenSSL and Java keytool
If you do not have keystore files that include SSL/TLS certificates signed by a certificate authority (CA), you can request certificates and create the keystore files using the following tools:
  • OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore files. For more information, see the OpenSSL documentation.
  • Java keytool - You can also use Java keytool to create a CSR and to create keystore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
Generate SSL/TLS certificate and private key pairs signed by a certificate authority (CA)
To enable HTTPS for Data Collector, generate a single private key and public certificate pair for Data Collector. Data Collector provides a self-signed certificate that you can use. However, web browsers generally issue a warning for self-signed certificates. StreamSets strongly recommends that you generate a key and certificate pair signed by a CA.
Important: Each signed certificate must include the fully qualified domain name (FQDN) for the Data Collector machine.
To obtain a certificate from a trusted CA, you must provide proof that you are the owner of the domain name for which you are requesting the certificate. Use OpenSSL or keytool to generate a key pair and then submit a Certificate Signing Request (CSR) to the CA. The exact procedure depends on the CA that you choose to use - see the documentation provided by the CA.
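
As a rough illustration of the key pair and CSR step with OpenSSL, the sketch below generates an unencrypted 2048-bit RSA private key and a CSR whose common name is the Data Collector FQDN. The FQDN sdc.company.com, the file names, and the temporary working directory are assumptions chosen to match the examples in this document. Your CA may require additional subject fields or a subject alternative name, so treat this as a starting point rather than the required procedure.

```shell
#!/bin/sh
set -eu

# Hypothetical FQDN of the Data Collector machine; replace with your own.
FQDN="sdc.company.com"

# Work in a scratch directory so nothing existing is overwritten.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Create a new RSA key (-nodes leaves it unencrypted) and a CSR in one call.
# -subj supplies the subject on the command line, so there are no prompts.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout sdc_company_com.key \
  -out sdc_company_com.csr \
  -subj "/CN=${FQDN}"

# The private key is unencrypted, so restrict it to the current user.
chmod 600 sdc_company_com.key

# Check the CSR's self-signature and print its subject before sending it to the CA.
openssl req -in sdc_company_com.csr -noout -verify -subject
```

You would send the .csr file to the CA and keep the .key file private. The CA returns the signed certificate, typically in PEM format.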

Step 1. Create Keystore Files

Create a keystore file that includes each private key and public certificate pair signed by the CA. A keystore is used to verify the identity of the client upon a request from an SSL/TLS server.

To enable HTTPS for Data Collector, create a single keystore file for Data Collector.

StreamSets recommends creating keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.

  1. Use the following command to import the certificate and private key issued in PEM format to a PKCS #12 keystore for Data Collector:
    openssl pkcs12 -export -in <PEM certificate> -inkey <private key> -out <keystore filename> -name <keystore name> 

    You will be prompted to create a password for the keystore file.

    For example, the following command converts the certificate sdc_company_com.pem and private key sdc_company_com.key to the PKCS #12 keystore file named sdc_company_com.p12:
    openssl pkcs12 -export -in sdc_company_com.pem -inkey sdc_company_com.key -out sdc_company_com.p12 -name sdc_company_com
  2. Store the Data Collector keystore file in the Data Collector resources directory, $SDC_RESOURCES.
  3. Store the keystore password in a text file named keystore-password.txt in the Data Collector resources directory, $SDC_RESOURCES.
  4. Store each worker keystore file in the same absolute location on each worker node in the cluster.
    For example, if we generated worker keystore files named sdc_worker.p12, we'd store the files in the following directory on each worker node:
    /opt/security/sdc_worker.p12
  5. Store the worker keystore password in a password text file in the same absolute location on each worker node in the cluster.
    For example, we'd store the keystore-password.txt file in the following directory on each worker node:
    /opt/security/keystore-password.txt
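
The keystore steps above can also be run without the interactive password prompt. The sketch below is only an illustration: it uses a throwaway self-signed certificate as a stand-in for the CA-issued certificate, a scratch directory in place of $SDC_RESOURCES, and the placeholder password example-password. It writes the password file first, passes it to OpenSSL with the file: password source, and then reads the certificate back out of the keystore as a quick check.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for $SDC_RESOURCES (assumption for this demo).
WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Files created from here on are readable and writable by the owner only.
umask 077

# Stand-in for the CA-issued pair: a short-lived self-signed certificate.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout sdc_company_com.key -out sdc_company_com.pem \
  -subj "/CN=sdc.company.com" 2>/dev/null

# Store the keystore password; printf avoids writing a trailing newline.
printf '%s' 'example-password' > keystore-password.txt

# Build the PKCS #12 keystore, reading the password from the file
# instead of prompting for it.
openssl pkcs12 -export \
  -in sdc_company_com.pem -inkey sdc_company_com.key \
  -out sdc_company_com.p12 -name sdc_company_com \
  -passout file:keystore-password.txt

# Sanity check: extract the certificate from the keystore and show
# its subject and expiry date.
openssl pkcs12 -in sdc_company_com.p12 -nokeys \
  -passin file:keystore-password.txt \
  | openssl x509 -noout -subject -enddate
```

In a real installation, the keystore and password file must remain readable by the user that runs Data Collector, so adjust ownership accordingly after tightening permissions.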

Step 2. Create a Truststore File

A truststore file contains certificates from trusted CAs that an SSL/TLS client uses to verify the identity of an SSL/TLS server.

Data Collector requires a truststore file to verify the identity of the following SSL/TLS servers:
  • Secure LDAP server when Data Collector is configured for secure LDAP authentication.
  • Control Hub on-premises installation enabled for HTTPS when Data Collector is registered with Control Hub on-premises.

By default, Data Collector and worker nodes use the default Java truststore file located in $JAVA_HOME/jre/lib/security/cacerts. If your certificates are signed by a trusted CA that is included in the default Java truststore file, you do not need to create a truststore file for Data Collector or worker nodes and can skip this step.

If your certificates are signed by a private CA, or by a CA that the default Java truststore does not trust, you must either create a custom truststore file or modify a copy of the default Java truststore file so that the truststore used by Data Collector and the worker nodes contains the root and intermediate CA certificates. For example, if your organization generates its own certificates, you must add the root and intermediate certificates for your organization to the truststore file.

You can create a single truststore file used by both Data Collector and worker nodes. Or you can create separate truststore files.

In these steps, we show how to modify a copy of the default truststore file to add an additional CA to the list of trusted CAs. We assume that the same CA signed our certificates used by Data Collector and by each worker node in the cluster. If multiple CAs signed your certificates, you'll need to add each CA to the truststore file.

If you prefer to create a custom truststore file, see the keytool documentation.

You can create the following types of truststores for Data Collector and worker nodes:
  • Java keystore file (JKS)
  • PKCS #12 (p12 file)

To modify a copy of the default Java truststore file:

  1. Use the following command to set the JAVA_HOME environment variable:
    export JAVA_HOME=<Java home directory>
  2. Use the following command to set the SDC_CONF environment variable:
    export SDC_CONF=<Data Collector configuration directory>
    For example, for an RPM installation use:
    export SDC_CONF=/etc/sdc
  3. Use the following command to copy the default Java truststore file to the Data Collector configuration directory:
    cp "${JAVA_HOME}/jre/lib/security/cacerts" "${SDC_CONF}/truststore.jks"
  4. Use the following keytool command to import the CA certificate into the truststore file:
    keytool -import -file <CA certificate> -trustcacerts -noprompt -alias <CA alias> -storepass <password> -keystore "${SDC_CONF}/truststore.jks"
    For example:
    keytool -import -file sdc_company_com.pem -trustcacerts -noprompt -alias MyCorporateCA -storepass changeit -keystore "${SDC_CONF}/truststore.jks"
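
Before importing a certificate into the truststore, it can help to confirm that the file really is the CA certificate that signed the Data Collector certificate, and not the Data Collector certificate itself. The following sketch shows one way to check this with OpenSSL. It is self-contained for illustration only: it creates a throwaway CA named Example Corporate CA and signs a test certificate with it. The file names ca.pem and ca.key are hypothetical. With real files, only the last two commands are relevant.

```shell
#!/bin/sh
set -eu

WORKDIR="$(mktemp -d)"
cd "$WORKDIR"

# Throwaway private CA (stand-in for your organization's root CA).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout ca.key -out ca.pem \
  -subj "/CN=Example Corporate CA" 2>/dev/null

# Key and CSR for the Data Collector host, then a certificate signed by the CA.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout sdc_company_com.key -out sdc_company_com.csr \
  -subj "/CN=sdc.company.com" 2>/dev/null
openssl x509 -req -in sdc_company_com.csr \
  -CA ca.pem -CAkey ca.key -CAcreateserial \
  -days 1 -out sdc_company_com.pem 2>/dev/null

# Inspect the CA certificate: for a root CA, subject and issuer are identical.
openssl x509 -in ca.pem -noout -subject -issuer -fingerprint -sha256

# Confirm that the CA certificate validates the Data Collector certificate.
# If this check fails, importing ca.pem into the truststore will not help.
openssl verify -CAfile ca.pem sdc_company_com.pem
```

If an intermediate CA signed the certificate, the root and intermediate certificates would both need to be available to the verify command, just as both must be added to the truststore.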

Step 3. Configure Data Collector to Use HTTPS

Modify Data Collector configuration properties to configure Data Collector to use a secure port and your keystore file. If you created a custom truststore file or modified a copy of the default Java truststore file, configure Data Collector to use that truststore file.

  1. To define the secure port and keystore file, configure the following properties in the Data Collector configuration file, sdc.properties:
    Data Collector HTTPS properties and their descriptions:

    sdc.base.http.url
        Data Collector URL. If the property is uncommented and defined, modify to use the HTTPS protocol and the secure port number, for example:
        sdc.base.http.url=https://myhost:18636
        If the property is commented, you do not need to define it.

    https.port
        Secure port number for Data Collector. For example, 18636.
        Any number besides -1 enables the secure port number.
        Note: When both the HTTP and HTTPS port properties are defined, the HTTP port bounces to the HTTPS port.

    https.keystore.path
        Path and name of the keystore file. Enter an absolute path or a path relative to the $SDC_RESOURCES directory.
        For example: https.keystore.path=sdc_company_com.p12
        Note: Default is keystore.jks in the $SDC_CONF directory which provides a self-signed certificate that you can use. However, StreamSets strongly recommends that you generate a certificate signed by a trusted CA, as described in Prerequisites.

    https.keystore.password
        Password to open the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
        For example, if you added the password to a text file named keystore-password.txt, configure the property as follows:
        https.keystore.password=${file("keystore-password.txt")}

    https.require.hsts
        Requires Data Collector to include the HTTP Strict Transport Security (HSTS) response header.
        Set to true to enable HSTS.
        Default is false.

  2. For an installation started as a service on operating systems that use the systemd init system, edit the sdc.socket file to use the same secure port that you just defined.
    The location of the sdc.socket file depends on how you installed Data Collector:
    • From the RPM package - /usr/lib/systemd/system/sdc.socket
    • From the tarball - /etc/systemd/system/sdc.socket
    For example, if you defined the Data Collector secure port number as 18636, modify these lines in the file as follows:
    [Socket]
    ListenStream=18636
    ListenStream=0.0.0.0:18636
  3. Use the following command to reload the systemd manager configuration:
    systemctl daemon-reload
  4. If you created a custom truststore file or modified a copy of the default Java truststore file for Data Collector to use, define the following options in the SDC_JAVA_OPTS environment variable:
    • javax.net.ssl.trustStore - Path to the truststore file on the Data Collector machine.
    • javax.net.ssl.trustStorePassword - Truststore password.

    Modify environment variables using the method required by your installation type.

    For example, define the options as follows:
    export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Djavax.net.ssl.trustStore=/etc/sdc/truststore.jks -Djavax.net.ssl.trustStorePassword=mypassword -Xmx1024m -Xms1024m -server -XX:-OmitStackTraceInFastThrow"

    Or to avoid saving the password in the export command, save the password in a text file and then define the truststore password option as follows: -Djavax.net.ssl.trustStorePassword=$(cat passwordfile.txt)

    Then ensure that the password file is readable only by the user executing the export command.

  5. Restart Data Collector to enable the changes.
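
Before restarting, a quick look at the HTTPS-related properties can catch a property that is still commented out or a secure port left at -1. The helper functions below, check_https_props and https_enabled, are hypothetical names invented for this sketch and are not part of Data Collector. The demo runs against a small sample file rather than a real sdc.properties.

```shell
#!/bin/sh

# Print the HTTPS-related properties that are set (uncommented) in a properties file.
check_https_props() {
  grep -E '^(sdc\.base\.http\.url|https\.port|https\.keystore\.path|https\.keystore\.password|https\.require\.hsts)=' "$1"
}

# Succeed when https.port is defined and is anything other than -1.
https_enabled() {
  port="$(sed -n 's/^https\.port=//p' "$1" | tail -n 1)"
  [ -n "$port" ] && [ "$port" != "-1" ]
}

# Sample file standing in for $SDC_CONF/sdc.properties (values from this document).
SAMPLE="$(mktemp)"
cat > "$SAMPLE" <<'EOF'
#sdc.base.http.url=https://myhost:18636
https.port=18636
https.keystore.path=sdc_company_com.p12
https.keystore.password=${file("keystore-password.txt")}
https.require.hsts=false
EOF

check_https_props "$SAMPLE"

if https_enabled "$SAMPLE"; then
  echo "HTTPS port is enabled"
else
  echo "HTTPS port is disabled"
fi
```

Against a real installation you would pass "${SDC_CONF}/sdc.properties" instead of the sample file. After the restart, one way to see which certificate Data Collector actually serves is to connect with OpenSSL, using the host and port from the examples above:

    openssl s_client -connect myhost:18636 -servername myhost </dev/null 2>/dev/null | openssl x509 -noout -subject -issuer -enddate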