Enabling HTTPS
- Data Collector
- Enable HTTPS for Data Collector to secure communication with the REST API and to use Data Collector as an authoring Data Collector in Control Hub.
- Pipeline stages that connect to external systems
- During pipeline development, developers can enable specific stages to use SSL/TLS to secure the communication with an external system. For example, if designing a pipeline that writes to a Cassandra cluster enabled for HTTPS, the developer must configure the Cassandra destination to use SSL/TLS to connect to Cassandra.
By default, Data Collector uses the HTTP protocol. StreamSets recommends using HTTPS in a production environment. HTTPS requires SSL/TLS certificates.
Prerequisites
Before you enable HTTPS for Data Collector, complete the following requirements:
- Obtain access to OpenSSL and Java keytool
- If you do not have keystore files that include SSL/TLS certificates signed by a
certificate authority (CA), you can request certificates and create the keystore
files using the following tools:
- OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore files. For more information, see the OpenSSL documentation.
- Java keytool - You can also use Java keytool to create a CSR and to create keystore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
- Generate SSL/TLS certificate and private key pairs signed by a certificate authority (CA)
- To enable HTTPS for Data Collector, generate a single private key and public certificate pair for Data Collector. Data Collector provides a self-signed certificate that you can use. However, web browsers generally issue a warning for self-signed certificates. StreamSets strongly recommends that you generate a key and certificate pair signed by a CA.
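As a rough illustration of the OpenSSL route described above, the following sketch generates a private key and a CSR that you could send to your CA. The function name, the host name sdc.company.com, the 2048-bit RSA key size, and the subject containing only a common name are assumptions made for this example. Your CA may require additional subject fields or subject alternative names.

```shell
#!/bin/sh
# generate_key_and_csr COMMON_NAME BASENAME
# Writes BASENAME.key (a new, unencrypted 2048-bit RSA private key) and
# BASENAME.csr (a certificate signing request for COMMON_NAME).
# The function name and key parameters are illustrative assumptions.
generate_key_and_csr() {
    common_name=$1
    base=$2
    (
        # Files created in this subshell are readable by the owner only.
        umask 077
        # -nodes leaves the private key unencrypted on disk.
        openssl req -new -newkey rsa:2048 -nodes \
            -keyout "$base.key" \
            -out "$base.csr" \
            -subj "/CN=$common_name"
    ) || return 1
    # Check the CSR's self-signature before sending it to the CA.
    openssl req -in "$base.csr" -noout -verify
}
```

For example, running generate_key_and_csr sdc.company.com sdc_company_com would produce sdc_company_com.key and sdc_company_com.csr, matching the file naming used in the steps that follow.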
Step 1. Create Keystore Files
Create a keystore file that includes each private key and public certificate pair signed by the CA. A keystore is used to verify the identity of the client upon a request from an SSL/TLS server.
To enable HTTPS for Data Collector, create a single keystore file for Data Collector.
StreamSets recommends creating keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.
- Use the following command to import the certificate and private key issued in PEM format to a PKCS #12 keystore for Data Collector:
openssl pkcs12 -export -in <PEM certificate> -inkey <private key> -out <keystore filename> -name <keystore name>
You will be prompted to create a password for the keystore file.
For example, the following command converts the certificate sdc_company_com.pem and private key sdc_company_com.key to the PKCS #12 keystore file named sdc_company_com.p12:
openssl pkcs12 -export -in sdc_company_com.pem -inkey sdc_company_com.key -out sdc_company_com.p12 -name sdc_company_com
- Store the Data Collector keystore file in the Data Collector resources directory, $SDC_RESOURCES.
- Store the keystore password in a text file named keystore-password.txt in the Data Collector resources directory, $SDC_RESOURCES.
- Store each worker keystore file in the same absolute location on each worker node in the cluster.
For example, if we generated worker keystore files named sdc_worker.p12, we'd store the files in the following location on each worker node:
/opt/security/sdc_worker.p12
- Store the worker keystore password in a password text file in the same absolute location on each worker node in the cluster.
For example, we'd store the keystore-password.txt file in the following location on each worker node:
/opt/security/keystore-password.txt
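The storage steps above can be sketched as a small shell function. This is a minimal example, not a StreamSets utility: the function name install_keystore, the owner-only permission mode, and reading the password from standard input are assumptions made here so that the password does not appear on the command line.

```shell
#!/bin/sh
# install_keystore KEYSTORE_FILE TARGET_DIR
# Copies the keystore into TARGET_DIR and writes the password read from
# standard input to TARGET_DIR/keystore-password.txt.
# The function name and the 600 permission mode are illustrative choices.
install_keystore() {
    keystore_file=$1
    target_dir=$2
    password_file="$target_dir/keystore-password.txt"

    [ -f "$keystore_file" ] || { echo "missing keystore: $keystore_file" >&2; return 1; }
    [ -d "$target_dir" ] || { echo "missing directory: $target_dir" >&2; return 1; }

    cp "$keystore_file" "$target_dir/" || return 1
    chmod 600 "$target_dir/$(basename "$keystore_file")" || return 1

    # Read one line from stdin; accept input that lacks a final newline.
    IFS= read -r password || [ -n "$password" ] || return 1

    # Write the password without a trailing newline, owner-readable only.
    ( umask 077; printf '%s' "$password" > "$password_file" ) || return 1
    chmod 600 "$password_file"
}
```

For the Data Collector keystore, you might call install_keystore sdc_company_com.p12 "$SDC_RESOURCES" and type the password when the command waits for input. For worker keystores, the same call with /opt/security as the target directory would be repeated on each worker node.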
Step 2. Create a Truststore File
A truststore file contains certificates from trusted CAs that an SSL/TLS client uses to verify the identity of an SSL/TLS server.
- Secure LDAP server when Data Collector is configured for secure LDAP authentication.
- Control Hub on-premises installation enabled for HTTPS when Data Collector is registered with Control Hub on-premises.
By default, Data Collector and worker nodes use the default Java truststore file located in $JAVA_HOME/jre/lib/security/cacerts. If your certificates are signed by a trusted CA that is included in the default Java truststore file, you do not need to create a truststore file for Data Collector or worker nodes and can skip this step.
If your certificates are signed by a private CA or not trusted by the default Java truststore, you must create a custom truststore file or modify a copy of the default Java truststore file to add the root and intermediate CA certificates to the Data Collector and worker node truststore file. For example, if your organization generates its own certificates, you must add the root and intermediate certificates for your organization to the truststore file.
You can create a single truststore file used by both Data Collector and worker nodes, or you can create separate truststore files.
In these steps, we show how to modify a copy of the default truststore file to add an additional CA to the list of trusted CAs. We assume that the same CA signed our certificates used by Data Collector and by each worker node in the cluster. If multiple CAs signed your certificates, you'll need to add each CA to the truststore file.
If you prefer to create a custom truststore file, see the keytool documentation.
- Java keystore file (JKS)
- PKCS #12 (p12 file)
- Use the following command to set the JAVA_HOME environment variable:
export JAVA_HOME=<Java home directory>
- Use the following command to set the SDC_CONF environment variable:
export SDC_CONF=<Data Collector configuration directory>
For example, for an RPM installation use:
export SDC_CONF=/etc/sdc
- Use the following command to copy the default Java truststore file to the Data Collector configuration directory:
cp "${JAVA_HOME}/jre/lib/security/cacerts" "${SDC_CONF}/truststore.jks"
- Use the following keytool command to import the CA certificate into the truststore file:
keytool -import -file <CA certificate> -trustcacerts -noprompt -alias <CA alias> -storepass <password> -keystore "${SDC_CONF}/truststore.jks"
For example:
keytool -import -file sdc_company_com.pem -trustcacerts -noprompt -alias MyCorporateCA -storepass changeit -keystore "${SDC_CONF}/truststore.jks"
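The copy and import steps above can be wrapped in two helper functions, sketched below. The function names are hypothetical. The first helper also checks $JAVA_HOME/lib/security/cacerts, because Java 9 and later no longer include a jre subdirectory. The second relies on keytool -list returning a non-zero exit status when the requested alias is not in the store.

```shell
#!/bin/sh
# copy_default_truststore JAVA_HOME_DIR DEST_FILE
# Copies the default cacerts file to DEST_FILE, trying the Java 8 layout
# (jre/lib/security) first and the Java 9+ layout (lib/security) second.
copy_default_truststore() {
    java_home=$1
    dest=$2
    for candidate in \
        "$java_home/jre/lib/security/cacerts" \
        "$java_home/lib/security/cacerts"
    do
        if [ -f "$candidate" ]; then
            cp "$candidate" "$dest"
            return $?
        fi
    done
    echo "no cacerts file found under $java_home" >&2
    return 1
}

# truststore_has_alias TRUSTSTORE STOREPASS ALIAS
# Succeeds only if keytool can list ALIAS in TRUSTSTORE.
truststore_has_alias() {
    keytool -list -keystore "$1" -storepass "$2" -alias "$3" >/dev/null 2>&1
}
```

With the example values used above, copy_default_truststore "$JAVA_HOME" "$SDC_CONF/truststore.jks" would replace the cp command, and truststore_has_alias "$SDC_CONF/truststore.jks" changeit MyCorporateCA could be run after the import to confirm that the CA certificate was added.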
Step 3. Configure Data Collector to Use HTTPS
Modify Data Collector configuration properties to configure Data Collector to use a secure port and your keystore file. If you created a custom truststore file or modified a copy of the default Java truststore file, configure Data Collector to use that truststore file.
- To define the secure port and keystore file, configure the following properties in the Data Collector configuration file, sdc.properties:
Data Collector HTTPS Property - Description
- sdc.base.http.url - Data Collector URL. If the property is uncommented and defined, modify to use the HTTPS protocol and the secure port number, for example:
sdc.base.http.url=https://myhost:18636
If the property is commented, you do not need to define it.
- https.port - Secure port number for Data Collector. For example, 18636. Any number besides -1 enables the secure port number.
Note: When both the HTTP and HTTPS port properties are defined, the HTTP port bounces to the HTTPS port.
- https.keystore.path - Path and name of the keystore file. Enter an absolute path or a path relative to the $SDC_RESOURCES directory.
For example:
https.keystore.path=sdc_company_com.p12
Note: Default is keystore.jks in the $SDC_CONF directory, which provides a self-signed certificate that you can use. However, StreamSets strongly recommends that you generate a certificate signed by a trusted CA, as described in Prerequisites.
- https.keystore.password - Password to open the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password. For example, if you added the password to a text file named keystore-password.txt, configure the property as follows:
https.keystore.password=${file("keystore-password.txt")}
- https.require.hsts - Requires Data Collector to include the HTTP Strict Transport Security (HSTS) response header. Set to true to enable HSTS. Default is false.
- For an installation started as a service on operating systems that use the systemd init system, edit the sdc.socket file to use the same secure port that you just defined.
The location of the sdc.socket file depends on how you installed Data Collector:
- From the RPM package - /usr/lib/systemd/system/sdc.socket
- From the tarball - /etc/systemd/system/sdc.socket
For example, if you defined the Data Collector secure port number as 18636, modify these lines in the file as follows:
[Socket]
ListenStream=18636
ListenStream=0.0.0.0:18636
- Use the following command to reload the systemd manager configuration:
systemctl daemon-reload
- If you created a custom truststore file or modified a copy of the default Java truststore file for Data Collector to use, define the following options in the SDC_JAVA_OPTS environment variable:
- javax.net.ssl.trustStore - Path to the truststore file on the Data Collector machine.
- javax.net.ssl.trustStorePassword - Truststore password.
Modify environment variables using the method required by your installation type.
For example, define the options as follows:
export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Djavax.net.ssl.trustStore=/etc/sdc/truststore.jks -Djavax.net.ssl.trustStorePassword=mypassword -Xmx1024m -Xms1024m -server -XX:-OmitStackTraceInFastThrow"
Or to avoid saving the password in the export command, save the password in a text file and then define the truststore password option as follows:
-Djavax.net.ssl.trustStorePassword=$(cat passwordfile.txt)
Then ensure that the password file is readable only by the user executing the export command.
- Restart Data Collector to enable the changes.
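If you prefer to script the sdc.properties changes rather than edit the file by hand, a helper along the following lines could set each property. This is a minimal sketch: set_property is a hypothetical function, and it assumes that properties are written as key=value with no spaces around the equals sign. Commented-out lines are left untouched, which fits the guidance that a commented sdc.base.http.url does not need to be defined.

```shell
#!/bin/sh
# set_property FILE KEY VALUE
# Replaces an existing "KEY=..." line in FILE, or appends "KEY=VALUE" if the
# key is not present. Lines that start with "#" are never modified.
# Hypothetical helper; assumes no spaces around "=" in the properties file.
set_property() {
    file=$1
    key=$2
    value=$3
    tmp_file=$(mktemp) || return 1
    # Pass key and value through the environment so that awk does not
    # interpret backslashes or other special characters in them.
    KEY="$key" VALUE="$value" awk '
        BEGIN { key = ENVIRON["KEY"]; value = ENVIRON["VALUE"]; found = 0 }
        index($0, key "=") == 1 { print key "=" value; found = 1; next }
        { print }
        END { if (!found) print key "=" value }
    ' "$file" > "$tmp_file" && cat "$tmp_file" > "$file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Example calls (assumes SDC_CONF is set as in Step 2):
#   set_property "$SDC_CONF/sdc.properties" https.port 18636
#   set_property "$SDC_CONF/sdc.properties" https.keystore.path sdc_company_com.p12
#   set_property "$SDC_CONF/sdc.properties" https.keystore.password '${file("keystore-password.txt")}'
```

The single quotes around the password value keep the shell from expanding ${file(...)}, so the function call is written to the file literally. Writing the result back with cat rather than mv keeps the original file's ownership and permissions.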