Enabling HTTPS

By default, the web browser that you use to access Control Hub uses WebSocket tunneling to communicate with deployed Data Collector engines. WebSocket tunneling ensures that your data is secure and does not require additional setup.

However, when you preview a pipeline or capture a snapshot of an active job, your source data passes through encrypted connections beyond your corporate network into Control Hub, and then back to your web browser. If corporate regulations require that your data remain behind a firewall, you can configure the browser to use direct engine REST APIs to communicate with the engines behind the firewall. For more information, see Engine Communication in the Control Hub documentation.

When using direct engine REST APIs, you must enable Data Collector to use the HTTPS protocol.

Note: During pipeline development, developers can enable specific stages to use SSL/TLS to secure the communication with an external system. For example, if designing a pipeline that writes to a Cassandra cluster enabled for SSL/TLS, the developer must configure the Cassandra destination to use SSL/TLS to connect to Cassandra. Enabling SSL/TLS in stages does not require enabling Data Collector to use HTTPS. For information about enabling SSL/TLS for pipeline stages, see SSL/TLS Encryption.

Prerequisites

Before you enable HTTPS for Data Collector, complete the following requirements:

Obtain access to OpenSSL and Java keytool
If you do not have keystore files that include SSL/TLS certificates signed by a certificate authority (CA), you can request certificates and create the keystore files using the following tools:
  • OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore files. For more information, see the OpenSSL documentation.
  • Java keytool - You can also use Java keytool to create a CSR and to create keystore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
Generate SSL/TLS certificate and private key pairs signed by a certificate authority (CA)
To enable HTTPS for Data Collector, generate a single private key and public certificate pair for Data Collector. Data Collector provides a self-signed certificate that you can use. However, web browsers generally issue a warning for self-signed certificates. StreamSets strongly recommends that you generate a key and certificate pair signed by a CA.
Important: Each signed certificate must include the fully qualified domain name (FQDN) for the Data Collector machine.
To obtain a certificate from a trusted CA, you must provide proof that you are the owner of the domain name for which you are requesting the certificate. Use OpenSSL or keytool to generate a key pair and then submit a Certificate Signing Request (CSR) to the CA. The exact procedure depends on the CA that you choose to use - see the documentation provided by the CA.
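
As a rough sketch of the key pair and CSR step using OpenSSL, the following commands generate a 2048-bit RSA private key and a CSR whose common name carries the FQDN. The FQDN sdc.company.com, the file names, and the key size are assumptions for illustration only. Your CA may require additional subject fields or a different key type.

```shell
# Hypothetical FQDN of the Data Collector machine; replace with your own.
FQDN="sdc.company.com"
# Hypothetical base name for the generated files.
BASENAME="sdc_company_com"

# Generate an unencrypted 2048-bit RSA private key and a CSR in one step.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout "${BASENAME}.key" \
  -out "${BASENAME}.csr" \
  -subj "/CN=${FQDN}"

# Restrict access to the private key to the current user.
chmod 600 "${BASENAME}.key"

# Check the request signature and print the subject before sending it to the CA.
openssl req -in "${BASENAME}.csr" -noout -verify -subject
```

The .csr file is what you submit to the CA. The .key file stays on your side and is needed later when you create the keystore.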

Step 1. Create a Keystore File

Create a keystore file that includes each private key and public certificate pair signed by the CA. Data Collector presents the certificate in the keystore to prove its identity to clients, such as web browsers, that connect over HTTPS.

StreamSets recommends creating keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.

  1. Use the following command to import the certificate and private key issued in PEM format to a PKCS #12 keystore for Data Collector:
    openssl pkcs12 -export -in <PEM certificate> -inkey <private key> -out <keystore filename> -name <keystore name> 

    You will be prompted to create a password for the keystore file.

    For example, the following command converts the certificate sdc_company_com.pem and private key sdc_company_com.key to the PKCS #12 keystore file named sdc_company_com.p12:
    openssl pkcs12 -export -in sdc_company_com.pem -inkey sdc_company_com.key -out sdc_company_com.p12 -name sdc_company_com
  2. Store the keystore password in a password text file named keystore-password.txt.
  3. Store the keystore and password text files in the Data Collector resources directory, <installation_dir>/externalResources/resources, on each Data Collector machine.

    For example, streamsets-datacollector-4.2.0/externalResources/resources.
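
The steps above can be scripted roughly as follows. So that the sketch runs on its own, it first creates a throwaway self-signed pair as a stand-in for the certificate and key issued by your CA; in practice you would skip that part and use the files from the CA. The password value is a placeholder. The -passout file: and -passin file: options make OpenSSL read the password from the password file instead of prompting for it.

```shell
# Stand-in for the CA-issued PEM certificate and private key (demonstration only).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout sdc_company_com.key -out sdc_company_com.pem \
  -subj "/CN=sdc.company.com"

# Write the keystore password (placeholder value) to the password file.
# printf '%s' writes the value without a trailing newline.
printf '%s' 'change-me-keystore-password' > keystore-password.txt

# Build the PKCS #12 keystore, reading the password from the file.
openssl pkcs12 -export \
  -in sdc_company_com.pem -inkey sdc_company_com.key \
  -out sdc_company_com.p12 -name sdc_company_com \
  -passout file:keystore-password.txt

# Confirm that the keystore opens with the stored password.
openssl pkcs12 -in sdc_company_com.p12 -noout \
  -passin file:keystore-password.txt && echo "keystore OK"

# Finally, copy both files to the resources directory on each machine:
# cp sdc_company_com.p12 keystore-password.txt <installation_dir>/externalResources/resources/
```

Creating the password file first means the same file is used both to build the keystore and later by Data Collector to open it, so the two cannot drift apart.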

Step 2. Configure Data Collector to Use HTTPS

Modify Data Collector configuration properties to configure Data Collector to use a secure port and your keystore file.

  1. When using one of the cloud service provider deployments, such as an Amazon EC2 or a Google Compute Engine (GCE) deployment, locate the public IP address of the provisioned instance.
    1. Launch the deployment to provision the instance.
    2. Use the console for your cloud service provider to locate the provisioned instance.
    3. Copy the public IP address of the instance.
  2. In Control Hub, edit the deployment. In the Configure Engine section, click Advanced Configuration. Then, click Data Collector Configuration.
  3. To define the secure port and keystore file, define the following Data Collector configuration properties:
    Data Collector HTTPS properties and their descriptions:

    https.port
      Secure port number for Data Collector. For example, 18636.

      Any number other than -1 enables the secure port.

      Note: When both the HTTP and HTTPS port properties are defined, the HTTP port redirects to the HTTPS port.

    sdc.base.http.url
      Data Collector URL using the HTTPS protocol and the secure port number configured in the https.port property.

      When using a cloud service provider deployment, use the public IP address that you copied from the cloud service provider console. For example:

      sdc.base.http.url=https://<IP address>:18636

      When using a self-managed deployment and Data Collector runs on a local on-premises machine, you might use the name of the host machine. For example:

      sdc.base.http.url=https://myhost:18636

      Important: When using a self-managed deployment and Data Collector runs on a cloud computing machine, use the public IP address of that instance.

      Be sure to uncomment the property.

    https.keystore.path
      Path and name of the keystore file. Enter an absolute path or a path relative to the Data Collector resources directory.

      For example: https.keystore.path=sdc_company_com.p12

      Note: The default is keystore.jks, which provides a self-signed certificate that you can use. However, StreamSets strongly recommends that you generate a certificate signed by a trusted CA, as described in Prerequisites.

    https.keystore.password
      Password to open the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password.

      For example, if you added the password to a text file named keystore-password.txt, configure the property as follows:

      https.keystore.password=${file("keystore-password.txt")}

    https.require.hsts
      Requires Data Collector to include the HTTP Strict Transport Security (HSTS) response header.

      Set to true when Data Collector uses HTTPS to enable HSTS.

      Default is false.


  4. Save the changes to the deployment and restart all engine instances.
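
After the engines restart, one way to check that Data Collector serves the expected certificate is to open a TLS connection to the secure port and print the certificate details. The function below is only a sketch: the name check_https_cert is hypothetical, and myhost and 18636 are the example values used in the property descriptions above.

```shell
# check_https_cert HOST PORT
# Hypothetical helper: prints the subject, issuer, and expiry date of the
# certificate presented on HOST:PORT. Returns 2 when called with wrong arguments.
check_https_cert() {
  if [ "$#" -ne 2 ]; then
    echo "usage: check_https_cert HOST PORT" >&2
    return 2
  fi
  # Reading from /dev/null ends the session right after the TLS handshake.
  openssl s_client -connect "$1:$2" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -enddate
}

# Example call, using the host and port from the examples above:
# check_https_cert myhost 18636
```

If the subject shows the FQDN of the Data Collector machine and the issuer shows your CA, Data Collector is using the new keystore. If the issuer equals the subject, Data Collector is most likely still serving a self-signed certificate.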