Upgrade an Installation with Cloudera Manager

When you upgrade an installation with Cloudera Manager, the new version uses the same configuration, data, log, and resource directories. As a result, the new version has access to the files created in the previous version.

Note: If you installed external libraries or developed custom stages, verify that those libraries are stored in a local directory external to the Data Collector runtime directory before you upgrade. That way, Data Collector can still use the libraries after the upgrade.

To upgrade Data Collector through Cloudera Manager, perform the following steps:

Step 1. Stop All Pipelines

Step 2. Back Up the Previous Version

Step 3. Install the StreamSets Custom Service Descriptor

Step 4. Manually Install the Parcel and Checksum Files (Optional)

Step 5. Distribute and Activate the New StreamSets Parcel

Step 6. Verify Modified Safety Valves

Step 7. Restart the StreamSets Service

Warning: You must perform the steps in this order, or Data Collector will fail to start.

Step 1. Stop All Pipelines

Stop all pipelines running on the Data Collector to be upgraded.

Use one of the following methods to stop all pipelines:

  • If the Data Collector is not registered to work with StreamSets Control Hub, stop the pipelines using the Data Collector UI.

    From the Data Collector Home page, select all running pipelines in the list and then click the Stop icon.

  • If the Data Collector is registered to work with StreamSets Control Hub, stop all jobs running on the Data Collector using the Control Hub UI.

    From the Control Hub Jobs page, filter the jobs by engine and by engine label. Select all active jobs in the list and then click the Stop Jobs icon.

Step 2. Back Up the Previous Version

Before you install the new version, create a backup of the files in the previous version by copying and renaming the configuration, data, and resource directories. That way, you can continue to run the previous version if needed.

Copy and rename the following directories on every Cloudera Manager node that runs Data Collector:

  • SDC_CONF - The Data Collector directory for configuration files.
  • SDC_DATA - The Data Collector directory for pipeline state and configuration information.
  • SDC_RESOURCES - The Data Collector directory for runtime resource files.

If used, copy and rename the following directories as well:

  • SDC_EXTERNAL_RESOURCES - The Data Collector directory for external resources.
  • STREAMSETS_LIBRARIES_EXTRA_DIR - The Data Collector directory for external libraries.
  • USER_LIBRARIES_DIR - The Data Collector directory for custom stages.

For example, if you are upgrading from version 3.0.0.0, copy the Data Collector configuration directory and rename the copy as follows: /etc/sdc3000.

If you need to roll back to the previous version, you must restore the previous directories on every Cloudera Manager node that runs Data Collector.

Important: These directories should be outside of the $SDC_DIST runtime directory.

For more information about these directories, including the default values, see Data Collector Directories.
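
The copy-and-rename step can be sketched as a small shell helper. This is only an illustration: backup_dir is a hypothetical function, not a StreamSets or Cloudera command, and the /etc/sdc path in the usage comment is an assumption. Substitute the directories actually configured for your installation.

```shell
#!/bin/sh
# Sketch: copy a directory to a sibling whose name carries a version suffix.
# backup_dir is a hypothetical helper, not a StreamSets or Cloudera command.
backup_dir() {
    src=${1%/}      # strip one trailing slash, if any
    suffix=$2       # for example 3000 when upgrading from 3.0.0.0
    dest="${src}${suffix}"

    if [ ! -d "$src" ]; then
        echo "skip: $src is not a directory" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        # Refuse to overwrite an earlier backup.
        echo "skip: $dest already exists" >&2
        return 1
    fi
    # -R copies recursively; -p preserves mode, ownership, and timestamps
    # where the current user is permitted to do so.
    cp -Rp "$src" "$dest" && echo "$dest"
}

# Usage (the path is an assumption):
# backup_dir /etc/sdc 3000
```

Run the helper once per directory on every node that runs Data Collector. Because it refuses to overwrite an existing target, an earlier backup is not silently replaced.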

Step 3. Install the StreamSets Custom Service Descriptor

Install the new StreamSets custom service descriptor file (CSD), and then restart Cloudera Manager.

  1. Download the CSD for the new Data Collector version.
    For example, you can use the GNU Wget program to download the CSD from the command line by running the following commands:
    export VERSION="5.10.0"
    wget https://archives.streamsets.com/datacollector/$VERSION/csd/STREAMSETS-$VERSION.jar
  2. Remove the previous StreamSets CSD file from Cloudera Manager.
    For example:
    rm -f /opt/cloudera/csd/STREAMSETS*.jar
  3. Copy the Data Collector CSD file to the Local Descriptor Repository Path. By default, the path is /opt/cloudera/csd.
    To verify the path to use, in Cloudera Manager, click Administration > Settings. In the navigation panel, select the Custom Service Descriptors category. Place the CSD file in the path configured for Local Descriptor Repository Path.
  4. Set the file ownership to cloudera-scm:cloudera-scm with permission 644.
    For example:
    chown cloudera-scm:cloudera-scm /opt/cloudera/csd/STREAMSETS*.jar
    chmod 644 /opt/cloudera/csd/STREAMSETS*.jar
  5. Use one of the following commands to restart Cloudera Manager Server:
    For Ubuntu 14.04, CentOS 6, Red Hat Enterprise Linux 6, or Oracle Linux 6:
    service cloudera-scm-server restart
    For Ubuntu 16.04, CentOS 7, Red Hat Enterprise Linux 7, or Oracle Linux 7:
    systemctl restart cloudera-scm-server
  6. In Cloudera Manager, to restart the Cloudera Management Service, click Home > Status. To the right of Cloudera Management Service, click the Menu icon and select Restart.
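
Before restarting Cloudera Manager Server, it can help to confirm that the CSD file has the ownership and permissions set in step 4. The following is a minimal sketch: check_csd is a hypothetical helper, and the path in the usage comment assumes the default Local Descriptor Repository Path.

```shell
#!/bin/sh
# Sketch: report whether a file has the expected owner, group, and mode.
# check_csd is a hypothetical helper, not part of Cloudera Manager.
check_csd() {
    file=$1
    user=$2     # for example cloudera-scm
    group=$3    # for example cloudera-scm
    mode=$4     # for example 644

    # find prints the path only if every test matches; -prune keeps it
    # from descending if a directory is passed by mistake.
    match=$(find "$file" -prune -type f -user "$user" -group "$group" -perm "$mode")

    if [ -n "$match" ]; then
        echo "ok: $file"
    else
        echo "mismatch: $file"
        return 1
    fi
}

# Usage (the path assumes the default repository location):
# for f in /opt/cloudera/csd/STREAMSETS*.jar; do
#     check_csd "$f" cloudera-scm cloudera-scm 644
# done
```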

Step 4. Manually Install the Parcel and Checksum Files (Optional)

You can manually install the StreamSets parcel and related checksum files. Manually install the files when the Cloudera Manager Server does not have internet access.

When working with multiple clusters, perform the following steps for each cluster.

  1. Download the StreamSets parcel and related checksum file for the Cloudera Manager Server operating system.
  2. Copy the StreamSets parcel and checksum file to the Cloudera Manager Local Parcel Repository Path.
    By default, the path is /opt/cloudera/parcel-repo.
    To verify the path to use, click Administration > Settings. In the navigation panel, select the Parcels category. Place the StreamSets parcel file in the path configured for Local Parcel Repository Path.
  3. Change ownership on the parcel and checksum file to the user that runs the Cloudera Manager process.
    For example, if the Cloudera Manager process runs as the cloudera-scm user, use the following command to change ownership to cloudera-scm:
    sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/STREAMSETS_DATACOLLECTOR*
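
After copying the files, you may want to confirm that the parcel was not corrupted in transit. The sketch below compares the parcel's digest with the one stored in the checksum file. It rests on two assumptions that are not stated above: that the first field of the checksum file is a hexadecimal SHA-1 digest, and that the GNU sha1sum utility is available. verify_parcel is a hypothetical helper; if your checksum file uses a different algorithm, swap in the matching utility.

```shell
#!/bin/sh
# Sketch: compare a parcel's SHA-1 digest with the digest in its checksum file.
# Assumption: the checksum file's first field is a hex SHA-1 digest.
# verify_parcel is a hypothetical helper, not a Cloudera Manager command.
verify_parcel() {
    parcel=$1
    checksum_file=$2

    # Read the expected digest from the first line of the checksum file.
    expected=$(awk 'NR == 1 { print tolower($1) }' "$checksum_file") || return 2
    # Compute the actual digest of the parcel.
    actual=$(sha1sum "$parcel" | awk '{ print tolower($1) }')

    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "checksum ok"
    else
        echo "checksum mismatch"
        return 1
    fi
}

# Usage (file names are placeholders):
# verify_parcel /opt/cloudera/parcel-repo/<parcel file> /opt/cloudera/parcel-repo/<checksum file>
```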

Step 5. Distribute and Activate the New StreamSets Parcel

After you add the StreamSets repository to Cloudera Manager, you can download and distribute the new StreamSets parcel across the cluster. Stop the StreamSets service and deactivate the previous parcel before you activate the new parcel.

  1. To view the list of available parcels, in the menu bar, click the Parcels icon.

    The new StreamSets parcel displays in the list of available parcels. If it doesn't display, click Check for New Parcels.

  2. To download the new StreamSets parcel to the local repository, click Download.

    After the parcel is downloaded, the Download button becomes the Distribute button.

  3. To distribute the new StreamSets parcel to the cluster, click Distribute.
  4. To stop the StreamSets service, click Clusters > StreamSets and then click Actions > Stop.
  5. Click the Parcels icon to return to the Parcels page.
  6. To deactivate the previous StreamSets parcel, choose the appropriate cluster in the Location selector, and then click Deactivate for the parcel.
  7. To activate the new StreamSets parcel, choose the appropriate cluster in the Location selector, and then click Activate for the parcel.

Step 6. Verify Modified Safety Valves

When you upgrade, Cloudera Manager updates the Data Collector configuration properties for you. However, if you modified any of the Advanced Configuration Snippet (Safety Valve) properties in Cloudera Manager for the previous Data Collector version, those values override any property settings in the new configuration files.

You must compare the new configuration files shipped with the parcel in /opt/cloudera/parcels/STREAMSETS with your modified safety valves and update the safety valves as needed to include any new properties.

For example, if you used the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties to override the system.stagelibs.blacklist property, you must add any new stage libraries listed in the blacklist property in the new sdc.properties file to the overridden property in the safety valve.
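
The comparison in the blacklist example can be sketched in shell. The helpers below, list_values and missing_entries, are hypothetical. They assume that you have saved the safety valve contents to a local file and that the property value sits on a single line with no backslash continuations; a value that spans several lines would need to be joined first.

```shell
#!/bin/sh
# Sketch: list comma-separated entries of a property that appear in the new
# properties file but not in a saved copy of the safety valve override.
# list_values and missing_entries are hypothetical helpers.
# Assumption: the property value is on one line (no backslash continuations).

# Print the entries of property $1 in file $2, one per line, sorted.
list_values() {
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }' "$2" |
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        sed '/^$/d' |
        sort -u
}

# Print entries of property $1 present in file $2 but absent from file $3.
missing_entries() {
    key=$1
    new_file=$2
    valve_file=$3

    new_tmp=$(mktemp) || return 2
    old_tmp=$(mktemp) || return 2
    list_values "$key" "$new_file" > "$new_tmp"
    list_values "$key" "$valve_file" > "$old_tmp"
    # comm -23 keeps lines that appear only in the first sorted input.
    comm -23 "$new_tmp" "$old_tmp"
    rm -f "$new_tmp" "$old_tmp"
}

# Usage (file names are placeholders):
# missing_entries system.stagelibs.blacklist new-sdc.properties saved-safety-valve.properties
```

Each line of output is an entry to consider adding to the overridden property in the safety valve.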

Step 7. Restart the StreamSets Service

When you restart the StreamSets service, Cloudera Manager updates the Data Collector configuration properties for you. Cloudera Manager retains any customized values that you added in the previous Data Collector version. It also adds any new properties included in the new Data Collector version.

To restart the StreamSets service, which you stopped in Step 5, click Clusters > StreamSets and then click Actions > Start.