Managing Cluster Version Changes

When you migrate your workload to a newer cluster version, you must make corresponding Transformer updates.

Each cluster version is associated with specific Spark and Scala versions, and the Scala version of a cluster must match the Scala version that Transformer is prebuilt with. When you migrate from a cluster that uses an earlier Scala version to one that uses a later version, you must replace your Transformer with one that is prebuilt with the later Scala version. This change also requires the additional Transformer updates described below.

For example, say you use Transformer prebuilt with Scala 2.11 to run workloads in an EMR 5.36 cluster, which uses Scala 2.11. After you migrate to EMR 7.0, which uses Scala 2.12, you upgrade to Transformer prebuilt with Scala 2.12 and complete the additional steps below.
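
The version check in this example can be sketched in a few lines of Python. This is only an illustration, not part of Transformer or Control Hub: the function names are hypothetical, and it assumes that "matching" means the Scala binary version (major.minor, such as 2.11 or 2.12) is the same on both sides.

```python
def scala_binary_version(version: str) -> str:
    """Return the major.minor part of a Scala version string, e.g. '2.12.17' -> '2.12'."""
    parts = version.strip().split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts[:2]):
        raise ValueError(f"Unrecognized Scala version: {version!r}")
    return ".".join(parts[:2])


def needs_new_transformer(transformer_scala: str, cluster_scala: str) -> bool:
    """Hypothetical helper: True when the Transformer build and the cluster
    use different Scala binary versions (assumption: major.minor must match)."""
    return scala_binary_version(transformer_scala) != scala_binary_version(cluster_scala)


if __name__ == "__main__":
    # Transformer prebuilt with Scala 2.11, new cluster on Scala 2.12
    print(needs_new_transformer("2.11", "2.12"))
    # Transformer prebuilt with Scala 2.12, new cluster on Scala 2.12
    print(needs_new_transformer("2.12", "2.12"))
```

If the helper returns True, the installation step in the procedure below applies. If it returns False, the existing Transformer can remain in place, although the pipeline and job updates may still be needed.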

Use the following steps to enable Transformer to run existing pipelines after upgrading your cluster:
  1. If the new cluster uses a different Scala version, install a Transformer engine that is compatible with the Spark and Scala versions on the new cluster, then register it with Control Hub.
  2. Edit pipelines:
    1. Update each pipeline to use the new Transformer as the authoring engine.
    2. On the Cluster tab, enter the cluster information for the new cluster.
    3. If related stages use Transformer-provided libraries, update the stage library to use an appropriate version.

      If stages use cluster-provided stage libraries, you can skip this step.

  3. Update the execution engine labels in associated jobs so they run on the new Transformer.
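
The conditional parts of the steps above can be summarized as a small planning sketch. It does not call any Transformer or Control Hub API. The `PipelineInfo` record and the `migration_checklist` function are hypothetical names introduced only to show which steps depend on the Scala version change and on the kind of stage library in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PipelineInfo:
    # Hypothetical record for illustration; not a Transformer or Control Hub type.
    name: str
    uses_transformer_provided_libraries: bool


def migration_checklist(scala_version_changed: bool,
                        pipelines: List[PipelineInfo]) -> List[str]:
    """Build a plain-text checklist that mirrors the documented steps."""
    steps: List[str] = []
    # Step 1 applies only when the new cluster uses a different Scala version.
    if scala_version_changed:
        steps.append("Install a compatible Transformer engine and register it with Control Hub")
    # Step 2 is repeated for each pipeline.
    for p in pipelines:
        steps.append(f"{p.name}: set the new Transformer as the authoring engine")
        steps.append(f"{p.name}: enter the new cluster information on the Cluster tab")
        # Step 2.3 is skipped for cluster-provided stage libraries.
        if p.uses_transformer_provided_libraries:
            steps.append(f"{p.name}: update the stage library version")
    # Step 3 covers the jobs associated with the pipelines.
    steps.append("Update execution engine labels in associated jobs")
    return steps


if __name__ == "__main__":
    plan = migration_checklist(
        scala_version_changed=True,
        pipelines=[PipelineInfo("orders", True), PipelineInfo("clicks", False)],
    )
    for line in plan:
        print("-", line)
```

The pipeline names `orders` and `clicks` are placeholders. A list like this is handy as a review aid before editing pipelines in the UI, since it makes the skipped steps explicit.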

For example, say you upgrade an Amazon EMR cluster and the new version uses Spark 3.x and Scala 2.12. Your existing Transformer engine ran pipelines on the earlier EMR version, which used different Spark and Scala versions. After the upgrade, you need to install a Transformer prebuilt with Scala 2.12, which is compatible with Spark 3.x.

Then, you edit your pipelines to use the new Transformer as the authoring engine and update the cluster details so that the pipelines point to the new cluster. If you have Amazon stages that use Transformer-provided libraries, you also update the Stage Library configuration to match the new Amazon EMR version.