Pipelines

Use the following tips for help with pipeline errors:
- A pipeline fails to start with the following error:

  org.apache.spark.SparkException: Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
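As the message itself suggests, one way to resolve this is to enable the Spark external shuffle service, or alternatively to disable dynamic allocation. A minimal sketch of the relevant entries, using standard Spark configuration property names (how you pass them to your cluster depends on your environment):

```properties
# Option 1: enable the external shuffle service so dynamic allocation can work
spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled=true

# Option 2: or disable dynamic allocation entirely
# spark.dynamicAllocation.enabled=false
```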
- A pipeline fails to start with the following error:

  TRANSFORMER_03 Databricks shared cluster <name> already contains StreamSets libraries from a different staging directory <directory>. Either replace 'Pipeline Config > Staging Directory' config value to <directory> or uninstall all libraries from the shared cluster and restart the cluster.

  This error occurs when you try to run a pipeline on an existing Databricks cluster that has previously run pipelines built on a different version of Transformer, which is not allowed.
- A pipeline preview, validation, or run fails with the following error:

  TRANSFORMER_02 Failed to <preview, validate, or run> pipeline, check logs for error. The Transformer Spark application in the Spark cluster might not be able to reach Transformer at <URL>. If this is not the correct URL, update the transformer.base.http.url property in the Transformer configuration file or define a cluster callback URL for the pipeline and restart Transformer.

  This error occurs when Spark cannot communicate with Transformer using the properties configured in the Transformer configuration file.
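A hedged sketch of the relevant entry in the Transformer configuration file. The property name comes from the error message itself; the file name and URL value below are illustrative placeholders, so substitute the host, port, and scheme that the Spark cluster can actually reach:

```properties
# transformer.properties (illustrative file name)
# URL that the Spark application on the cluster uses to call back to Transformer.
# Must be resolvable and reachable from the cluster nodes, not just from localhost.
transformer.base.http.url=http://transformer.example.com:19630
```

After changing the property, restart Transformer so the new value takes effect.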
- A pipeline fails with the following run error:

  org.apache.spark.sql.AnalysisException: Found duplicate column(s)
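Spark raises this error when two columns in the same output end up with an identical name, which most commonly happens after joining tables that share column names. One common workaround is to rename clashing columns before the stage that fails. A minimal, Spark-free sketch of a suffixing strategy (the helper name is illustrative, not a Transformer or Spark API):

```python
def dedupe_columns(names):
    """Return a copy of `names` where repeated column names get a numeric
    suffix, so every resulting name is unique, e.g. ["id", "id"] -> ["id", "id_2"]."""
    seen = {}
    result = []
    for name in names:
        count = seen.get(name, 0) + 1
        seen[name] = count
        result.append(name if count == 1 else f"{name}_{count}")
    return result

# Example: column list produced by joining two tables that both have "id" and "ts"
print(dedupe_columns(["id", "ts", "id", "ts", "value"]))
# -> ['id', 'ts', 'id_2', 'ts_2', 'value']
```

In a pipeline, the same idea applies: rename or drop one side's duplicate columns (for example with a field-renaming stage, or an alias in a join) so the schema no longer contains two columns with the same name.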
- Pipeline validation fails with the following stage library/cluster manager mismatch error:

  VALIDATION_0300 Stage <stage name> using the <stage library name> stage library cannot be used with the <cluster type> cluster manager type
- Pipelines running on a Hadoop YARN cluster remain indefinitely in a running or stopping status.

  When you run pipelines on a Hadoop YARN cluster, the Spark submit process continues to run until the pipeline finishes, which uses memory on the Transformer machine. This memory usage can cause pipelines to remain indefinitely in a running or stopping status when the Transformer machine has limited memory or when a large number of pipelines start on a single Transformer.

  To avoid this issue, run the following command on each Transformer machine to decrease the amount of memory available to each Spark submit process:

  export SPARK_SUBMIT_OPTS="-Xmx64m"
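To make the setting survive restarts, one hedged option is to export the variable in the shell environment that launches Transformer and verify it before starting the service. The profile path is an assumption; your installation or service manager may use a different mechanism to set environment variables:

```shell
# Cap each spark-submit JVM at 64 MB of heap so many concurrent
# submissions fit in the Transformer machine's memory
export SPARK_SUBMIT_OPTS="-Xmx64m"

# Sanity-check that the variable is set in the launching shell
echo "SPARK_SUBMIT_OPTS=${SPARK_SUBMIT_OPTS}"
```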