Cluster Pipeline Limitations

Cluster pipelines have the following limitations:
  • Non-cluster origins - Do not use non-cluster origins in cluster pipelines. For a description of the origins to use, see Cluster Batch and Streaming Execution Modes.
  • Hive Metastore destination - This destination is not supported in cluster pipelines.
  • Kafka stages - Kafka stages do not support using the Provide Keytab and related properties to specify credentials for Kerberos authentication. Use JAAS files to provide Kerberos credentials.
  • MapReduce executor - This executor is not supported in cluster pipelines.
  • Pipeline events - You cannot use pipeline events in cluster pipelines.
  • Record Deduplicator processor - This processor is not supported in cluster pipelines at this time.
  • RabbitMQ Producer destination - This destination is not supported in cluster pipelines at this time.
  • Scripting processors - The state object is available only to the instance of the processor stage in which it is defined. When the pipeline runs in cluster mode, the state object is not shared across nodes.
  • Spark executor - This executor is not supported in cluster pipelines.
  • Spark Evaluator processor - Use in cluster streaming pipelines only. Do not use in cluster batch pipelines. You can also use the Spark Evaluator in standalone pipelines.

    The Spark Evaluator processor must use the same Spark version as the cluster. For example, if the cluster uses Spark 2.1, the Spark Evaluator must use a Spark 2.1 stage library.

    The processor is available in Cloudera and MapR stage libraries. To verify the Spark version that a stage library includes, see the Cloudera or HPE Ezmeral Data Fabric (formerly MapR) documentation. For more information about the stage libraries that include the stage, see Available Stage Libraries in the Data Collector documentation.
  • Viewing Data Collector logs - When Data Collector is installed on a cluster node, you cannot view log data from the Data Collector UI. Instead, you must access the cluster logs.
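
To illustrate the scripting processor limitation listed above, the following is a minimal plain-Python simulation, not Data Collector code. The class ScriptingProcessorInstance and its process_batch method are hypothetical stand-ins for one instance of a scripting processor running on one cluster node. The sketch assumes only what the limitation states: each instance has its own state object, and nothing is shared across nodes.

```python
# Hypothetical simulation, not the Data Collector API: each "node" runs its
# own processor instance, and each instance has a private state dictionary.


class ScriptingProcessorInstance:
    """Stand-in for one instance of a scripting processor on one node."""

    def __init__(self):
        # Per-instance state; nothing stored here is visible to other instances.
        self.state = {}

    def process_batch(self, records):
        # Keep a running record count in this instance's state only.
        self.state["count"] = self.state.get("count", 0) + len(records)
        return records


# Two cluster nodes, each with its own instance of the same stage.
node_a = ScriptingProcessorInstance()
node_b = ScriptingProcessorInstance()

node_a.process_batch(["r1", "r2", "r3"])
node_b.process_batch(["r4", "r5"])

# Each node sees only its own count; neither sees the combined total of 5.
counts = (node_a.state["count"], node_b.state["count"])
print(counts)  # (3, 2)
```

A script that relies on the state object for a pipeline-wide value, such as a running total or a set of keys already seen, therefore produces per-node results in cluster mode.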