Local Pipeline Prerequisites for Amazon S3 and ADLS

Transformer uses Hadoop APIs to connect to the following external systems:
  • Amazon S3
  • Microsoft Azure Data Lake Storage Gen1 and Gen2
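
For context, the sketch below shows what "connecting through Hadoop APIs" looks like at the filesystem layer. The object name ListBucket and the bucket example-bucket are hypothetical, and credentials are assumed to come from the connector's usual provider chain (environment variables or Hadoop configuration properties):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ListBucket {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Each URI scheme is served by a connector library: s3a:// by
        // hadoop-aws, adl:// (ADLS Gen1) by hadoop-azure-datalake, and
        // abfss:// (ADLS Gen2) by hadoop-azure. Resolving the scheme
        // fails unless that library is on the classpath.
        val fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)
        fs.listStatus(new Path("s3a://example-bucket/"))
          .foreach(status => println(status.getPath))
      }
    }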

To run pipelines that connect to these systems, Spark requires access to the Hadoop client libraries as well as the client libraries for each external system.
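
As an illustration of what this requirement means in practice, here is a minimal sketch of a standalone Spark job run in local mode that resolves the connector libraries at startup. This is not Transformer's documented setup; the object name, bucket, and library versions are assumptions, and the connector versions must match the Hadoop version your Spark distribution was built against. When launching through spark-submit, you would typically pass --packages instead:

    import org.apache.spark.sql.SparkSession

    object LocalS3Session {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("local-s3-example")
          .master("local[*]")
          // hadoop-aws pulls in the S3A connector and its AWS SDK
          // dependency; hadoop-azure provides the ABFS (ADLS Gen2)
          // connector. Versions shown are assumptions.
          .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.3.4," +
            "org.apache.hadoop:hadoop-azure:3.3.4")
          .getOrCreate()

        // With the connectors resolved, object-store paths can be read
        // directly (the bucket name is hypothetical).
        spark.read.text("s3a://example-bucket/input/").show(5)
        spark.stop()
      }
    }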

Cluster pipelines require no additional setup because supported cluster managers include the required libraries. However, before you run a local pipeline that connects to either of these systems, you must complete the following prerequisite tasks.