Local Pipeline Prerequisites for Amazon S3 and ADLS
Transformer uses Hadoop APIs to connect to the following external systems:
- Amazon S3
- Microsoft Azure Data Lake Storage Gen1 and Gen2
To run pipelines that connect to these systems, Spark requires access to the Hadoop client libraries and the client libraries specific to each storage system.
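For context, the sketch below shows one generic way to make these libraries available to a local Spark session through the `spark.jars.packages` property: `hadoop-aws` for Amazon S3, and `hadoop-azure` plus `hadoop-azure-datalake` for ADLS. The package versions, session settings, and placeholder bucket name are assumptions for illustration only; this is not the Transformer-specific procedure, which is covered by the prerequisite tasks in this section.

```python
# Sketch only: giving a *local* Spark session access to the Hadoop
# cloud connectors. Package versions are assumptions and must match
# your Spark/Hadoop distribution.
from pyspark.sql import SparkSession

packages = ",".join([
    "org.apache.hadoop:hadoop-aws:3.3.4",             # s3a:// connector (Amazon S3)
    "org.apache.hadoop:hadoop-azure:3.3.4",           # abfss:// connector (ADLS Gen2)
    "org.apache.hadoop:hadoop-azure-datalake:3.3.4",  # adl:// connector (ADLS Gen1)
])

spark = (
    SparkSession.builder
    .master("local[*]")  # local pipeline: no cluster manager to supply the jars
    .appName("local-s3-adls-prereq-sketch")
    .config("spark.jars.packages", packages)
    .getOrCreate()
)

# Confirm the connector packages were registered with the session.
print(spark.sparkContext.getConf().get("spark.jars.packages"))

# Hypothetical read; "example-bucket" is a placeholder, and credentials
# (for example, spark.hadoop.fs.s3a.access.key) must be configured first.
# df = spark.read.text("s3a://example-bucket/some-prefix/")

spark.stop()
```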
Cluster pipelines require no action because the supported cluster managers already include the required libraries. However, before you run a local pipeline that connects to these systems, you must complete the following prerequisite tasks.