Azure Data Lake Storage Gen2

Supported pipeline types:
  • Data Collector

The Azure Data Lake Storage Gen2 destination writes data to Microsoft Azure Data Lake Storage Gen2. You can use the Azure Data Lake Storage Gen2 destination in standalone and cluster batch pipelines. To write to Azure Data Lake Storage Gen1, use the Azure Data Lake Storage Gen1 destination. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

Before you use the destination, you must perform some prerequisite tasks.

When you configure the Azure Data Lake Storage Gen2 destination, you specify the authentication method to use and related properties. You define a directory template and time basis to determine the output directories that the destination creates and the files where records are written.
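
For example, a directory template typically combines a constant base path with Data Collector time functions. The following sketch assumes an arbitrary /outputfiles base path:

  /outputfiles/${YYYY()}-${MM()}-${DD()}-${hh()}

With a time basis of ${time:now()}, the destination resolves the template using processing time. A record-based time basis such as ${record:value('/Timestamp')}, where /Timestamp is a hypothetical date field, resolves the template using a date within each record instead.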

You can define a file prefix and suffix, the data time zone, and properties that define when the destination closes a file. You can specify the amount of time that a record can be written to its associated directory and what happens to late records.
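
For example, when you configure the destination to write late records to a late record file, you define a separate late record directory template. The following path is an illustrative assumption, not a required value:

  /outputfiles/late/${YYYY()}-${MM()}-${DD()}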

When desired, you can use record header attributes to write records to specified directories, to apply a defined Avro schema, and to roll files. For more information, see Record Header Attributes for Record-Based Writes.
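
For example, an Expression Evaluator processor earlier in the pipeline might set the targetDirectory record header attribute so that each record declares its own output directory. The /region field below is a hypothetical example:

  Header Attribute: targetDirectory
  Expression: /outputfiles/${record:value('/region')}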

You can use Gzip, Bzip2, Snappy, LZ4, and other compression formats to write output files.

You can also use a connection to configure the destination.

The destination can generate events for an event stream. For more information about the event framework, see Dataflow Triggers Overview.
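
As a sketch, a file closure event record might carry details about the closed file, similar to the following. The exact event types and field names are defined in the destination's event record documentation, so treat these values as illustrative:

  Record header: sdc.event.type = file-closed
  Record fields: /filepath = /outputfiles/2024-01-15-10/sdc-output.txt
                 /filename = sdc-output.txt
                 /length   = 1048576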