Azure Data Lake Storage Gen2
The Azure Data Lake Storage Gen2 origin reads data from Microsoft Azure Data Lake Storage Gen2. The origin can create multiple threads to enable parallel processing in a multithreaded pipeline. Use the origin only in pipelines configured for standalone execution mode. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
The origin uses the Microsoft Azure Data Lake Storage Gen2 API to request a list of objects in a storage container or file system that are located in a specified directory and match a name pattern. As Azure returns pages of the requested objects, the origin launches threads to read and process the data. The objects must be fully written before the origin reads them.
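The listing flow described above can be sketched in Python. This is a minimal, self-contained simulation, not the origin's implementation. `list_pages` is a hypothetical stand-in for the paged responses Azure returns, `read_object` is a hypothetical processing step, and the glob-style name pattern is matched with the standard `fnmatch` module against the name relative to the directory.

```python
import fnmatch
from concurrent.futures import ThreadPoolExecutor


def list_pages(objects, page_size):
    """Hypothetical stand-in for Azure's paged listing: yield page_size paths per page."""
    for start in range(0, len(objects), page_size):
        yield objects[start:start + page_size]


def read_object(path):
    """Hypothetical processing step: pretend to read and process one object."""
    return f"processed {path}"


def run(objects, directory, pattern, page_size, num_threads):
    """Filter each returned page by directory and name pattern, then process matches in threads."""
    results = []
    prefix = directory.rstrip("/") + "/"
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for page in list_pages(objects, page_size):
            # Keep only objects under the directory whose relative names match the pattern.
            matches = [
                p for p in page
                if p.startswith(prefix) and fnmatch.fnmatchcase(p[len(prefix):], pattern)
            ]
            # Hand this page's matching objects to the worker threads; map preserves order.
            results.extend(pool.map(read_object, matches))
    return results


objects = ["logs/a.csv", "logs/b.json", "logs/c.csv", "other/d.csv"]
print(run(objects, "logs", "*.csv", page_size=2, num_threads=2))
```

For simplicity, the sketch waits for each page's objects to finish before handling the next page; a production implementation might overlap listing and reading.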
After processing an object or upon encountering errors, the origin can keep, archive, or delete the object. When archiving, the origin can move the object.
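The keep, archive, and delete choices can be pictured as a small dispatch over a post-processing action. In this hypothetical sketch, a Python dict stands in for the file system (paths mapped to contents), and the names `PostProcessing`, `post_process`, and `archive_dir` are illustrative only. The same function could be called with one action for successfully processed objects and a separately chosen action for objects that failed.

```python
from enum import Enum


class PostProcessing(Enum):
    """Hypothetical post-processing actions."""
    KEEP = "keep"
    ARCHIVE = "archive"
    DELETE = "delete"


def post_process(store, path, action, archive_dir=None):
    """Apply one post-processing action to an object in a dict-based store."""
    if action is PostProcessing.KEEP:
        return store  # leave the object where it is
    if action is PostProcessing.ARCHIVE and archive_dir is None:
        raise ValueError("archive_dir is required when archiving")
    content = store.pop(path)  # both ARCHIVE and DELETE remove the original
    if action is PostProcessing.ARCHIVE:
        # Archiving moves the object: re-add it under the archive directory.
        name = path.rsplit("/", 1)[-1]
        store[f"{archive_dir.rstrip('/')}/{name}"] = content
    return store


store = {"in/a.csv": "x", "in/b.csv": "y"}
post_process(store, "in/a.csv", PostProcessing.ARCHIVE, archive_dir="archive")
post_process(store, "in/b.csv", PostProcessing.DELETE)
print(store)
```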
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
When you configure the Azure Data Lake Storage Gen2 origin, you specify connection information for Azure Data Lake Storage Gen2, including the storage container or file system and the authentication method. You can also use a connection to configure the origin. In addition, you specify the number of objects for Azure to list on a page and the maximum time to wait for Azure to return the requested objects.
You specify information about the objects to read and how to process them, including the directory that contains the objects, the matching name pattern, the order in which to read the objects, the number of threads used to process the data, and the batch size.
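Read order and batch size interact: objects are read in the chosen order, and their records are grouped into batches of at most the batch size. The sketch below assumes two plausible orderings, by name or by last-modified timestamp; the exact read-order options the origin offers are not listed here, and `BlobObject`, `ordered`, and `batches` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class BlobObject:
    """Hypothetical object descriptor: name, modification time, and its records."""
    name: str
    last_modified: float  # assumed: seconds since the epoch
    records: list = field(default_factory=list)


def ordered(objects, order):
    """Sort objects by name, or by timestamp with ties broken by name."""
    if order == "name":
        return sorted(objects, key=lambda o: o.name)
    if order == "timestamp":
        return sorted(objects, key=lambda o: (o.last_modified, o.name))
    raise ValueError(f"unknown order: {order}")


def batches(objects, order, batch_size):
    """Yield lists of at most batch_size records, reading objects in the given order."""
    batch = []
    for obj in ordered(objects, order):
        for record in obj.records:
            batch.append(record)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # emit the final partial batch
        yield batch


objs = [BlobObject("b.csv", 1.0, [1, 2, 3]), BlobObject("a.csv", 2.0, [4, 5])]
print(list(batches(objs, "name", 2)))
print(list(batches(objs, "timestamp", 2)))
```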
You also specify what to do with processed objects and how to handle objects that cannot be processed.
When a pipeline stops, the Azure Data Lake Storage Gen2 origin notes where it stops reading. When the pipeline starts again, the origin by default continues reading from where it stopped. You can reset the origin to read and process all requested objects.
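Resuming from a saved position and resetting it can be illustrated with a simple offset tracker. This is a hypothetical, in-memory sketch: it assumes objects are identified by `(last_modified, name)` tuples processed in sorted order, and it tracks whole objects only. The origin's actual offset format is not described here, and it may also record a position within a partially read object.

```python
class OffsetTracker:
    """Hypothetical offset store remembering the last fully processed object."""

    def __init__(self):
        self.offset = None  # (last_modified, name) of the last committed object

    def pending(self, objects):
        """Return objects not yet processed, in sorted order."""
        ordered = sorted(objects)
        if self.offset is None:
            return ordered
        return [o for o in ordered if o > self.offset]

    def commit(self, obj):
        """Record that obj has been fully processed."""
        self.offset = obj

    def reset(self):
        """Forget the saved position so every object is processed again."""
        self.offset = None


tracker = OffsetTracker()
objs = [(3, "c.csv"), (1, "a.csv"), (2, "b.csv")]
for obj in tracker.pending(objs)[:2]:  # simulate a stop after two objects
    tracker.commit(obj)
print(tracker.pending(objs))  # resume: only the remaining object
tracker.reset()
print(tracker.pending(objs))  # after reset: all objects again
```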