Azure Data Lake Storage Gen2
Available when using an authoring Data Collector version 4.0.0 or later.
To create an Azure Data Lake Storage Gen2 connection, the Azure stage library, streamsets-datacollector-azure-lib, must be installed on the selected authoring Data Collector.
For a description of the connection properties, see Azure Data Lake Storage Gen2 Connection Properties.
Engine | Stages |
---|---|
Data Collector 4.0.0 or later | |
Prerequisites
- If necessary, create a new Azure Active Directory application for Data Collector.
  For information about creating a new application, see the Azure documentation.
- Ensure that the Azure Active Directory Data Collector application has the appropriate access control to perform the necessary tasks.
  The Data Collector application requires Read and Execute permissions to read data in Azure. If also writing to Azure, the application requires Write permission as well.
  For information about configuring Gen2 access control, see the Azure documentation.
- Retrieve information from Azure to configure the connection.
After you complete all of the prerequisite tasks, you can configure an Azure Data Lake Storage Gen2 connection.
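The access control requirement in the prerequisites can be summarized in the symbolic `rwx` notation commonly used for POSIX-style permissions. The following is a minimal sketch; the function name `required_permissions` is hypothetical and is not part of Data Collector or any Azure SDK.

```python
def required_permissions(writes: bool = False) -> str:
    """Return the permissions the Data Collector application needs.

    Hypothetical helper for illustration only.
    """
    # Read and Execute are needed to read data in Azure.
    # Write is added only when the application also writes to Azure.
    return "rwx" if writes else "r-x"


if __name__ == "__main__":
    print(required_permissions())             # read-only use
    print(required_permissions(writes=True))  # read and write use
```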
Retrieve Authentication Information
An Azure Data Lake Storage Gen2 connection can authenticate with Azure using one of the following methods:
- OAuth with Service Principal
  Connections made with OAuth with Service Principal authentication require the following information:
  - Application ID - Application ID for the Azure Active Directory Data Collector application. Also known as the client ID.
    For information on accessing the application ID from the Azure portal, see the Azure documentation.
  - Tenant ID - Tenant ID for the Azure Active Directory Data Collector application. Also known as the directory ID.
    For information on accessing the tenant ID from the Azure portal, see the Azure documentation.
  - Application Key - Authentication key for the Azure Active Directory application. Also known as the client secret.
    For information on accessing the application key from the Azure portal, see the Azure documentation.
- Shared Key
  Connections made with Shared Key authentication require the following information:
  - Account Shared Key - Shared access key that Azure generated for the storage account.
    For more information on accessing the shared access key from the Azure portal, see the Azure documentation.
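The information each authentication method requires can be checked before you fill in the connection. The following is a minimal sketch, not Data Collector code: the dictionary keys (`application_id`, `tenant_id`, `application_key`, `account_shared_key`) and the function `missing_fields` are hypothetical names chosen for illustration.

```python
# Values each authentication method needs, as listed above.
# The key names are illustrative assumptions, not product identifiers.
REQUIRED_FIELDS = {
    "OAuth with Service Principal": ("application_id", "tenant_id", "application_key"),
    "Shared Key": ("account_shared_key",),
}


def missing_fields(auth_method: str, values: dict) -> list:
    """Return the required values that are absent or empty for the method."""
    if auth_method not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authentication method: {auth_method}")
    return [name for name in REQUIRED_FIELDS[auth_method] if not values.get(name)]


if __name__ == "__main__":
    collected = {"application_id": "my-client-id", "tenant_id": "my-directory-id"}
    # Lists what still has to be retrieved from the Azure portal.
    print(missing_fields("OAuth with Service Principal", collected))
```

Running the check before configuring the connection makes it clear which values still need to be retrieved from Azure.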
Azure Data Lake Storage Gen2 Connection Properties
Azure Property | Description |
---|---|
Account FQDN | The host name of the Data Lake Storage Gen2 account. For example: |
Storage Container / File System | Name of the storage container or file system that contains the data to be read or written. |
Secure Connection | Uses the abfss protocol to securely connect to Azure using a TLS connection. When cleared, the stage uses the |
Authentication Method | Authentication method used to connect to Azure: OAuth with Service Principal or Shared Key. |
Application ID | Application ID for the Azure Active Directory Data Collector application. Also known as the client ID. For information on accessing the application ID from the Azure portal, see the Azure documentation. Available when using the OAuth with Service Principal authentication method. |
Tenant ID | Tenant ID for the Azure Active Directory Data Collector application. Also known as the directory ID. Available when using the OAuth with Service Principal authentication method. |
Application Key | Authentication key for the Azure Active Directory application. Also known as the client secret. For information on accessing the application key from the Azure portal, see the Azure documentation. Available when using the OAuth with Service Principal authentication method. |
Account Shared Key | Shared access key that Azure generated for the storage account. For more information on accessing the shared access key from the Azure portal, see the Azure documentation. Available when using the Shared Key authentication method. |
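The Account FQDN, Storage Container / File System, and Secure Connection properties together identify a location in the storage account. The following is a minimal sketch of how they might combine, assuming the URI layout used by the Hadoop ABFS driver (`abfss://<file system>@<account FQDN>/<path>`, with `abfs` as the scheme without TLS). The function `build_abfs_uri` and the account name `myaccount` are hypothetical, and this is not necessarily how Data Collector builds the URI internally.

```python
def build_abfs_uri(account_fqdn: str, file_system: str,
                   path: str = "/", secure: bool = True) -> str:
    """Combine the connection properties into an ABFS-style URI.

    Hypothetical helper for illustration only.
    """
    # Secure Connection selected -> abfss (TLS); cleared -> abfs (assumed).
    scheme = "abfss" if secure else "abfs"
    # Drop leading slashes so the URI has exactly one after the host.
    return f"{scheme}://{file_system}@{account_fqdn}/{path.lstrip('/')}"


if __name__ == "__main__":
    # "myaccount" is a placeholder storage account name.
    print(build_abfs_uri("myaccount.dfs.core.windows.net", "data", "/raw/events"))
```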