Azure Data Lake Storage Gen2
Available when using an authoring Data Collector version 4.0.0 or later.
To create an Azure Data Lake Storage Gen2 connection, the Azure stage library, streamsets-datacollector-azure-lib, must be installed on the selected authoring Data Collector.
For a description of the connection properties, see Azure Data Lake Storage Gen2 Connection Properties.
Engine | Stages |
---|---|
Data Collector 5.5.0 or later |
|
Data Collector 4.0.0 or later |
|
For information about features added to the connection with different engine releases, see the connection requirements for the engine.
Prerequisites
- If necessary, create a new Azure Active Directory
application for Data Collector.
For information about creating a new application, see the Azure documentation.
- Ensure that the Azure Active Directory Data Collector application
has the appropriate access control to perform the necessary
tasks.
The Data Collector application requires Read and Execute permissions to read data in Azure. If also writing to Azure, the application requires Write permission as well.
For information about configuring Gen2 access control, see the Azure documentation.
- Retrieve information from Azure to configure the connection.
After you complete all of the prerequisite tasks, you can configure an Azure Data Lake Storage Gen2 connection.
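
The permission rule in the prerequisites can be sketched as a small helper. This is only an illustration: the function name `required_permissions` is hypothetical, and the `r-x` / `rwx` strings assume the POSIX-style notation that Azure uses for Gen2 access control lists.

```python
def required_permissions(writes_data: bool) -> str:
    """Return the POSIX-style permissions the Data Collector application needs.

    Hypothetical helper for illustration; not part of any StreamSets or Azure API.
    """
    # Read and Execute are needed to read data in Azure.
    # Write is added only when the pipeline also writes to Azure.
    return "rwx" if writes_data else "r-x"
```

For a read-only pipeline, `required_permissions(False)` yields `r-x`; for a pipeline that also writes, `required_permissions(True)` yields `rwx`.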
Retrieve Authentication Information
An Azure Data Lake Storage Gen2 connection can use different methods to authenticate with Azure.
- OAuth with Service Principal
- Connections made with OAuth with Service Principal authentication require
the following information:
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
For information on accessing the application ID from the Azure portal, see the Azure documentation.
- Tenant ID - Tenant ID for the Azure Active Directory
Data Collector application. Also known as the directory ID.
For information on accessing the tenant ID from the Azure portal, see the Azure documentation.
- Application Key - Authentication key for the Azure Active Directory
application. Also known as the client secret.
For information on accessing the application key from the Azure portal, see the Azure documentation.
- Azure Managed Identity
- Connections made with Azure Managed Identity authentication
require the following information:
- Application ID - Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID.
For information on accessing the application ID from the Azure portal, see the Azure documentation.
- Shared Key
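- Connections made with Shared Key authentication require the following
information:
- Account Shared Key - Shared access key that Azure generated for the storage account.
For information on accessing the shared access key from the Azure portal, see the Azure documentation.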
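
As a rough checklist, the information each authentication method needs can be expressed as a lookup table. This is a sketch, not a StreamSets API: the dictionary, the field names such as `application_id`, and the function `missing_fields` are hypothetical. The Shared Key entry is taken from the Account Shared Key property described in the connection properties table.

```python
# Hypothetical mapping of authentication method to the values to retrieve from Azure.
REQUIRED_FIELDS = {
    "OAuth with Service Principal": ("application_id", "tenant_id", "application_key"),
    "Azure Managed Identity": ("application_id",),
    "Shared Key": ("account_shared_key",),
}


def missing_fields(auth_method: str, values: dict) -> list:
    """Return the names of required values that are absent or empty."""
    if auth_method not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authentication method: {auth_method}")
    # A value counts as missing when the key is absent or maps to an empty string.
    return [name for name in REQUIRED_FIELDS[auth_method] if not values.get(name)]
```

Running such a check before you configure the connection shows which values still have to be retrieved from the Azure portal.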
Azure Data Lake Storage Gen2 Connection Properties
Azure Property | Description |
---|---|
Account FQDN | The host name of the Data Lake Storage Gen2 account. For example:
|
Storage Container / File System | Name of the storage container or file system that contains the data to be read or written. |
Secure Connection | Uses the abfss protocol to securely connect to Azure
using a TLS connection. When cleared, the stage uses the
|
Authentication Method | Authentication method used to connect to Azure: OAuth with Service Principal, Azure Managed Identity, or Shared Key. |
Application ID | Application ID for the Azure Active Directory Data Collector
application. Also known as the client ID. For information on accessing the application ID from the Azure portal, see the Azure documentation. Available when using the OAuth with Service Principal or the Azure Managed Identity authentication method. |
Endpoint Type | Method to provide endpoint details. Available when using the OAuth with Service Principal authentication method. |
Tenant ID | Tenant ID for the Azure Active Directory
Data Collector application. Also known as the directory ID. For information on accessing the tenant ID from the Azure portal, see the Azure documentation. Available when Endpoint Type is set to Tenant ID. |
Endpoint URL | Endpoint URL for the Azure Active Directory Data Collector
application. Default is
In the URL, specify the tenant ID for the Azure Active Directory Data Collector application. For information on accessing the tenant ID from the Azure portal, see the Azure documentation. Available when Endpoint Type is set to Endpoint URL. |
Application Key | Authentication key for the Azure Active Directory application. Also known as the client secret. For information on accessing the application key from the Azure portal, see the Azure documentation. Available when using the OAuth with Service Principal authentication method. |
Account Shared Key | Shared access key that Azure
generated for the storage account. For more information on accessing the shared access key from the Azure portal, see the Azure documentation. Available when using the Shared Key authentication method. |
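
To show how the Account FQDN, Storage Container / File System, and Secure Connection properties relate, the following sketch assembles a storage URI from them. It assumes the standard Azure Blob File System URI layout, `scheme://container@account-fqdn/path`, and assumes that `abfs` is the non-TLS counterpart of `abfss`. The function name and the account and container values in the usage note are hypothetical.

```python
def build_storage_uri(account_fqdn: str, container: str,
                      secure: bool = True, path: str = "") -> str:
    """Assemble an Azure Blob File System URI from connection property values.

    Hypothetical helper for illustration; the connection builds its own URI internally.
    """
    # Secure Connection selected: abfss (TLS). Cleared: assumed to be abfs.
    scheme = "abfss" if secure else "abfs"
    # Strip any leading slash so the URI has exactly one separator after the host.
    return f"{scheme}://{container}@{account_fqdn}/{path.lstrip('/')}"
```

For example, with a hypothetical account `myaccount.dfs.core.windows.net` and container `raw`, `build_storage_uri("myaccount.dfs.core.windows.net", "raw", path="/sales/2024")` returns `abfss://raw@myaccount.dfs.core.windows.net/sales/2024`.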