Credential Stores

Data Collector pipeline stages communicate with external systems to read and write data. Many of these external systems require sensitive information, such as user names or passwords, to access the data. When you configure pipeline stages for these external systems, you must specify the details that the stages need to connect to the system.

If you enter sensitive information directly in stage properties, you expose those details to any user with access to the pipeline. To access external systems without exposing sensitive information, add the sensitive values as secrets in a credential store, and then use Data Collector credential functions in the stage properties to retrieve those values.

Defining secrets in a credential store can make it easier to migrate pipelines to another environment. For example, if you migrate multiple pipelines from a development to a production environment, you do not need to edit each pipeline with details for the production environment. You can simply replace the development credential store with the production version.

You can configure Data Collector to use multiple credential stores at the same time. Each credential store is identified by a unique credential store ID.

You can use the following credential stores with Data Collector:
Important: Use the Java keystore credential store system in a development environment only. In a production environment, use a centralized credential store, such as one of the other supported credential stores, to better secure sensitive information.

Enabling Credential Stores

You can configure Data Collector to use one or more credential stores. Each credential store is identified by a unique credential store ID.

You specify the credential stores that Data Collector can use in the $SDC_CONF/credential-stores.properties file. The file includes the following information:
credentialStores property
This property defines the credential stores that Data Collector can use.
By default, the property is commented out and includes a default credential store ID for each of the supported credential store types, such as aws for AWS Secrets Manager and azure for Azure Key Vault.
To enable using credential stores, you uncomment this property and enter a comma-separated list of the credential store IDs to use.
You can specify multiple credential stores of the same type or of different types, such as two Hashicorp Vault credential stores and one Java keystore. Specify a unique ID for each credential store.
usePortableGroups property
This property allows you to migrate pipelines that access a credential store from one Control Hub organization to another without updating the pipeline.
Important: Use this property only when recommended by the StreamSets Support team.
To call secrets from a pipeline, you use credential functions and include a group argument in the expression. When working with Control Hub, you define the group argument as follows: <group ID>@<organization ID>. When the usePortableGroups property is enabled, Data Collector does not evaluate the <organization ID> portion of the group argument. This allows you to migrate pipelines from one organization to another without editing credential functions in pipelines, as long as the new organization has matching group names with the same credential store access.
For example, when the usePortableGroups property is enabled, the group argument dev@mycompany in a credential function is read as dev. So if you migrate the pipeline to a different organization that also has a dev group with the same credential store access, the pipeline can be used without updates.
By default, the property is commented out and set to false. When recommended by the StreamSets Support team, you can enable the property by uncommenting the property and setting it to true.
Sets of related properties
Each supported credential store type has a set of related properties. The property names include the default credential store IDs originally specified in the credentialStores property.
For example, the CyberArk properties include cyberark, the default CyberArk ID, in each CyberArk property name, such as credentialStore.cyberark.config.ws.url and credentialStore.cyberark.config.ws.appId.
When you use a custom credential store ID, you must update all related property names to match the custom ID. For example, to use cyberarkUS as a custom ID, replace cyberark with cyberarkUS in every default CyberArk property name.
Note: When you want to use multiple credential stores of the same type, you must have a set of related store properties that are renamed and defined appropriately for each credential store.

For example, say you want to use two Azure credential stores, azureDev for development and azureProd for production. To do this, you specify the credential store IDs in the credentialStores property and make a copy of the related Azure credential store properties, so you have one set for each credential store.

Then, you rename and configure the properties for azureDev, and you do the same for azureProd. The resulting properties might look as follows:
################################################
#      Data Collector Credential Stores        #
################################################

credentialStores=azureDev,azureProd

#credentialStores.usePortableGroups=false

############################################################
# azureDev: Azure Key Vault Credential Store Configuration #
############################################################

credentialStore.azureDev.def=streamsets-datacollector-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureDev.config.credential.refresh.millis=30000
credentialStore.azureDev.config.credential.retry.millis=15000
credentialStore.azureDev.config.vault.url=https://development.vault.azure.net/
credentialStore.azureDev.config.client.id=devClientID
credentialStore.azureDev.config.client.key=devClientKey
credentialStore.azureDev.config.enforceEntryGroup=false

#############################################################
# azureProd: Azure Key Vault Credential Store Configuration #
#############################################################

credentialStore.azureProd.def=streamsets-datacollector-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureProd.config.credential.refresh.millis=30000
credentialStore.azureProd.config.credential.retry.millis=15000
credentialStore.azureProd.config.vault.url=https://production.vault.azure.net/
credentialStore.azureProd.config.client.id=prodClientID
credentialStore.azureProd.config.client.key=prodClientKey
credentialStore.azureProd.config.enforceEntryGroup=false
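
The copy-and-rename step can also be scripted. The following Python sketch is one hypothetical way to do it and is not part of Data Collector. The function name rename_store_properties and the two sample input lines are assumptions made for illustration. The sketch rewrites only the credential store ID at the start of each property name, keeps a leading # if the property is commented out, and leaves values untouched.

```python
import re

def rename_store_properties(lines, default_id, custom_id):
    """Return property lines with the credential store ID in each name replaced.

    Hypothetical helper: only names that start with
    'credentialStore.<default_id>.' are rewritten; values are left as-is.
    """
    # Match an optional leading '#' (commented-out property), then the name prefix.
    pattern = re.compile(
        r"^(#?\s*)credentialStore\." + re.escape(default_id) + r"\."
    )
    replacement = r"\g<1>credentialStore." + custom_id + "."
    return [pattern.sub(replacement, line) for line in lines]

# Sample default Azure property lines (values are placeholders).
defaults = [
    "credentialStore.azure.config.vault.url=",
    "#credentialStore.azure.config.enforceEntryGroup=false",
]

# Produce one renamed set per custom credential store ID.
for store_id in ("azureDev", "azureProd"):
    for line in rename_store_properties(defaults, "azure", store_id):
        print(line)
```

Because the pattern is anchored at the start of the line, a value that happens to contain the default ID, such as a vault URL, is not changed. You still need to uncomment and configure each renamed property by hand.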

Group Access to Secrets

As an additional layer of security, you can employ user groups to further limit access to the secrets defined in credential stores.

Data Collector provides two methods to limit access with user groups:
Required group argument in credential functions
Credential functions include a group argument that defines the user group that can access the secret. The group argument ensures that the user who attempts to preview, validate, or start a pipeline that includes a credential function belongs to the group specified in the function. The user must also have execute permission on the pipeline.
When working only with Data Collector, simply specify the group name, such as devops. When working with Control Hub, specify the group argument using the following naming convention: <group ID>@<organization ID>. For example, devops@MyCompany.
If you do not want to restrict access to a secret, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. If you use the all group, you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.

If Data Collector shuts down while running a pipeline that uses a credential function, Data Collector restarts the pipeline without checking the group access.

Optional group secrets in the credential store

In addition to using the group argument in credential functions, you can configure Data Collector to require group secrets for a credential store.

To require the use of group secrets, in the $SDC_CONF/credential-stores.properties file, set the credentialStore.<cstore ID>.config.enforceEntryGroup property to true.

A group secret is a secret defined in the credential store that contains a comma-delimited list of Data Collector user groups permitted to access the associated secret.

When a credential store requires group secrets, you must define a group secret for every secret that Data Collector accesses in that credential store. The name of the group secret is based on the secret name, as follows:
<secret name>-groups
When you configure a credential function to call a secret, the user group specified in the credential function must be listed in the associated group secret that is defined in the credential store.
For example, say you work with Control Hub and you enable Data Collector to require group secrets for Azure Key Vault. Then, in a Kafka Multitopic Consumer origin, you use the following expression to access a Base64-encoded keytab in the azure credential store for the origin to use:
${credential:get("azure", "kafkaprod@MyCompany", "readkeytab")}
When you run the pipeline, Data Collector validates all of the following:
  • The user who starts the pipeline is in the kafkaprod user group.
  • The readkeytab secret has an associated readkeytab-groups secret defined in the credential store.
  • The readkeytab-groups secret includes the kafkaprod user group.

When Data Collector is not configured to require group secrets, Data Collector validates only the first point, verifying that the user belongs to the specified group.
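
The checks above can be summarized in a short Python sketch. This is an illustration of the documented rules only, not Data Collector code. The function name, the dictionary that stands in for the credential store, and the trimming of whitespace around group names are all assumptions. The sketch compares group names as plain strings and does not model the all group or how the <organization ID> portion of a Control Hub group is evaluated.

```python
def can_access_secret(user_groups, function_group, secret_name,
                      store_secrets, enforce_entry_group):
    """Sketch of the group checks described above (hypothetical helper).

    user_groups:    groups of the user who previews, validates, or starts the pipeline
    function_group: group argument from the credential function
    store_secrets:  dict of secret name -> value, standing in for the credential store
    """
    # Check 1: the user must belong to the group named in the function.
    if function_group not in user_groups:
        return False
    # Without enforceEntryGroup, only the first check applies.
    if not enforce_entry_group:
        return True
    # Check 2: the secret needs an associated "<secret name>-groups" secret.
    group_secret = store_secrets.get(secret_name + "-groups")
    if group_secret is None:
        return False
    # Check 3: the comma-delimited group secret must list the group.
    allowed = [group.strip() for group in group_secret.split(",")]
    return function_group in allowed

store = {"readkeytab": "<keytab value>", "readkeytab-groups": "kafkaprod,devops"}
print(can_access_secret({"kafkaprod"}, "kafkaprod", "readkeytab", store, True))  # True
print(can_access_secret({"qa"}, "kafkaprod", "readkeytab", store, True))         # False
```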

AWS Secrets Manager

To use the AWS Secrets Manager credential store system, install the AWS Secrets Manager Credentials Store stage library and define the configuration properties used to connect to Secrets Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.

In Secrets Manager, you must configure an access key and secret key pair with the correct permissions to read the secrets. As a best practice, make secrets read-only and limit access. For details, see the Secrets Manager documentation on identity and access management (IAM) policies.

Note: This documentation includes Secrets Manager information needed for the configuration process. For more information, see the AWS Secrets Manager documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the AWS Secrets Manager Credentials Store stage library. The core installation does not include the library.

To verify that Data Collector has the AWS Secrets Manager Credentials Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install it before configuring the Secrets Manager credential store.

Step 2. Configure Credential Store Properties

To enable Data Collector to connect to the AWS Secrets Manager credential store, configure the Secrets Manager properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=aws
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Secrets Manager, set the value to aws.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secrets Manager credential store, set the value to jks,aws. To use multiple Secrets Manager credential stores, simply specify separate IDs for each, such as awsDev,awsProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, aws, leave the property names intact, and simply configure the properties.

    To use multiple AWS Secrets Manager credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.

    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the AWS Secrets Manager section of the file:

    Secrets Manager Property
        Description

    credentialStore.<cstore ID>.def
        Required. Defines the implementation of the AWS Secrets Manager credential store.

        Do not change the default value.

    credentialStore.<cstore ID>.config.nameKey.separator
        Optional. Separator to use in the name argument that credential functions use. Use the following format for the name argument:

        <name><separator><key>

        For example, if you keep the default ampersand (&), the format for the name argument is: <name>&<key>

        Note: In Secrets Manager, names can contain alphanumeric characters and the following special characters: / _ + = . @ -
        Avoid using those characters as separators.

    credentialStore.<cstore ID>.config.region
        Required. AWS region that hosts Secrets Manager. For a list of available regions, see the AWS Region Table.

    credentialStore.<cstore ID>.config.security.method
        Required. Authentication method used to connect to AWS. Set to one of the following values:

        • instanceProfile - Authenticates using an instance profile associated with Data Collector.

          Use when Data Collector runs on an Amazon EC2 instance that has an associated instance profile. Data Collector uses the instance profile credentials to automatically authenticate with AWS.

        • accessKeys - Authenticates using an AWS access key pair.

          Use when Data Collector does not run on an Amazon EC2 instance or when the EC2 instance doesn’t have an instance profile.

    credentialStore.<cstore ID>.config.access.key
        Required when using access keys to authenticate with AWS. AWS access key ID.

    credentialStore.<cstore ID>.config.secret.key
        Required when using access keys to authenticate with AWS. AWS secret access key.

    credentialStore.<cstore ID>.config.cache.max.size
        Optional. Maximum number of secrets Data Collector can cache locally. Default is 1024.

    credentialStore.<cstore ID>.config.cache.ttl.millis
        Optional. Number of milliseconds that Data Collector considers a cached secret valid before requiring a refresh. Default is 1 hour.

    credentialStore.<cstore ID>.config.enforceEntryGroup
        Optional. Requires Data Collector to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.

        When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret.

        For more information, see Group Access to Secrets.

        Default is false.

  3. Restart Data Collector to enable the changes.
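
Before restarting, it can help to sanity-check the value that you entered for the credentialStores property. The following Python sketch checks only the constraints stated in step 1: a comma-separated list of unique IDs that use alphabetic characters. It is a hypothetical helper, not a Data Collector tool. It assumes that "alphabetic" means the ASCII letters A-Z and a-z and that whitespace around each ID can be ignored, and it says nothing about how Data Collector itself reacts to an invalid value.

```python
import re

def parse_credential_stores(value):
    """Split a credentialStores value and check the documented constraints."""
    # Assumption: surrounding whitespace around each ID is not significant.
    ids = [part.strip() for part in value.split(",")]
    for store_id in ids:
        # Assumption: "alphabetic" means ASCII letters only.
        if not re.fullmatch(r"[A-Za-z]+", store_id):
            raise ValueError(f"credential store ID is not alphabetic: {store_id!r}")
    if len(set(ids)) != len(ids):
        raise ValueError("credential store IDs must be unique")
    return ids

print(parse_credential_stores("jks,aws"))          # ['jks', 'aws']
print(parse_credential_stores("awsDev,awsProd"))   # ['awsDev', 'awsProd']
```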

Step 3. Call Secrets from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from AWS Secrets Manager.

Use credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential functions use the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from Secrets Manager. Use the following format: "<name><separator><key>", where:
    • <name> is the secret name.
    • <separator> is the separator defined either in the $SDC_CONF/credential-stores.properties file or in the function call.
    • <key> is the key for the value that you want returned.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Secrets Manager, you can use the following options:
    • separator - Specifies the separator for name and key values in the credential functions, overriding the credentialStore.aws.config.nameKey.separator property.
    • alwaysRefresh - When set to true, forces the key to refresh its cached value before Data Collector retrieves the value, overriding the credentialStore.aws.config.cache.ttl.millis property. Be aware that always refreshing the cached value significantly increases the pipeline run time.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
    For example, to use the pipe symbol ( | ) as the separator, enter the following for the options argument:
    "separator=|"
For example, the following expression returns the value from the key SQLk1 of the secret SQLpassword from the aws credential store. The expression allows any user in the devops group to access the key when validating, previewing, or running the pipeline:
${credential:get("aws", "devops@MyCompany", "SQLpassword&SQLk1")}
The following expression returns the same key value, but overrides the separator to use a pipe:
${credential:getWithOptions("aws", "devops@MyCompany", "SQLpassword|SQLk1", "separator=|")}
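
To make the name and options formats concrete, the following Python sketch splits a name argument into the secret name and key, honoring a separator override from the options argument. It illustrates the documented formats only and is not how Data Collector parses these arguments. The function names are hypothetical, splitting on the first occurrence of the separator is an assumption, and the simple comma split means that the sketch cannot handle a comma used as the separator value.

```python
def parse_options(options):
    """Parse "<option1>=<value>,<option2>=<value>" into a dict."""
    result = {}
    for pair in options.split(","):
        if pair:
            key, _, value = pair.partition("=")
            result[key.strip()] = value
    return result

def split_name_key(name_arg, options="", default_separator="&"):
    """Split a Secrets Manager name argument into (secret name, key)."""
    # A separator option overrides the configured (here: default) separator.
    separator = parse_options(options).get("separator", default_separator)
    # Assumption: the first occurrence of the separator divides name from key.
    name, found, key = name_arg.partition(separator)
    if not found or not name or not key:
        raise ValueError(f"expected <name>{separator}<key>, got {name_arg!r}")
    return name, key

print(split_name_key("SQLpassword&SQLk1"))                 # ('SQLpassword', 'SQLk1')
print(split_name_key("SQLpassword|SQLk1", "separator=|"))  # ('SQLpassword', 'SQLk1')
```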

CyberArk

To use the CyberArk credential store system, install the CyberArk Credential Store stage library and define the configuration properties used to connect to CyberArk Application Identity Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.

Currently, CyberArk integration is supported only through the CyberArk Central Credential Provider web services.
Note: This documentation includes details about CyberArk to simplify the configuration process. For more information, see the CyberArk documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the CyberArk Credential Store stage library. The core installation does not include the library.

To verify that Data Collector has the CyberArk Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install it before configuring the CyberArk credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the CyberArk credential store, configure the CyberArk properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=cyberark
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single CyberArk credential store, set the value to cyberark.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a CyberArk credential store, set the value to jks,cyberark. To use multiple CyberArk credential stores, simply specify separate IDs for each, such as cyberarkDev,cyberarkProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, cyberark, leave the property names intact, and simply configure the properties.

    To use multiple CyberArk credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the CyberArk section of the file:

    CyberArk Property
        Description

    credentialStore.<cstore ID>.def
        Required. Defines the implementation of the CyberArk credential store.

        Do not change the default value.

    credentialStore.<cstore ID>.config.credential.refresh.millis
        Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from CyberArk.

    credentialStore.<cstore ID>.config.credential.retry.millis
        Optional. Number of milliseconds that Data Collector waits before retrying the retrieval of a credential from CyberArk after an error.

    credentialStore.<cstore ID>.config.connector
        Optional. Connector type to CyberArk. Leave the default, webservices, because only web services are currently supported.

    credentialStore.<cstore ID>.config.ws.url
        Required. CyberArk Central Credential Provider web service URL.

        Use the following format:

        https://<host name>:<port>/AIMWebService/api/Accounts

    credentialStore.<cstore ID>.config.ws.appId
        Required. CyberArk application ID for this Data Collector. You must create the application ID in CyberArk.

    credentialStore.<cstore ID>.config.ws.maxConcurrentConnections
        Optional. Maximum number of concurrent web service calls that Data Collector can make to CyberArk.

    credentialStore.<cstore ID>.config.ws.validateAfterInactivity.millis
        Optional. Number of milliseconds of inactivity before Data Collector validates the HTTP connection to CyberArk.

    credentialStore.<cstore ID>.config.ws.connectionTimeout.millis
        Optional. Number of milliseconds to wait for a connection to CyberArk.

    credentialStore.<cstore ID>.config.ws.nameSeparator
        Optional. Separator to use in the name argument that credential functions use.

        Use the following format for the name argument:

        <safe><separator><folder><separator><object name><separator><element name>

        For example, if you keep the default ampersand (&), the format for the name argument is:

        <safe>&<folder>&<object name>&<element name>

    credentialStore.<cstore ID>.config.ws.http.authentication
        Optional. Authentication type used by the CyberArk Central Credential Provider web services: none, basic, or digest.

        Default is none.

    credentialStore.<cstore ID>.config.ws.http.authentication.user
        Optional. User name if using basic or digest authentication.

    credentialStore.<cstore ID>.config.ws.http.authentication.password
        Optional. Password if using basic or digest authentication.

        To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.ws.truststoreFile
        Optional. Path to the truststore file if using HTTPS and the server certificate is using a private CA or is not trusted by the Java default truststore file.

        Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.

    credentialStore.<cstore ID>.config.ws.truststorePassword
        Optional. Password for the truststore file.

        To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.ws.supportedProtocols
        Optional. SSL/TLS-enabled protocols. TLSv1.2 or later is recommended.

    credentialStore.<cstore ID>.config.ws.hostnameVerifier.skip
        Optional. Determines whether the host name of the CyberArk Central Credential Provider web services should be verified against the domain defined in the HTTPS certificate.

        By default, the host name is verified.

    credentialStore.<cstore ID>.config.ws.keystoreFile
        Optional. If using HTTPS and the CyberArk Central Credential Provider web services require client-side certificates, the path to the keystore file that contains the client certificate.

        Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.

    credentialStore.<cstore ID>.config.ws.keystorePassword
        Optional. Password for the keystore file.

        To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.ws.keyPassword
        Optional. Password to access the certificate within the keystore file.

        To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.ws.proxyURI
        Optional. URI for the proxy that should be used to reach the CyberArk services.

    credentialStore.<cstore ID>.config.enforceEntryGroup
        Optional. Requires Data Collector to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.

        When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret.

        For more information, see Group Access to Secrets.

        Default is false.

  3. Restart Data Collector to enable the changes.

Step 3. Call Secrets from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from CyberArk.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential functions use the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from CyberArk. Use the following format: "<safe><separator><folder><separator><object name>[<separator><element name>]", where:
    • <safe> is the CyberArk safe to read. For example, production.
    • <separator> is the separator defined for the safe, folder, object name, and element name values in the $SDC_CONF/credential-stores.properties file. Or if you use the credential:getWithOptions() function, you can define the separator in the options argument.
    • <folder> is the folder in CyberArk to read. For example, Root\\sqldatabases.
    • <object name> is the object or secret in CyberArk to read. For example, payroll.
    • <element name> is an optional name for the value in the secret that you want returned. For example, enter Content to return the password or Username to return an optional user name value. If you do not specify <element name>, Data Collector uses Content.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For CyberArk, you can use the following options:
    • separator - Separator to use in the name argument.
    • ConnectionTimeout - Connection timeout value in milliseconds.
    • FailRequestOnPasswordChange - Whether to fail the request on a password change, set to true or false. See the CyberArk documentation for details on this option.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
    For example, to use the pipe symbol (|) as the separator, enter the following for the options argument:
    "separator=|"
For example, the following expression returns the password for the payroll secret stored in the Root\\sqldatabases folder in the production safe of the cyberark credential store. The name argument uses the default ampersand (&) as the separator. The expression allows any user in the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("cyberark", "devops@MyCompany", "production&Root\\sqldatabases&payroll&Content")}
The following expression returns the same password, but specifies the pipe symbol (|) as the separator:
${credential:getWithOptions("cyberark", "devops@MyCompany", "production|Root\\sqldatabases|payroll|Content", "separator=|")}
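
The structure of the CyberArk name argument, including the optional element name, can be sketched in Python as follows. This is an illustration of the documented format rather than Data Collector's own parsing. The function name and the dictionary keys are hypothetical, and the folder text is passed through unchanged, so the sketch does not model how backslashes in the expression are interpreted.

```python
def parse_cyberark_name(name_arg, separator="&"):
    """Split a CyberArk name argument into its documented parts."""
    parts = name_arg.split(separator)
    if len(parts) == 3:
        # <element name> is optional; Content is used when it is omitted.
        parts.append("Content")
    if len(parts) != 4:
        raise ValueError(
            "expected <safe>, <folder>, <object name>, and an optional "
            f"<element name> separated by {separator!r}, got {name_arg!r}"
        )
    safe, folder, object_name, element_name = parts
    return {
        "safe": safe,
        "folder": folder,
        "object": object_name,
        "element": element_name,
    }

# The element name is omitted here, so the sketch falls back to Content.
parsed = parse_cyberark_name(r"production&Root\\sqldatabases&payroll")
print(parsed["safe"], parsed["object"], parsed["element"])  # production payroll Content

# Same lookup with a pipe separator and an explicit element name.
print(parse_cyberark_name("production|Root|payroll|Username", "|")["element"])  # Username
```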

Google Secret Manager

To use the Google Secret Manager credential store system, install the Google Secret Manager Credentials Store stage library and define the configuration properties used to connect to Secret Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.

As a best practice, make secrets read-only and limit access. For additional suggestions, see the Google Secret Manager best practices documentation.

Note: This documentation includes Secret Manager information needed for the configuration process. For more information about Secret Manager, see the Google Secret Manager documentation.

Authentication

Data Collector must authenticate with Google Secret Manager using Google credentials.

When you configure the credential store properties, you configure Data Collector to use one of the following credential modes:

Default
Data Collector authenticates with Google Secret Manager using the credentials file defined in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Set the environment variable on the Data Collector machine. If you run Data Collector on a VM on Google Cloud Platform, use an instance service account with access to Google Secret Manager.

For more information about using default credentials, see the Google Cloud documentation.

JSON
Data Collector authenticates with Google Secret Manager using JSON-formatted credential information specified in the credential store configuration properties. You copy the JSON content from a Google Cloud service account credentials file.
Enter the JSON content in plain text. If the content includes multiple lines of text, add a backslash (\) at the end of each line.
JSON Path
Data Collector authenticates with Google Secret Manager using a Google Cloud service account credentials file stored on the Data Collector machine.

Enter the path to the file in the credential store configuration properties. Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.

For information about generating a service account credential file, see the Google Cloud Platform documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Google Secret Manager Credentials Store stage library. The core installation does not include the library.

To verify that Data Collector has the Google Secret Manager Credentials Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Secret Manager credential store.

Step 2. Configure Credential Store Properties

To enable Data Collector to connect to the Google Secret Manager credential store, configure the Secret Manager properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=gcp
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Secret Manager, set the value to gcp.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secret Manager credential store, set the value to jks,gcp. To use multiple Secret Manager credential stores, simply specify separate IDs for each, such as gcpDev,gcpProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, gcp, leave the property names intact, and simply configure the properties.

    To use multiple Google Secret Manager credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.

    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the Google Secret Manager section of the file:

    Secret Manager Property Description
    credentialStore.<cstore ID>.def Required. Defines the implementation of the Google Secret Manager credential store.

    Do not change the default value.

    credentialStore.<cstore ID>.config.cache.inactivityExpiration.millis Expiration time for the cache in milliseconds.

    Default is 1800000.

    credentialStore.<cstore ID>.config.delimiter Delimiter to use in the credential function name argument to separate the secret name and the version ID. Use a single character that is not included in credential names.

    Use the following format for the name argument:

    <name><delimiter><version id>

    For example, if you use a slash, the format for the name argument is:

    <name>/<version id>

    Default is question mark (?).

    credentialStore.<cstore ID>.config.project.id ID of the project associated with the Secret Manager.
    credentialStore.<cstore ID>.config.credentialsMode Credentials to use for authentication with Secret Manager:
    • default - Uses Google Cloud default credentials.
    • json - Uses JSON-formatted credentials information specified in the credential store configuration properties.
    • jsonPath - Uses a JSON service account credentials file stored on the Data Collector machine.

    For more information, see Authentication.

    credentialStore.<cstore ID>.config.credentialsJson Contents of a Google Cloud service account credentials file.

    Enter JSON-formatted credential information in plain text. If the content includes multiple lines of text, add a backslash (\) at the end of each line.

    Required when using the json credentials mode.

    credentialStore.<cstore ID>.config.credentialsJsonPath Path to a Google Cloud service account credentials file stored on the Data Collector machine. The credentials file must be a JSON file.

    Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.

    Required when using the jsonPath credentials mode.

    credentialStore.<cstore ID>.config.enforceEntryGroup Optional. Requires Data Collector to verify if the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.

    When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret.

    For more information, see Group Access to Secrets.

    Default is false.

  3. Restart Data Collector to enable the changes.
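
The dependency between the credentials mode and the other properties can be checked before restarting. The following Python sketch is one hedged way to do that. It is not a Data Collector utility, the function name is hypothetical, and it assumes the properties have already been loaded into a dictionary and that the credentials mode is set explicitly.

```python
# Property required by each credentials mode, as described in the table above.
REQUIRED_BY_MODE = {
    "default": None,
    "json": "credentialsJson",
    "jsonPath": "credentialsJsonPath",
}


def check_secret_manager_config(props, cstore_id="gcp"):
    """Return a list of problems found in the Secret Manager properties."""
    prefix = f"credentialStore.{cstore_id}.config."
    problems = []

    mode = props.get(prefix + "credentialsMode")
    if mode not in REQUIRED_BY_MODE:
        problems.append(prefix + "credentialsMode must be default, json, or jsonPath")
    else:
        required = REQUIRED_BY_MODE[mode]
        if required and not props.get(prefix + required):
            problems.append(f"{prefix}{required} is required for the {mode} credentials mode")

    # The delimiter is a single character; the documented default is "?".
    delimiter = props.get(prefix + "delimiter", "?")
    if len(delimiter) != 1:
        problems.append(prefix + "delimiter must be a single character")

    return problems
```

An empty list means that the checks covered by this sketch passed. It does not confirm that the project ID or the credentials themselves are valid.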

Step 3. Call Secrets from the Pipeline

Use the credential:get() function in pipeline stage properties to retrieve secrets from Google Secret Manager.

Use the credential function in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential function uses the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Secret to retrieve from Secret Manager. Use the following format: "<name><delimiter><version ID>", where:
    • <name> is the secret name.
    • <delimiter> is the delimiter defined in the $SDC_CONF/credential-stores.properties file.
    • <version ID> is the version of the value that you want returned.
For example, the following expression returns the latest version of the user1pass secret from the gcp credential store. The expression allows any user in the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("gcp", "devops@MyCompany", "user1pass?latest")}
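
To make the name format concrete, the following Python sketch splits a name argument into the secret name and version ID at the configured delimiter. It only illustrates the documented format and is not Data Collector code. The function name is hypothetical, and rejecting a name that lacks the delimiter is an assumption, because the behavior in that case is not described here.

```python
def split_secret_name(name_argument, delimiter="?"):
    """Split "<name><delimiter><version ID>" into (name, version ID)."""
    if len(delimiter) != 1:
        raise ValueError("the delimiter must be a single character")
    name, found, version = name_argument.partition(delimiter)
    if not found or not name or not version:
        raise ValueError(f"expected <name>{delimiter}<version ID>, got {name_argument!r}")
    return name, version


# Default question mark delimiter.
name, version = split_secret_name("user1pass?latest")

# A slash configured as the delimiter.
name_slash, version_slash = split_secret_name("user1pass/3", delimiter="/")
```

Because the split happens at the first occurrence of the delimiter, the sketch relies on the rule that the delimiter is a character that does not appear in credential names.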

Hashicorp Vault

To use the Hashicorp Vault credential store system, install the Vault Credential Store stage library and define the configuration properties used to connect to Hashicorp Vault. Then, use credential functions in pipeline stage properties to retrieve stored values.

Note: This documentation includes details about Hashicorp Vault to simplify the configuration process. For more information, see the Vault documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Vault Credential Store stage library. The core installation does not include the library.

To verify that Data Collector has the Vault Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Hashicorp Vault credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Hashicorp Vault credential store, configure the Hashicorp Vault properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=vault
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Hashicorp Vault credential store, set the value to vault.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Hashicorp Vault credential store, set the value to jks,vault. To use multiple Hashicorp Vault credential stores, simply specify separate IDs for each, such as vaultDev,vaultProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, vault, leave the property names intact, and simply configure the properties.

    To use multiple Hashicorp Vault credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the Hashicorp Vault section of the file:

    Vault Property Description
    credentialStore.<cstore ID>.def Required. Defines the implementation of the Vault credential store.

    Do not change the default value.

    credentialStore.<cstore ID>.config.pathKey.separator Optional. Separator to use in the name argument that credential functions use.

    Use the following format for the name argument:

    <path><separator><key>
    For example, if you keep the default ampersand (&), the format for the name argument is:
    <path>&<key>
    credentialStore.<cstore ID>.config.addr Required. Vault server URL entered in the following format:
    https://<host name>:<port number>

    Use HTTPS to avoid unencrypted communication.

    credentialStore.<cstore ID>.config.authMethod Required. Authentication method that Data Collector uses to authenticate with Vault.
    Specify one of the following authentication methods:
    • appId
    • appRole
    • azure
    Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not use App ID authentication for new installations.

    Default is appRole.

    credentialStore.<cstore ID>.config.role.id Required for App Role authentication. Vault Role ID that Data Collector uses to authenticate with Vault. The Role ID is configured within Vault by your Vault administrator.

    The Data Collector Vault integration relies on Vault's App Role authentication backend.

    credentialStore.<cstore ID>.config.secret.id Required for App Role authentication. Vault Secret ID that Data Collector uses to authenticate with Vault. The Secret ID is configured within Vault by your Vault administrator.

    To protect the Secret ID, store the Secret ID in an external location and then use a function to retrieve the Secret ID.

    Default uses the file function to retrieve the Secret ID from vault-secret-id in the $SDC_CONF directory.

    credentialStore.<cstore ID>.config.azure.role Required for Azure authentication. Name of the Vault role defined for Data Collector.
    credentialStore.<cstore ID>.config.azure.subscriptionId Required for Azure authentication. Subscription ID of the Azure subscription where Data Collector is hosted.
    credentialStore.<cstore ID>.config.azure.resourceGroupName Required for Azure authentication. Name of the resource group defined in the Vault role for Data Collector.
    credentialStore.<cstore ID>.config.azure.vmName Required for Azure authentication. Name of the Azure VM where Data Collector is running.
    credentialStore.<cstore ID>.config.azure.resource Required for Azure authentication. Name of the resource defined in the Azure authentication configuration.
    credentialStore.<cstore ID>.config.app.id Deprecated. App ID for App ID authentication.

    Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not configure this property for new installations.
    credentialStore.<cstore ID>.config.lease.renewal.interval.sec Optional. Seconds to wait before checking for leases that need renewal.

    Default is 60.

    credentialStore.<cstore ID>.config.lease.expiration.buffer.sec Optional. Buffer for expiring leases. Data Collector renews leases that expire in less than the specified number of seconds.

    Default is 120.

    credentialStore.<cstore ID>.config.open.timeout Optional. Timeout to establish an HTTP connection to Vault in milliseconds.

    Default is 0 for no limit.

    credentialStore.<cstore ID>.config.proxy.address Optional. Proxy URL. Configure to use a proxy to access Vault.
    credentialStore.<cstore ID>.config.proxy.port Optional. Proxy port. Configure to use a proxy to access Vault.
    credentialStore.<cstore ID>.config.proxy.username Optional. Proxy username. Configure to use a proxy to access Vault.
    credentialStore.<cstore ID>.config.proxy.password Optional. Proxy password. Configure to use a proxy to access Vault.

    To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.read.timeout Optional. Milliseconds to wait for data before timing out.

    Default is 0 for no limit.

    credentialStore.<cstore ID>.config.ssl.enabled.protocols Optional. SSL/TLS-enabled protocols. Versions TLSv1.2 or later are recommended.

    Default is TLSv1.2,TLSv1.3.

    credentialStore.<cstore ID>.config.ssl.truststore.file Optional. Path to a Java truststore file. Required when using a private CA or certificates not trusted by the Java default truststore.
    credentialStore.<cstore ID>.config.ssl.truststore.password Optional. Password for the truststore file.

    To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.ssl.verify Optional. Whether to verify that the Vault server hostname matches its certificate.

    Default is true. Setting the property to false is not recommended.

    credentialStore.<cstore ID>.config.ssl.timeout Optional. Timeout for the SSL/TLS handshake in milliseconds.

    Default is 0 for no limit.

    credentialStore.<cstore ID>.config.timeout Optional. Timeout to read from Vault in milliseconds, after a connection has been established.

    Default is 0 for no limit.

    credentialStore.<cstore ID>.config.enforceEntryGroup Optional. Requires Data Collector to verify if the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.

    When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret.

    For more information, see Group Access to Secrets.

    Default is false.

  3. Restart Data Collector to enable the changes.
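
The enforceEntryGroup rule can be pictured with a small Python sketch. It models a Vault secret as a dictionary of keys and values and checks the <secret key name>-groups key described above. This is only an illustration of the rule, not the Data Collector implementation. The function names are hypothetical, and denying access when the -groups key is missing is an assumption. See Group Access to Secrets for the authoritative behavior.

```python
def groups_allowed_for(secret, key):
    """Read the comma-separated group list stored under "<key>-groups"."""
    raw = secret.get(f"{key}-groups", "")
    return {group.strip() for group in raw.split(",") if group.strip()}


def may_access(secret, key, user_groups):
    """True when the user belongs to at least one permitted group."""
    return bool(groups_allowed_for(secret, key) & set(user_groups))


# Hypothetical secret stored at one Vault path.
oracle_secret = {
    "password": "example-value",
    "password-groups": "devops@MyCompany, dba@MyCompany",
}

allowed = may_access(oracle_secret, "password", ["devops@MyCompany"])
denied = may_access(oracle_secret, "password", ["finance@MyCompany"])
```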

Step 3. Call Secrets from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from Hashicorp Vault.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential functions use the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from Hashicorp Vault. Use the following format: "<path><separator><key>", where:
    • <path> is the path in Vault to read.
    • <separator> is the separator defined for the path and key values in the $SDC_CONF/credential-stores.properties file.
    • <key> is the key for the secret that you want returned.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Hashicorp Vault, you can enter a delay in milliseconds to allow time for external processing. Use the delay option when using the Vault AWS secret backend to generate AWS access credentials based on IAM policies. According to Vault documentation, you might need a delay of 10 seconds or more before the credentials can be used successfully.

    Use the following format to specify an option:

    "<option>=<value>"
    For example, to set the Vault delay to 1,000 milliseconds, enter the following for the options argument:
    "delay=1000"
For example, the following expression returns the value of the key password stored in the Vault path /secret/databases/oracle from the vault credential store after waiting for a delay of 1,000 milliseconds. The name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:getWithOptions("vault", "devops@MyCompany", "/secret/databases/oracle&password", "delay=1000")}
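
The following Python sketch shows how the name and options arguments of such an expression break down into a Vault path, a key, and a delay. It is a reading aid rather than Data Collector code. The function names are hypothetical, and the sketch assumes that the separator does not occur in the key.

```python
def split_vault_name(name_argument, separator="&"):
    """Split "<path><separator><key>" into (path, key)."""
    path, found, key = name_argument.rpartition(separator)
    if not found or not path or not key:
        raise ValueError(f"expected <path>{separator}<key>, got {name_argument!r}")
    return path, key


def parse_option(option_text):
    """Split "<option>=<value>" into (option, value)."""
    option, found, value = option_text.partition("=")
    if not found or not option:
        raise ValueError(f"expected <option>=<value>, got {option_text!r}")
    return option, value


path, key = split_vault_name("/secret/databases/oracle&password")
option, value = parse_option("delay=1000")
delay_seconds = int(value) / 1000  # the delay option is given in milliseconds
```

With these inputs, a delay of 10 seconds, as might be needed for the Vault AWS secret backend, would be written as "delay=10000".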

Java Keystore

To use the Java keystore credential store system, install the Java Keystore Credential Store stage library and define the configuration properties used to connect to the credential store.

Use the stagelib-cli jks-credentialstore command to add credentials to the credential store. Then, use credential functions in pipeline stage properties to retrieve stored values.
Important: Use the Java keystore credential store system in development environments only. In a production environment, use one of the other supported credential stores.

A Java keystore credential storage system requires the distribution of a keystore file, which complicates security. Before using a Java keystore system, decide how the keystore will be distributed and consult with your IT security team to ensure that the system meets IT policies.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Java Keystore Credential Store stage library. The core installation does not include the library.

To verify that Data Collector has the Java Keystore Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Java keystore credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Java keystore credential store, configure the Java keystore properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=jks
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store, aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Java keystore credential store, set the value to jks.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Hashicorp Vault credential store, set the value to jks,vault. To use multiple Java keystore credential stores, simply specify separate IDs for each, such as jksDev1,jksDev2.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, jks, leave the property names intact, and simply configure the properties.

    To use multiple Java keystore credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.

    These properties are grouped in the Java keystore section of the file:

    Java Keystore Property Description
    credentialStore.<cstore ID>.def Defines the implementation of the Java Keystore credential store.

    Do not change the default value.

    credentialStore.<cstore ID>.config.keystore.type Format of the Java keystore file:
    • JCEKS
    • PKCS12

    Default is PKCS12.

    credentialStore.<cstore ID>.config.keystore.file Path and name of the Java keystore file. Enter an absolute path to the file, or a path relative to the Data Collector configuration directory, $SDC_CONF.

    Default is jks-credentialStore.pkcs12.

    credentialStore.<cstore ID>.config.keystore.storePassword Password that Data Collector uses to access the Java keystore file.

    You must change the default value before using the keystore file.

    To protect the password, store the password in an external location and then use a function to retrieve the password.

    credentialStore.<cstore ID>.config.keystore.file.min.refresh.millis Milliseconds that Data Collector waits before reloading the keystore file.

    Default is 10000, or ten seconds.

  3. Restart Data Collector to enable the changes.
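
When several Java keystore credential stores are needed, copying the property set and changing the credential store ID in each name is mechanical. The following Python sketch shows one way to do it. It is an assumption-laden illustration, not a supplied tool: the function names are hypothetical, and the template deliberately leaves out the def and storePassword lines, which should be copied from the shipped file and edited by hand.

```python
# Template lines using the default jks ID and the documented default values.
TEMPLATE = [
    "credentialStore.jks.config.keystore.type=PKCS12",
    "credentialStore.jks.config.keystore.file=jks-credentialStore.pkcs12",
    "credentialStore.jks.config.keystore.file.min.refresh.millis=10000",
]


def retarget(lines, old_id, new_id):
    """Rename credentialStore.<old_id>.* properties to credentialStore.<new_id>.*"""
    old_prefix = f"credentialStore.{old_id}."
    new_prefix = f"credentialStore.{new_id}."
    return [
        new_prefix + line[len(old_prefix):] if line.startswith(old_prefix) else line
        for line in lines
    ]


def build_multi_store_config(template, old_id, new_ids):
    """Emit a credentialStores line plus one property set per store ID."""
    lines = ["credentialStores=" + ",".join(new_ids)]
    for new_id in new_ids:
        lines.extend(retarget(template, old_id, new_id))
    return lines


config_lines = build_multi_store_config(TEMPLATE, "jks", ["jksDev1", "jksDev2"])
```

The generated sets still share the same keystore file name, so each store's keystore.file value would normally be changed afterward to point to its own file.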

Step 3. Add Secrets to the Credential Store

Use the stagelib-cli jks-credentialstore command to add secrets to the Java keystore file. You can add multiple secrets to the file.

Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore add -i <cstore ID> -n <secret name> -c <secret value>
For example, the following command adds a secret named OracleDBPassword with the value 278yT6u to the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore add -i devjks -n OracleDBPassword -c 278yT6u
Note: The stagelib-cli jks-credentialstore command also includes delete and list subcommands that you use to manage the secrets defined in the keystore file. For information on using these commands, see jks-credentialstore Command.

Step 4. Call Secrets from the Pipeline

Use the credential:get() function in pipeline stage properties to retrieve secrets from the Java keystore.

Use the credential function in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential:get() function uses the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the secret to retrieve from the credential store.
For example, the following expression returns the value of the OracleDBPassword secret from the devjks credential store. It also allows any user belonging to the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("devjks", "devops@MyCompany", "OracleDBPassword")}
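
The rules for the userGroup argument depend on whether Control Hub is used and on the Data Collector version. The following Python sketch condenses them into two helper functions. The function names are hypothetical, and the sketch assumes purely numeric version strings such as 3.16.0.

```python
def parse_version(text):
    """Turn "3.16.0" into (3, 16, 0) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def all_users_group(uses_control_hub, sdc_version, organization_id=None):
    """Return the group value that grants access to all users."""
    if not uses_control_hub or parse_version(sdc_version) >= (3, 16, 0):
        return "all"  # recommended, and portable to Control Hub later
    if not organization_id:
        raise ValueError("an organization ID is required before version 3.16.0")
    return f"all@{organization_id}"


def group_name(group_id, organization_id=None):
    """Apply the <group ID>@<organization ID> convention for Control Hub."""
    return f"{group_id}@{organization_id}" if organization_id else group_id


standalone = all_users_group(False, "3.10.0")
modern = all_users_group(True, "3.16.0", "MyCompany")
legacy = all_users_group(True, "3.15.2", "MyCompany")
devops = group_name("devops", "MyCompany")
```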

jks-credentialstore Command

The stagelib-cli jks-credentialstore command provides subcommands to add, list, and delete secrets in the Java keystore credential store.

Any changes made to the Java keystore file take effect immediately. For example, if you change the value of an existing secret in the file, running pipelines that require a new connection to the external system use the updated secret.
Note: In previous releases, the jks-cs command provided the same subcommands to add, list, and delete secrets in the Java keystore credential store. However, the jks-cs command is now deprecated and will be removed in a future release.
You can use the following subcommands with the stagelib-cli jks-credentialstore command:
add
Adds a secret to the Java keystore credential store.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore add \
(-i <cstore ID> | --id <cstore ID>) \
(-n <secret name> | --name <secret name>) \
(-c <secret value> | --credential <secret value>)
Add Option Description
-i <cstore ID> or --id <cstore ID> Required. Unique ID for the credential store.

The default ID for a Java keystore is jks.

-n <secret name> or --name <secret name> Required. Name of the secret to add to the Java keystore credential store.

If the name includes non-alphanumeric characters, use single quotation marks around the name.

-c <secret value> or --credential <secret value> Required. Value to add to the Java keystore credential store.

If the value includes non-alphanumeric characters, use single quotation marks around the value.

For example, the following command adds a secret named OracleDBPassword with the value df35yT_&5 to the devjks Java keystore credential store:

bin/streamsets stagelib-cli jks-credentialstore add -i devjks -n OracleDBPassword -c 'df35yT_&5'
delete
Deletes a secret from the Java keystore credential store.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore delete \
(-i <cstore ID> | --id <cstore ID>) \
(-n <secret name> | --name <secret name>)
Delete Option Description
-i <cstore ID> or --id <cstore ID> Required. Unique ID for the credential store.

The default ID for a Java keystore is jks.

-n <secret name> or --name <secret name> Required. Name of the secret to delete from the Java keystore credential store.

If the name includes non-alphanumeric characters, use single quotation marks around the name.

For example, the following command deletes a secret named SQLServerDBPassword from the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore delete -i devjks -n SQLServerDBPassword
list
Lists the names of all secrets defined in the Java keystore credential store. The command does not list the values.
Use the command from the $SDC_DIST directory as follows:
bin/streamsets stagelib-cli jks-credentialstore list \
(-i <cstore ID> | --id <cstore ID>)
List Option Description
-i <cstore ID>

or

--id <cstore ID>

Required. Unique ID for the credential store.

The default ID for a Java keystore is jks.

For example, the following command lists the names of all secrets defined in the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore list -i devjks
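When scripting these commands, the quoting rule for names and values with non-alphanumeric characters is easy to get wrong. The following is a minimal Python sketch, not part of Data Collector, that assembles the add, delete, and list command lines shown above and applies shell quoting. The helper name build_command is hypothetical. Note that shlex.quote adds single quotation marks only where the shell requires them, so shell-safe characters such as underscores are left unquoted.

```python
import shlex

# Command prefix taken from the examples above; run it from the $SDC_DIST directory.
BASE = ["bin/streamsets", "stagelib-cli", "jks-credentialstore"]


def build_command(action, store_id, name=None, value=None):
    """Return a shell-safe command line for the add, delete, or list action.

    Hypothetical helper for illustration only.
    """
    if action not in ("add", "delete", "list"):
        raise ValueError(f"unsupported action: {action}")
    args = BASE + [action, "-i", store_id]
    if action in ("add", "delete"):
        if name is None:
            raise ValueError("add and delete require a secret name")
        args += ["-n", name]
    if action == "add":
        if value is None:
            raise ValueError("add requires a secret value")
        args += ["-c", value]
    # shlex.quote wraps an argument in single quotes only when the shell needs it.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_command("add", "devjks", "OracleDBPassword", "df35yT_&5"))
print(build_command("list", "devjks"))
print(build_command("delete", "devjks", "SQLServerDBPassword"))
```

The first call reproduces the add example above, with the value wrapped in single quotation marks because it contains an ampersand.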

Microsoft Azure Key Vault

Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, you must complete several prerequisites in Azure so that Data Collector can access the Azure Key Vault as an application.

After completing the prerequisites, install the Azure Key Vault Credential Store stage library and define the configuration properties used to connect to Azure Key Vault. Then, use credential functions in pipeline stage properties to retrieve stored values.

Note: This documentation includes details about Azure Key Vault to simplify the configuration process. For more information, see the Azure Key Vault documentation.

Prerequisites

Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, complete the following prerequisites within Azure:

Register Data Collector with Azure Active Directory
Use the Azure portal to register Data Collector as an application in Azure Active Directory. When an application such as Data Collector accesses keys or secrets in an Azure key vault, the application must use an authentication token from Azure Active Directory.
The registration process assigns Data Collector the following values, which you specify when you configure the credential store properties:
  • application ID
  • authentication key
For more information about registering applications in Azure Active Directory, see the Azure Key Vault documentation.
Authorize Data Collector to use keys in the Azure key vault
Use the Azure portal to authorize Data Collector to use the keys, or secrets, in the Azure key vault. Azure Key Vault requires that applications be authorized to access each key vault.
For information about authorizing applications to use keys, see the Azure Key Vault documentation.

Step 1. Install the Credential Store Stage Library

By default, a full Data Collector installation includes the Azure Key Vault Credential Store stage library. The core installation does not include the library.

To verify that a Data Collector has the Azure Key Vault Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Azure Key Vault credential store.

Step 2. Configure the Credential Store Properties

To enable Data Collector to connect to the Azure Key Vault credential store, configure the Azure Key Vault properties in the $SDC_CONF/credential-stores.properties file.

Important: For a Cloudera Manager installation, configure all credential store properties through Cloudera Manager. In Cloudera Manager, select the StreamSets service and then click Configuration. Add a line for each credential store property to the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc.properties field as follows:
credentialStores=azure
  1. Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.

    By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.

    To use just a single Azure Key Vault, set the value to azure.

    To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and an Azure Key Vault credential store, set the value to jks,azure. To use multiple Azure Key Vault credential stores, simply specify separate IDs for each, such as azureDev,azureProd.

  2. Uncomment and configure the following properties as needed.

    If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, azure, leave the property names intact, and simply configure the properties.

    To use multiple Azure Key Vault credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
    Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
    These properties are grouped in the Azure Key Vault section of the file:
    Azure Key Vault Property Description
    credentialStore.<cstore ID>.def Required. Defines the implementation of the Azure Key Vault credential store.

    Do not change the default value.

    credentialStore.<cstore ID>.config.credential.refresh.millis Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from Azure Key Vault.
    credentialStore.<cstore ID>.config.credential.retry.millis Optional. Number of milliseconds that Data Collector waits before attempting to retry a retrieval of a credential from Azure Key Vault, in the case of an error.
    credentialStore.<cstore ID>.config.vault.url Required. URL to the key vault created in Azure Key Vault.

    Use the following format:

    https://<key vault name>.vault.azure.net/
    credentialStore.<cstore ID>.config.credential.method Required. Authentication method for Azure Key Vault to use.
    • clientKeys - Use client key authentication.
    • managedIdentity - Use managed identity authentication. To use managed identity authentication in Data Collector, you must set up a managed identity in Azure. For information on setting up a managed identity in Azure, see the Microsoft documentation.

    Default is clientKeys.

    credentialStore.<cstore ID>.config.client.id Required to use client key authentication. Application ID assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in prerequisites.
    credentialStore.<cstore ID>.config.client.key Required to use client key authentication. Authentication key assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in prerequisites.
    credentialStore.<cstore ID>.config.enforceEntryGroup Optional. Requires Data Collector to verify whether the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret.

    When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret.

    For more information, see Group Access to Secrets.

    Default is false.

  3. Restart Data Collector to enable the changes.
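
The property naming pattern for multiple Azure Key Vault credential stores can be sketched as follows. This is an illustrative Python helper, not a Data Collector tool. The function name render_azure_stores, the vault names dev-vault and prod-vault, and the placeholder values are assumptions. The sketch assumes client key authentication and omits the credentialStore.<cstore ID>.def line, which should be copied unchanged from the shipped file.

```python
import re


def render_azure_stores(stores):
    """Render credential-stores.properties lines for one or more Azure Key Vault stores.

    stores maps a credential store ID to a dict with the hypothetical keys
    vault_name, client_id, and client_key.
    """
    for store_id in stores:
        # The documentation allows only alphabetic characters in a store ID.
        if not re.fullmatch(r"[A-Za-z]+", store_id):
            raise ValueError(f"credential store ID must be alphabetic: {store_id}")

    lines = ["credentialStores=" + ",".join(stores)]
    for store_id, cfg in stores.items():
        prefix = f"credentialStore.{store_id}.config."
        # Vault URL format: https://<key vault name>.vault.azure.net/
        lines.append(f"{prefix}vault.url=https://{cfg['vault_name']}.vault.azure.net/")
        lines.append(f"{prefix}credential.method=clientKeys")
        lines.append(f"{prefix}client.id={cfg['client_id']}")
        lines.append(f"{prefix}client.key={cfg['client_key']}")
    return "\n".join(lines)


# Placeholder values only; do not store real keys in clear text.
example = render_azure_stores({
    "azureDev": {"vault_name": "dev-vault", "client_id": "<application ID>", "client_key": "<authentication key>"},
    "azureProd": {"vault_name": "prod-vault", "client_id": "<application ID>", "client_key": "<authentication key>"},
})
print(example)
```

Each store gets its own copy of the properties, with the credential store ID substituted into every property name, and the first line lists both IDs as a comma-separated value.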

Step 3. Call Secrets from the Pipeline

Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve keys or secrets from Azure Key Vault.

Use the credential functions in any stage property that displays the key icon next to it.

Important: When you use a credential function in a stage property, the function must be the only value defined in the property.
The credential functions use the following arguments:
  • cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
  • userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.

    If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.

    To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
    Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
  • name - Name of the key or secret to retrieve from Azure Key Vault.
  • storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Azure Key Vault, you can use the following options:
    • url - Overrides the credentialStore.azure.config.vault.url property in the $SDC_CONF/credential-stores.properties file.
    • retry - Overrides the credentialStore.azure.config.credential.retry.millis property in the $SDC_CONF/credential-stores.properties file.
    • refresh - Overrides the credentialStore.azure.config.credential.refresh.millis property in the $SDC_CONF/credential-stores.properties file.
    • credentialType=certificate - Instructs Azure Key Vault to retrieve the stored PEM certificate as a certificate rather than as a secret. Use to retrieve a PEM certificate stored in Azure Key Vault when you configure a stage to use a remote keystore or truststore for SSL/TLS encryption.
    Use the following format to specify options:
    "<option1>=<value>,<option2>=<value>"
For example, the following expression returns the value of the SQLpassword secret from the azure credential store. The expression allows any user belonging to the devops group access to the credential when validating, previewing, or running the pipeline:
${credential:get("azure", "devops@MyCompany", "SQLpassword")}
The following expression returns the same secret value, but overrides the retry time configured in the $SDC_CONF/credential-stores.properties file:
${credential:getWithOptions("azure", "devops@MyCompany", "SQLpassword", "retry=3000")}
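If you generate pipeline configurations programmatically, the argument order and the option format can be captured in a small helper. The following Python sketch is an assumption-laden illustration, not a StreamSets API: the function name credential_expression is hypothetical, and it only formats the expression text that you would paste into a stage property.

```python
def credential_expression(store_id, user_group, name, options=None):
    """Format a credential:get() or credential:getWithOptions() expression.

    options is an optional dict such as {"retry": 3000}; when present, it is
    rendered in the "<option1>=<value>,<option2>=<value>" format.
    Hypothetical helper for illustration only.
    """
    function = "credential:get"
    args = [store_id, user_group, name]
    if options:
        function = "credential:getWithOptions"
        args.append(",".join(f"{key}={value}" for key, value in options.items()))
    quoted = ", ".join(f'"{arg}"' for arg in args)
    return "${" + function + "(" + quoted + ")}"


print(credential_expression("azure", "devops@MyCompany", "SQLpassword"))
print(credential_expression("azure", "devops@MyCompany", "SQLpassword", {"retry": 3000}))
```

The two calls reproduce the expressions shown above. Because the function must be the only value defined in the property, the helper returns the complete expression with nothing around it.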