Credential Stores
Data Collector pipeline stages communicate with external systems to read and write data. Many of these external systems require sensitive information, such as user names or passwords, to access the data. When you configure pipeline stages for these external systems, you must specify the details that the stages need to connect to the system.
If you enter sensitive information directly in stage properties, you expose those details to any user with access to the pipeline. To access external systems without exposing the sensitive information, add the sensitive values as secrets in a credential store, and then use Data Collector credential functions in the stage properties to retrieve those values.
Defining secrets in a credential store can make it easier to migrate pipelines to another environment. For example, if you migrate multiple pipelines from a development to a production environment, you do not need to edit each pipeline with details for the production environment. You can simply replace the development credential store with the production version.
You can configure Data Collector to use multiple credential stores at the same time. Each credential store is identified by a unique credential store ID.
Enabling Credential Stores
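You can configure Data Collector to use one or more credential stores. Each credential store is identified by a unique credential store ID. To enable credential stores, configure the following properties in the $SDC_CONF/credential-stores.properties file: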
- credentialStores property
- This property defines the credential stores that Data Collector can use.
- usePortableGroups property
- This property allows you to migrate pipelines that access a credential store from one Control Hub organization to another without updating the pipeline. Important: Use this property only when recommended by the StreamSets Support team.
- Sets of related properties
- Each supported credential store type has a set of related properties. The property names include the default credential store IDs originally specified in the credentialStores property.
For example, say you want to use two Azure credential stores, azureDev for development and azureProd for production. To do this, you specify the credential store IDs in the credentialStores property and make a copy of the related Azure credential store properties, so you have one set for each credential store. Then, you update the credential store ID in the first set of property names to azureDev, and you do the same for azureProd in the second set. The resulting properties might look as follows:
################################################
# Data Collector Credential Stores #
################################################
credentialStores=azureDev,azureProd
#credentialStores.usePortableGroups=false
############################################################
# azureDev: Azure Key Vault Credential Store Configuration #
############################################################
credentialStore.azureDev.def=streamsets-datacollector-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureDev.config.credential.refresh.millis=30000
credentialStore.azureDev.config.credential.retry.millis=15000
credentialStore.azureDev.config.vault.url=https://development.vault.azure.net/
credentialStore.azureDev.config.client.id=devClientID
credentialStore.azureDev.config.client.key=devClientKey
credentialStore.azureDev.config.enforceEntryGroup=false
#############################################################
# azureProd: Azure Key Vault Credential Store Configuration #
#############################################################
credentialStore.azureProd.def=streamsets-datacollector-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureProd.config.credential.refresh.millis=30000
credentialStore.azureProd.config.credential.retry.millis=15000
credentialStore.azureProd.config.vault.url=https://production.vault.azure.net/
credentialStore.azureProd.config.client.id=prodClientID
credentialStore.azureProd.config.client.key=prodClientKey
credentialStore.azureProd.config.enforceEntryGroup=false
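Because every property name embeds the credential store ID, a copied set of properties is easy to check mechanically. The following Python sketch is a hypothetical helper, not part of Data Collector: it parses simple key=value lines, groups the credentialStore.<ID>.* entries by store ID, and flags any ID listed in credentialStores that has no .def entry. It assumes one-line properties without escapes or line continuations, and the sample deliberately omits the azureProd definition.

```python
# Hypothetical helper, not part of Data Collector: groups the entries of a
# credential-stores.properties file by credential store ID and reports IDs
# listed in credentialStores that have no credentialStore.<ID>.def entry.
# Assumes simple one-line key=value entries (no escapes or continuations).

PREFIX = "credentialStore."


def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def group_by_store(props: dict[str, str]) -> dict[str, dict[str, str]]:
    """Map each store ID to its own properties, with the ID prefix removed."""
    stores: dict[str, dict[str, str]] = {}
    for key, value in props.items():
        if key.startswith(PREFIX):
            store_id, _, rest = key[len(PREFIX):].partition(".")
            stores.setdefault(store_id, {})[rest] = value
    return stores


def missing_definitions(props: dict[str, str]) -> list[str]:
    """Return IDs declared in credentialStores that lack a .def entry."""
    declared = [s.strip() for s in props.get("credentialStores", "").split(",") if s.strip()]
    stores = group_by_store(props)
    return [sid for sid in declared if "def" not in stores.get(sid, {})]


sample = """
credentialStores=azureDev,azureProd
#credentialStores.usePortableGroups=false
credentialStore.azureDev.def=streamsets-datacollector-azure-keyvault-credentialstore-lib::com_streamsets_datacollector_credential_azure_keyvault_AzureKeyVaultCredentialStore
credentialStore.azureDev.config.vault.url=https://development.vault.azure.net/
credentialStore.azureProd.config.vault.url=https://production.vault.azure.net/
"""

props = parse_properties(sample)
print(sorted(group_by_store(props)))  # ['azureDev', 'azureProd']
print(missing_definitions(props))     # ['azureProd']
```

Keying on the store ID rather than on the store type mirrors the convention above: two Azure stores are simply two IDs whose property sets happen to share the same .def value.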
Group Access to Secrets
As an additional layer of security, you can employ user groups to further limit access to the secrets defined in credential stores.
- Required group argument in credential functions
- Credential functions include a group argument that defines the user group that can access the secret. The group argument ensures that the user who attempts to preview, validate, or start a pipeline that includes a credential function belongs to the group specified in the function. The user must also have execute permission on the pipeline.
- Optional group secrets in the credential store
- In addition to using the group argument in credential functions, you can configure Data Collector to require group secrets for a credential store.
To require the use of group secrets, in the $SDC_CONF/credential-stores.properties file, set the credentialStore.<cstore ID>.config.enforceEntryGroup property to true.
A group secret is a secret defined in the credential store that contains a comma-delimited list of Data Collector user groups permitted to access the associated secret.
When the credential store ID requires group secrets, you must define a group secret for every secret that Data Collector accesses in that credential store. The name of the group secret is based on the secret name, as follows:
<secret name>-groups
When you configure a credential function to call a secret, the user group specified in the credential function must be listed in the associated group secret that is defined in the credential store.
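For example, say that Data Collector requires group secrets for the azure credential store, and a pipeline origin uses the following credential function to retrieve the readkeytab secret from that credential store:
${credential:get("azure", "kafkaprod@MyCompany", "readkeytab")}
When a user previews, validates, or starts the pipeline, Data Collector verifies the following:
- The user who starts the pipeline is in the kafkaprod user group.
- The readkeytab secret has an associated readkeytab-groups secret defined in the credential store.
- The readkeytab-groups secret includes the kafkaprod user group.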
When Data Collector is not configured to require group secrets, Data Collector validates only the first point, verifying that the user belongs to the specified group.
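The three checks can be summarized as a small decision function. The following Python sketch is a hypothetical model, not Data Collector code: the function name, its parameters, and the ops@MyCompany group are illustrative, the credential store is reduced to a plain dict of secret names and values, and group names are compared as exact strings. It does not model the separate requirement that the user have execute permission on the pipeline.

```python
# Hypothetical model of the group checks described above; Data Collector performs
# these checks internally. The separate check for execute permission on the
# pipeline is not modeled. The store is a plain dict of secret name -> value.

def can_access(user_groups: set[str], function_group: str, secret_name: str,
               store: dict[str, str], enforce_entry_group: bool) -> bool:
    # 1. The user must belong to the group named in the credential function.
    if function_group not in user_groups:
        return False
    # Without enforceEntryGroup=true, only the first check applies.
    if not enforce_entry_group:
        return True
    # 2. A "<secret name>-groups" secret must exist in the credential store.
    group_secret = store.get(f"{secret_name}-groups")
    if group_secret is None:
        return False
    # 3. The comma-delimited group secret must list the function's group.
    allowed = {g.strip() for g in group_secret.split(",")}
    return function_group in allowed


store = {
    "readkeytab": "<keytab contents>",
    "readkeytab-groups": "kafkaprod@MyCompany,ops@MyCompany",
}
print(can_access({"kafkaprod@MyCompany"}, "kafkaprod@MyCompany", "readkeytab", store, True))  # True
print(can_access({"dev@MyCompany"}, "kafkaprod@MyCompany", "readkeytab", store, True))        # False
```

Passing enforce_entry_group=False short-circuits after the first check, which matches the behavior described for credential stores that do not require group secrets.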
AWS Secrets Manager
To use the AWS Secrets Manager credential store system, install the AWS Secrets Manager Credentials Store stage library and define the configuration properties used to connect to Secrets Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.
Data Collector needs AWS credentials with permission to read the secrets, either an access key pair or an instance profile, as described in the credential store properties below. To follow best practices, make secrets read-only and limit access. See the AWS Secrets Manager documentation on identity and access management (IAM) policies.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the AWS Secrets Manager Credentials Store stage library. The core installation does not include the library.
To verify that Data Collector has the AWS Secrets Manager Credentials Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Secrets Manager credential store.
Step 2. Configure Credential Store Properties
To enable Data Collector to connect to the AWS Secrets Manager credential store, configure the Secrets Manager properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value. To use just a single Secrets Manager credential store, set the value to aws:
credentialStores=aws
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secrets Manager credential store, set the value to jks,aws. To use multiple Secrets Manager credential stores, simply specify separate IDs for each, such as awsDev,awsProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, aws, leave the property names intact, and simply configure the properties.
To use multiple AWS Secrets Manager credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
These properties are grouped in the AWS Secrets Manager section of the file:
- credentialStore.<cstore ID>.def - Required. Defines the implementation of the AWS Secrets Manager credential store. Do not change the default value.
- credentialStore.<cstore ID>.config.nameKey.separator - Optional. Separator to use in the name argument that credential functions use. Use the following format for the name argument:
<name><separator><key>
For example, if you keep the default ampersand (&), the format for the name argument is:
<name>&<key>
Note: In Secrets Manager, names can contain alphanumeric characters and the following special characters: / _ + = . @ -. Therefore, avoid using those characters as separators.
- credentialStore.<cstore ID>.config.region - Required. AWS region that hosts Secrets Manager. For a list of available regions, see the AWS Region Table.
- credentialStore.<cstore ID>.config.security.method - Required. Authentication method used to connect to AWS. Set to one of the following values:
  - instanceProfile - Authenticates using an instance profile associated with Data Collector. Use when Data Collector runs on an Amazon EC2 instance that has an associated instance profile. Data Collector uses the instance profile credentials to automatically authenticate with AWS.
  - accessKeys - Authenticates using an AWS access key pair. Use when Data Collector does not run on an Amazon EC2 instance or when the EC2 instance doesn't have an instance profile.
- credentialStore.<cstore ID>.config.access.key - Required when using access keys to authenticate with AWS. AWS access key ID.
- credentialStore.<cstore ID>.config.secret.key - Required when using access keys to authenticate with AWS. AWS secret access key.
- credentialStore.<cstore ID>.config.cache.max.size - Optional. Maximum number of secrets Data Collector can cache locally. Default is 1024.
- credentialStore.<cstore ID>.config.cache.ttl.millis - Optional. Number of milliseconds that Data Collector considers a cached secret valid before requiring a refresh. Default is 1 hour (3600000 milliseconds).
- credentialStore.<cstore ID>.config.enforceEntryGroup - Optional. Requires Data Collector to verify whether the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret. When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret. For more information, see Group Access to Secrets.
Default is false.
- Restart Data Collector to enable the changes.
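Before restarting, it can help to confirm that the required properties for the chosen authentication method are present. The following Python sketch is a hypothetical pre-restart check, not a Data Collector tool: it takes the properties as a dict, applies only the Required rules from the list above, and does not contact AWS. The awsDev store ID, the region, and the placeholder values are illustrative.

```python
# Hypothetical pre-restart check for one AWS Secrets Manager credential store.
# Property names come from the list above; the function is illustrative only
# and does not contact AWS.

VALID_METHODS = {"instanceProfile", "accessKeys"}


def aws_store_problems(props: dict[str, str], store_id: str = "aws") -> list[str]:
    prefix = f"credentialStore.{store_id}."
    problems: list[str] = []
    # Properties marked Required for every configuration.
    for name in ("def", "config.region", "config.security.method"):
        if not props.get(prefix + name):
            problems.append(f"missing {prefix}{name}")
    method = props.get(prefix + "config.security.method")
    if method and method not in VALID_METHODS:
        problems.append(f"unsupported security.method: {method}")
    # Access key ID and secret access key are required only for accessKeys.
    if method == "accessKeys":
        for name in ("config.access.key", "config.secret.key"):
            if not props.get(prefix + name):
                problems.append(f"missing {prefix}{name}")
    return problems


props = {
    "credentialStore.awsDev.def": "<default value>",
    "credentialStore.awsDev.config.region": "us-west-2",
    "credentialStore.awsDev.config.security.method": "accessKeys",
    "credentialStore.awsDev.config.access.key": "<access key ID>",
}
print(aws_store_problems(props, "awsDev"))
# ['missing credentialStore.awsDev.config.secret.key']
```

An instanceProfile configuration needs only the three always-required properties, which is why the access key checks sit behind the method test.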
Step 3. Call Secrets from the Pipeline
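Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from AWS Secrets Manager.
Use credential functions in any stage property that displays the key icon next to it. The functions use the following syntax:
credential:get(<cstoreId>, <userGroup>, <name>)
credential:getWithOptions(<cstoreId>, <userGroup>, <name>, <storeOptions>)
Where: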
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Name of the secret to retrieve from Secrets Manager. Use the following format: "<name><separator><key>", where:
  - <name> is the secret name.
  - <separator> is the separator defined either in the $SDC_CONF/credential-stores.properties file or in the function call.
  - <key> is the key for the value that you want returned.
- storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Secrets Manager, you can use the following options:
  - separator - Specifies the separator for name and key values in the credential functions, overriding the credentialStore.<cstore ID>.config.nameKey.separator property.
  - alwaysRefresh - When set to true, forces the key to refresh its cached value before Data Collector retrieves the value, overriding the credentialStore.<cstore ID>.config.cache.ttl.millis property. Be aware that always refreshing the cached value significantly increases the pipeline run time.
  Use the following format to specify options: "<option1>=<value>,<option2>=<value>"
  For example, to use the pipe symbol (|) as the separator, enter the following for the options argument: "separator=|"
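For example, the following expression retrieves the value of the key SQLk1 of the secret SQLpassword from the aws credential store. The expression allows any user in the devops group to access the key when validating, previewing, or running the pipeline:
${credential:get("aws", "devops@MyCompany", "SQLpassword&SQLk1")}
The following expression retrieves the same value with the credential:getWithOptions() function, using the pipe symbol (|) as the separator:
${credential:getWithOptions("aws", "devops@MyCompany", "SQLpassword|SQLk1", "separator=|")}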
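The name and storeOptions arguments work together: the separator option, when present, replaces the configured separator used to split the name. The following Python sketch mirrors only the documented string formats. Data Collector does its own parsing, the helper names are hypothetical, and the sketch assumes the separator appears exactly once in the name argument.

```python
# Hypothetical helpers mirroring the documented formats for the name and
# storeOptions arguments. Data Collector parses these itself; this sketch
# assumes the separator appears exactly once in the name argument.

DEFAULT_SEPARATOR = "&"  # default nameKey.separator value


def parse_options(options: str) -> dict[str, str]:
    """Parse "<option1>=<value>,<option2>=<value>" into a dict."""
    result: dict[str, str] = {}
    for pair in options.split(","):
        if pair.strip():
            key, _, value = pair.partition("=")
            result[key.strip()] = value
    return result


def split_name(name: str, options: str = "") -> tuple[str, str]:
    """Split "<name><separator><key>" into (secret name, key)."""
    separator = parse_options(options).get("separator", DEFAULT_SEPARATOR)
    secret, found, key = name.partition(separator)
    if not found or not secret or not key:
        raise ValueError(f"{name!r} is not in <name>{separator}<key> format")
    return secret, key


print(split_name("SQLpassword&SQLk1"))                  # ('SQLpassword', 'SQLk1')
print(split_name("SQLpassword|SQLk1", "separator=|"))   # ('SQLpassword', 'SQLk1')
print(parse_options("separator=|,alwaysRefresh=true"))  # {'separator': '|', 'alwaysRefresh': 'true'}
```

Because Secrets Manager names may contain / _ + = . @ -, a separator outside that set, such as the default & or |, keeps the split unambiguous.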
CyberArk
To use the CyberArk credential store system, install the CyberArk Credential Store stage library and define the configuration properties used to connect to CyberArk Application Identity Manager. Then, use credential functions in pipeline stage properties to retrieve stored values.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the CyberArk Credential Store stage library. The core installation does not include the library.
To verify that Data Collector has the CyberArk Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the CyberArk credential store.
Step 2. Configure the Credential Store Properties
To enable Data Collector to connect to the CyberArk credential store, configure the CyberArk properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value. To use just a single CyberArk credential store, set the value to cyberark:
credentialStores=cyberark
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a CyberArk credential store, set the value to jks,cyberark. To use multiple CyberArk credential stores, simply specify separate IDs for each, such as cyberarkDev,cyberarkProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, cyberark, leave the property names intact, and simply configure the properties.
To use multiple CyberArk credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
These properties are grouped in the CyberArk section of the file:
- credentialStore.<cstore ID>.def - Required. Defines the implementation of the CyberArk credential store. Do not change the default value.
- credentialStore.<cstore ID>.config.credential.refresh.millis - Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from CyberArk.
- credentialStore.<cstore ID>.config.credential.retry.millis - Optional. Number of milliseconds that Data Collector waits before retrying the retrieval of a credential from CyberArk after an error.
- credentialStore.<cstore ID>.config.connector - Optional. Connector type to CyberArk. Leave the default, webservices, since only web services is currently supported.
- credentialStore.<cstore ID>.config.ws.url - Required. CyberArk Central Credential Provider web service URL. Use the following format:
https://<host name>:<port>/AIMWebService/api/Accounts
- credentialStore.<cstore ID>.config.ws.appId - Required. CyberArk application ID for this Data Collector. You must create the application ID in CyberArk.
- credentialStore.<cstore ID>.config.ws.maxConcurrentConnections - Optional. Maximum number of concurrent web service calls that Data Collector can make to CyberArk.
- credentialStore.<cstore ID>.config.ws.validateAfterInactivity.millis - Optional. Number of milliseconds of inactivity before Data Collector validates the HTTP connection to CyberArk.
- credentialStore.<cstore ID>.config.ws.connectionTimeout.millis - Optional. Number of milliseconds to wait for a connection to CyberArk.
- credentialStore.<cstore ID>.config.ws.nameSeparator - Optional. Separator to use in the name argument that credential functions use. Use the following format for the name argument:
<safe><separator><folder><separator><object name><separator><element name>
For example, if you keep the default ampersand (&), the format for the name argument is:
<safe>&<folder>&<object name>&<element name>
- credentialStore.<cstore ID>.config.ws.http.authentication - Optional. Authentication type used by the CyberArk Central Credential Provider web services: none, basic, or digest. Default is none.
- credentialStore.<cstore ID>.config.ws.http.authentication.user - Optional. User name if using basic or digest authentication.
- credentialStore.<cstore ID>.config.ws.http.authentication.password - Optional. Password if using basic or digest authentication. To protect the password, store the password in an external location and then use a function to retrieve the password.
- credentialStore.<cstore ID>.config.ws.truststoreFile - Optional. Path to the truststore file if using HTTPS and the server certificate uses a private CA or is not trusted by the default Java truststore file. Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.
- credentialStore.<cstore ID>.config.ws.truststorePassword - Optional. Password for the truststore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
- credentialStore.<cstore ID>.config.ws.supportedProtocols - Optional. SSL/TLS-enabled protocols. Versions TLSv1.2 or later are recommended.
- credentialStore.<cstore ID>.config.ws.hostnameVerifier.skip - Optional. Determines whether to skip verifying the host name of the CyberArk Central Credential Provider web services against the domain defined in the HTTPS certificate. By default, the host name is verified.
- credentialStore.<cstore ID>.config.ws.keystoreFile - Optional. If using HTTPS and the CyberArk Central Credential Provider web services require client-side certificates, the path to the keystore file that contains the client certificate. Enter a path relative to the Data Collector configuration directory, $SDC_CONF, or enter an absolute path.
- credentialStore.<cstore ID>.config.ws.keystorePassword - Optional. Password for the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
- credentialStore.<cstore ID>.config.ws.keyPassword - Optional. Password to access the certificate within the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
- credentialStore.<cstore ID>.config.ws.proxyURI - Optional. URI for the proxy that should be used to reach the CyberArk services.
- credentialStore.<cstore ID>.config.enforceEntryGroup - Optional. Requires Data Collector to verify whether the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret. When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret. For more information, see Group Access to Secrets.
Default is false.
- Restart Data Collector to enable the changes.
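The truststoreFile and keystoreFile properties accept either an absolute path or a path relative to $SDC_CONF. The following Python sketch shows that resolution rule for checking where Data Collector will look for a file. It is a hypothetical helper, the /etc/sdc directory and file names are illustrative, and it uses PurePosixPath so the result does not depend on the machine running the sketch.

```python
# Hypothetical helper showing how a relative truststoreFile or keystoreFile
# value relates to $SDC_CONF, as described in the list above. PurePosixPath
# keeps the example independent of the machine running it.

from pathlib import PurePosixPath


def resolve_store_file(value: str, sdc_conf: str) -> PurePosixPath:
    path = PurePosixPath(value)
    # Absolute paths are used as-is; relative paths are relative to $SDC_CONF.
    return path if path.is_absolute() else PurePosixPath(sdc_conf) / path


print(resolve_store_file("cyberark-truststore.jks", "/etc/sdc"))
# /etc/sdc/cyberark-truststore.jks
print(resolve_store_file("/opt/certs/cyberark-keystore.jks", "/etc/sdc"))
# /opt/certs/cyberark-keystore.jks
```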
Step 3. Call Secrets from the Pipeline
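Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from CyberArk.
Use the credential functions in any stage property that displays the key icon next to it. The functions use the following syntax:
credential:get(<cstoreId>, <userGroup>, <name>)
credential:getWithOptions(<cstoreId>, <userGroup>, <name>, <storeOptions>)
Where: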
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Name of the secret to retrieve from CyberArk. Use the following format: "<safe><separator><folder><separator><object name>[<separator><element name>]", where:
  - <safe> is the CyberArk safe to read. For example, production.
  - <separator> is the separator defined for the safe, folder, object name, and element name values in the $SDC_CONF/credential-stores.properties file. Or, if you use the credential:getWithOptions() function, you can define the separator in the options argument.
  - <folder> is the folder in CyberArk to read. For example, Root\\sqldatabases.
  - <object name> is the object or secret in CyberArk to read. For example, payroll.
  - <element name> is an optional name for the value in the secret that you want returned. For example, enter Content to return the password or Username to return an optional user name value. If you do not specify <element name>, Data Collector uses Content.
- storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For CyberArk, you can use the following options:
  - separator - Separator to use in the name argument.
  - ConnectionTimeout - Connection timeout value in milliseconds.
  - FailRequestOnPasswordChange - Whether to fail the request on a password change, set to true or false. See the CyberArk documentation for details on this option.
  Use the following format to specify options: "<option1>=<value>,<option2>=<value>"
  For example, to use the pipe symbol (|) as the separator, enter the following for the options argument: "separator=|"
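For example, the following expression retrieves the password, stored in the Content element, of the payroll object in the Root\\sqldatabases folder of the production safe in the cyberark credential store. The name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("cyberark", "devops@MyCompany", "production&Root\\sqldatabases&payroll&Content")}
The following expression retrieves the same secret with the credential:getWithOptions() function, using the pipe symbol (|) as the separator:
${credential:getWithOptions("cyberark", "devops@MyCompany", "production|Root\\sqldatabases|payroll|Content", "separator=|")}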
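The CyberArk name argument has three required parts and one optional part, with Content used when the element name is omitted. The following Python sketch is a hypothetical parser for that documented format, not Data Collector code. It assumes that the doubled backslash in the expressions above is an escape for a single backslash, so the Python literal "Root\\sqldatabases" below holds the folder path Root\sqldatabases.

```python
# Hypothetical parser for the documented CyberArk name format:
# "<safe><separator><folder><separator><object name>[<separator><element name>]"

DEFAULT_SEPARATOR = "&"      # default ws.nameSeparator value
DEFAULT_ELEMENT = "Content"  # used when <element name> is omitted


def parse_cyberark_name(name: str, separator: str = DEFAULT_SEPARATOR) -> dict[str, str]:
    parts = name.split(separator)
    if len(parts) == 3:
        # No element name given, so fall back to the password element.
        parts.append(DEFAULT_ELEMENT)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected 3 or 4 non-empty parts separated by {separator!r}")
    safe, folder, obj, element = parts
    return {"safe": safe, "folder": folder, "object": obj, "element": element}


# The Python literal below contains a single backslash: Root\sqldatabases.
print(parse_cyberark_name("production&Root\\sqldatabases&payroll&Content"))
print(parse_cyberark_name("production|Root\\sqldatabases|payroll", "|")["element"])  # Content
```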
Google Secret Manager
To use a Google Secret Manager credential store system, install the Google Secret Manager Credentials Store stage library and define the configuration properties used to connect to Secret Manager. Then, use a credential function in pipeline stage properties to retrieve stored values.
As a best practice, make secrets read-only and limit access. For additional suggestions, see the Google Secret Manager best practices documentation.
Authentication
Data Collector must authenticate with Google Secret Manager using Google credentials.
When you configure the credential store properties, you configure Data Collector to use one of the following credential modes:
- Default
- Data Collector authenticates with Google Secret Manager using the credentials file defined in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- JSON
- Data Collector authenticates with Google Secret Manager using JSON-formatted credential information specified in the credential store configuration properties. You copy the JSON content from a Google Cloud service account credentials file.
- JSON Path
- Data Collector authenticates with Google Secret Manager using a Google Cloud service account credentials file stored on the Data Collector machine.
For information about generating a service account credential file, see the Google Cloud Platform documentation.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the Google Secret Manager Credentials Store stage library. The core installation does not include the library.
To verify that Data Collector has the Google Secret Manager Credentials Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Secret Manager credential store.
Step 2. Configure Credential Store Properties
To enable Data Collector to connect to the Google Secret Manager credential store, configure the Secret Manager properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value. To use just a single Secret Manager credential store, set the value to gcp:
credentialStores=gcp
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Secret Manager credential store, set the value to jks,gcp. To use multiple Secret Manager credential stores, simply specify separate IDs for each, such as gcpDev,gcpProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, gcp, leave the property names intact, and simply configure the properties.
To use multiple Google Secret Manager credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
These properties are grouped in the Google Secret Manager section of the file:
- credentialStore.<cstore ID>.def - Required. Defines the implementation of the Google Secret Manager credential store. Do not change the default value.
- credentialStore.<cstore ID>.config.cache.inactivityExpiration.millis - Expiration time for the cache in milliseconds. Default is 1800000 (30 minutes).
- credentialStore.<cstore ID>.config.delimiter - Delimiter to use in the credential function name argument to separate the secret name and the version ID. Use a single character that is not included in credential names. Use the following format for the name argument:
<name><delimiter><version id>
For example, if you use a slash, the format for the name argument is:
<name>/<version id>
Default is question mark (?).
- credentialStore.<cstore ID>.config.project.id - ID of the Google Cloud project associated with Secret Manager.
- credentialStore.<cstore ID>.config.credentialsMode - Credentials to use for authentication with Secret Manager:
  - default - Uses Google Cloud default credentials.
  - json - Uses JSON-formatted credential information specified in the credential store configuration properties.
  - jsonPath - Uses a JSON service account credentials file stored on the Data Collector machine.
  For more information, see Authentication.
- credentialStore.<cstore ID>.config.credentialsJson - Contents of a Google Cloud service account credentials file. Enter JSON-formatted credential information in plain text. If the content includes multiple lines of text, add a backslash (\) at the end of each line.
Required when using the json credentials mode.
- credentialStore.<cstore ID>.config.credentialsJsonPath - Path to a Google Cloud service account credentials file stored on the Data Collector machine. The credentials file must be a JSON file. Enter a path relative to the Data Collector resources directory, $SDC_RESOURCES, or enter an absolute path.
Required when using the jsonPath credentials mode.
- credentialStore.<cstore ID>.config.enforceEntryGroup - Optional. Requires Data Collector to verify whether the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret. When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of groups that are permitted to access the secret. For more information, see Group Access to Secrets.
Default is false.
- Restart Data Collector to enable the changes.
Step 3. Call Secrets from the Pipeline
Use the credential:get() function in pipeline stage properties to retrieve secrets from Google Secret Manager.
Use the credential function in any stage property that displays the key icon next to it. The function uses the following syntax:
credential:get(<cstoreId>, <userGroup>, <name>)
Where:
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Secret to retrieve from Secret Manager. Use the following format: "<name><delimiter><version ID>", where:
  - <name> is the secret name.
  - <delimiter> is the delimiter defined in the $SDC_CONF/credential-stores.properties file.
  - <version ID> is the version of the value that you want returned.
For example, the following expression retrieves the user1pass secret from the gcs credential store. The expression allows any user in the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("gcs", "devops@MyCompany", "user1pass?latest")}
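Because the name argument packs the secret name, delimiter, and version into a single string, it can help to build the expression programmatically. The following Python sketch does that. The helper name gsm_expression is hypothetical. The default "?" delimiter is an assumption read from the example above, so pass whichever delimiter your credential-stores.properties file defines.

```python
def gsm_expression(cstore_id: str, user_group: str, secret_name: str,
                   version: str = "latest", delimiter: str = "?") -> str:
    """Build a credential:get() expression for Google Secret Manager (sketch).

    The name argument has the form "<name><delimiter><version ID>". The "?"
    default delimiter is an assumption taken from the documented example;
    pass the delimiter configured in credential-stores.properties.
    """
    name_arg = f"{secret_name}{delimiter}{version}"
    # Doubled braces produce the literal ${ ... } wrapper of the expression.
    return f'${{credential:get("{cstore_id}", "{user_group}", "{name_arg}")}}'


print(gsm_expression("gcs", "devops@MyCompany", "user1pass"))
# ${credential:get("gcs", "devops@MyCompany", "user1pass?latest")}
```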
Hashicorp Vault
To use the Hashicorp Vault credential store system, install the Vault Credential Store stage library and define the configuration properties used to connect to Hashicorp Vault. Then, use credential functions in pipeline stage properties to retrieve stored values.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the Vault Credential Store stage library. The core installation does not include the library.
To verify that a Data Collector has the Vault Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Hashicorp Vault credential store.
Step 2. Configure the Credential Store Properties
To enable Data Collector to connect to the Hashicorp Vault credential store, configure the Hashicorp Vault properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.
To use just a single Hashicorp Vault credential store, set the value to vault:
credentialStores=vault
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Hashicorp Vault credential store, set the value to jks,vault. To use multiple Hashicorp Vault credential stores, specify separate IDs for each, such as vaultDev,vaultProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, vault, leave the property names intact and configure the properties.
To use multiple Hashicorp Vault credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
These properties are grouped in the Hashicorp Vault section of the file:
Vault Property Description
credentialStore.<cstore ID>.def Required. Defines the implementation of the Vault credential store. Do not change the default value.
credentialStore.<cstore ID>.config.pathKey.separator Optional. Separator to use in the name argument that credential functions use. Use the following format for the name argument: <path><separator><key>. For example, if you keep the default ampersand (&), the format for the name argument is <path>&<key>.
credentialStore.<cstore ID>.config.addr Required. Vault server URL, entered in the following format: https://<host name>:<port number>. Use HTTPS to avoid unencrypted communication.
credentialStore.<cstore ID>.config.authMethod Required. Authentication method that Data Collector uses to authenticate with Vault. Specify one of the following authentication methods: appId, appRole, or azure. Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not use App ID authentication for new installations. Default is appRole.
credentialStore.<cstore ID>.config.role.id Required for App Role authentication. Vault Role ID that Data Collector uses to authenticate with Vault. The Role ID is configured within Vault by your Vault administrator. The Data Collector Vault integration relies on Vault's App Role authentication backend.
credentialStore.<cstore ID>.config.secret.id Required for App Role authentication. Vault Secret ID that Data Collector uses to authenticate with Vault. The Secret ID is configured within Vault by your Vault administrator. To protect the Secret ID, store the Secret ID in an external location and then use a function to retrieve the Secret ID. Default uses the file function to retrieve the Secret ID from vault-secret-id in the $SDC_CONF directory.
credentialStore.<cstore ID>.config.azure.role Required for Azure authentication. Name of the Vault role defined for Data Collector.
credentialStore.<cstore ID>.config.azure.subscriptionId Required for Azure authentication. Subscription ID of the Azure subscription where Data Collector is hosted.
credentialStore.<cstore ID>.config.azure.resourceGroupName Required for Azure authentication. Name of the resource group defined in the Vault role for Data Collector.
credentialStore.<cstore ID>.config.azure.vmName Required for Azure authentication. Name of the Azure VM where Data Collector is running.
credentialStore.<cstore ID>.config.azure.resource Required for Azure authentication. Name of the resource defined in the Azure authentication configuration.
credentialStore.<cstore ID>.config.app.id Deprecated. App ID for App ID authentication. Important: The App ID authentication backend has been deprecated by Hashicorp and will be removed in a future release. As a result, do not configure this property for new installations.
credentialStore.<cstore ID>.config.lease.renewal.interval.sec Optional. Seconds to wait before checking for leases that need renewal. Default is 60.
credentialStore.<cstore ID>.config.lease.expiration.buffer.sec Optional. Buffer for expiring leases, in seconds. Data Collector renews leases that expire in less than the specified number of seconds. Default is 120.
credentialStore.<cstore ID>.config.open.timeout Optional. Timeout, in milliseconds, to establish an HTTP connection to Vault. Default is 0 for no limit.
credentialStore.<cstore ID>.config.proxy.address Optional. Proxy URL. Configure to use a proxy to access Vault.
credentialStore.<cstore ID>.config.proxy.port Optional. Proxy port. Configure to use a proxy to access Vault.
credentialStore.<cstore ID>.config.proxy.username Optional. Proxy username. Configure to use a proxy to access Vault.
credentialStore.<cstore ID>.config.proxy.password Optional. Proxy password. Configure to use a proxy to access Vault. To protect the password, store the password in an external location and then use a function to retrieve the password.
credentialStore.<cstore ID>.config.read.timeout Optional. Milliseconds to wait for data before timing out. Default is 0 for no limit.
credentialStore.<cstore ID>.config.ssl.enabled.protocols Optional. SSL/TLS-enabled protocols. TLSv1.2 or later is recommended. Default is TLSv1.2,TLSv1.3.
credentialStore.<cstore ID>.config.ssl.truststore.file Optional. Path to a Java truststore file. Required when using a private CA or certificates not trusted by the Java default truststore.
credentialStore.<cstore ID>.config.ssl.truststore.password Optional. Password for the truststore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
credentialStore.<cstore ID>.config.ssl.verify Optional. Whether to verify that the Vault server hostname matches its certificate. Default is true. Setting this property to false is not recommended.
credentialStore.<cstore ID>.config.ssl.timeout Optional. Timeout, in milliseconds, for the SSL/TLS handshake. Default is 0 for no limit.
credentialStore.<cstore ID>.config.timeout Optional. Timeout, in milliseconds, to read from Vault after a connection has been established. Default is 0 for no limit.
credentialStore.<cstore ID>.config.enforceEntryGroup Optional. Requires Data Collector to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret. When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of the groups permitted to access the secret. For more information, see Group Access to Secrets. Default is false.
- Restart Data Collector to enable the changes.
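The two lease properties work together. Every lease.renewal.interval.sec seconds, Data Collector looks for leases that expire in less than lease.expiration.buffer.sec seconds and renews them. The following Python sketch models only that selection rule. The Lease type, the leases_to_renew function, and the lease IDs are hypothetical. The actual renewal logic lives inside the Vault credential store library.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical stand-in for a Vault lease tracked by Data Collector."""
    lease_id: str
    expires_at: float  # absolute expiry time, in epoch seconds


def leases_to_renew(leases: list, now: float, buffer_sec: float = 120) -> list:
    """Return IDs of leases that expire in less than buffer_sec seconds.

    buffer_sec mirrors lease.expiration.buffer.sec (default 120). A scheduler
    would call this every lease.renewal.interval.sec seconds (default 60).
    """
    return [lease.lease_id for lease in leases
            if lease.expires_at - now < buffer_sec]


now = 1_000.0
leases = [Lease("aws/creds/a", now + 90), Lease("aws/creds/b", now + 600)]
print(leases_to_renew(leases, now))  # ['aws/creds/a']
```

With the defaults, the check interval (60 seconds) is shorter than the buffer (120 seconds). Assuming checks run on schedule, each lease is therefore selected while it still has roughly 60 seconds or more of validity. If the interval were set larger than the buffer, a lease could expire between two checks.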
Step 3. Call Secrets from the Pipeline
Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve secrets from Hashicorp Vault.
Use the credential functions in any stage property that displays the key icon next to it. The functions use the following arguments:
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Name of the secret to retrieve from Hashicorp Vault. Use the following format: "<path><separator><key>", where <path> is the path in Vault to read, <separator> is the separator defined for the path and key values in the $SDC_CONF/credential-stores.properties file, and <key> is the key for the secret that you want returned.
- storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Hashicorp Vault, you can enter a delay in milliseconds to allow time for external processing. Use the delay option when using the Vault AWS secret backend to generate AWS access credentials based on IAM policies. According to Vault documentation, you might need a delay of 10 seconds or more before the credentials can be used successfully.
Use the following format to specify an option: "<option>=<value>"
For example, to set the Vault delay to 1,000 milliseconds, enter the following for the storeOptions argument: "delay=1000"
For example, the following expression retrieves the value of the password key at the /secret/databases/oracle path from the vault credential store after waiting for a delay of 1,000 milliseconds. The name argument uses the default ampersand (&) as the separator. The expression allows any user belonging to the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:getWithOptions("vault", "devops@MyCompany", "/secret/databases/oracle&password", "delay=1000")}
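The Vault name argument joins a path and a key with the configured separator, and credential:getWithOptions() adds a store options string. The following Python sketch assembles both kinds of expression. The helper vault_expression is hypothetical. Rejecting a separator that appears inside the path or key is a precaution this sketch adds, so that the name argument stays unambiguous. The documentation above does not state that rule.

```python
from typing import Optional


def vault_expression(cstore_id: str, user_group: str, path: str, key: str,
                     separator: str = "&", delay_ms: Optional[int] = None) -> str:
    """Build a Vault credential expression (hypothetical helper).

    The name argument has the form "<path><separator><key>". When delay_ms is
    set, the sketch emits credential:getWithOptions() with a
    "delay=<milliseconds>" store option; otherwise it emits credential:get().
    """
    if separator in path or separator in key:
        # Precaution in this sketch: keep "<path><separator><key>" unambiguous.
        raise ValueError(f"separator {separator!r} appears in path or key")
    args = [cstore_id, user_group, f"{path}{separator}{key}"]
    func = "credential:get"
    if delay_ms is not None:
        func = "credential:getWithOptions"
        args.append(f"delay={delay_ms}")
    quoted = ", ".join(f'"{a}"' for a in args)
    return f"${{{func}({quoted})}}"


print(vault_expression("vault", "devops@MyCompany",
                       "/secret/databases/oracle", "password", delay_ms=1000))
# ${credential:getWithOptions("vault", "devops@MyCompany", "/secret/databases/oracle&password", "delay=1000")}
```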
Java Keystore
To use the Java keystore credential store system, install the Java Keystore Credential Store stage library and define the configuration properties used to connect to the credential store.
Next, use the stagelib-cli jks-credentialstore command to add credentials to the credential store. Then, use credential functions in pipeline stage properties to retrieve stored values.
A Java keystore credential storage system requires the distribution of a keystore file, which complicates security. Before using a Java keystore system, decide how the keystore will be distributed and consult with your IT security team to ensure that the system meets IT policies.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the Java Keystore Credential Store stage library. The core installation does not include the library.
To verify that a Data Collector has the Java Keystore Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Java keystore credential store.
Step 2. Configure the Credential Store Properties
To enable Data Collector to connect to the Java keystore credential store, configure the Java keystore properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.
To use just a single Java keystore credential store, set the value to jks:
credentialStores=jks
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and a Hashicorp Vault credential store, set the value to jks,vault. To use multiple Java keystore credential stores, specify separate alphabetic IDs for each, such as jksDev,jksProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, jks, leave the property names intact and configure the properties.
To use multiple Java keystore credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
These properties are grouped in the Java keystore section of the file:
Java Keystore Property Description credentialStore.<cstore ID>.def Defines the implementation of the Java Keystore credential store. Do not change the default value.
credentialStore.<cstore ID>.config.keystore.type Format of the Java keystore file: JCEKS or PKCS12. Default is PKCS12.
credentialStore.<cstore ID>.config.keystore.file Path and name of the Java keystore file. Enter an absolute path to the file, or a path relative to the Data Collector configuration directory, $SDC_CONF. Default is jks-credentialStore.pkcs12.
credentialStore.<cstore ID>.config.keystore.storePassword Password that Data Collector uses to access the Java keystore file. You must change the default value before using the keystore file. To protect the password, store the password in an external location and then use a function to retrieve the password.
credentialStore.<cstore ID>.config.keystore.file.min.refresh.millis Milliseconds that Data Collector waits before reloading the keystore file. Default is 10000, or ten seconds.
- Restart Data Collector to enable the changes.
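Copying the property set for each credential store ID and renaming it, as described above, is mechanical enough to script. The following Python sketch is a hypothetical helper. It checks each ID against the alphabetic-only rule, emits the credentialStores line, and expands a template in which "<cstore ID>" marks where the ID goes. The template holds only two illustrative properties. The def and storePassword properties are omitted, so copy those from the shipped file. The per-store keystore file naming scheme is also an assumption.

```python
PLACEHOLDER = "<cstore ID>"


def properties_for_stores(store_ids: list, template: dict) -> list:
    """Emit credential-stores.properties lines for several store IDs (sketch).

    Keys and values in `template` may contain "<cstore ID>", which is
    replaced with each store ID in turn.
    """
    for store_id in store_ids:
        # Credential store IDs may use only alphabetic characters
        # (ASCII letters are assumed here).
        if not (store_id.isascii() and store_id.isalpha()):
            raise ValueError(f"invalid credential store ID: {store_id!r}")
    lines = ["credentialStores=" + ",".join(store_ids)]
    for store_id in store_ids:
        for name, value in template.items():
            lines.append(f"{name.replace(PLACEHOLDER, store_id)}="
                         f"{value.replace(PLACEHOLDER, store_id)}")
    return lines


template = {
    "credentialStore.<cstore ID>.config.keystore.type": "PKCS12",
    # Hypothetical naming scheme: one keystore file per credential store.
    "credentialStore.<cstore ID>.config.keystore.file": "<cstore ID>-credentialStore.pkcs12",
}
for line in properties_for_stores(["jksDev", "jksProd"], template):
    print(line)
```

An ID such as jksDev1 would be rejected, because it contains a digit.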
Step 3. Add Secrets to the Credential Store
Use the stagelib-cli jks-credentialstore command to add secrets to the Java keystore file. You can add multiple secrets to the file. The add subcommand uses the following syntax:
bin/streamsets stagelib-cli jks-credentialstore add -i <cstore ID> -n <secret name> -c <secret value>
For example, the following command adds a secret named OracleDBPassword with the value 278yT6u to the devjks Java keystore credential store:
bin/streamsets stagelib-cli jks-credentialstore add -i devjks -n OracleDBPassword -c 278yT6u
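When adding many secrets from a script, a list of arguments passed directly to the process avoids shell quoting problems with characters such as $ or & in secret values. The following Python sketch builds the documented add command as such a list. The helper names and the sdc_home parameter, which stands for the directory that contains bin/streamsets, are hypothetical. By default the sketch only prints the command, with the secret value masked.

```python
import shlex
import subprocess


def jks_add_args(store_id: str, secret_name: str, secret_value: str,
                 sdc_home: str = ".") -> list:
    """Argument list for the documented add subcommand (hypothetical helper)."""
    return [f"{sdc_home}/bin/streamsets", "stagelib-cli", "jks-credentialstore",
            "add", "-i", store_id, "-n", secret_name, "-c", secret_value]


def add_secret(store_id: str, secret_name: str, secret_value: str,
               sdc_home: str = ".", dry_run: bool = True) -> None:
    args = jks_add_args(store_id, secret_name, secret_value, sdc_home)
    if dry_run:
        # Print the command with the secret value masked instead of running it.
        print(shlex.join(args[:-1] + ["****"]))
        return
    # A list with no shell: the secret value is passed to the process verbatim.
    subprocess.run(args, check=True)


add_secret("devjks", "OracleDBPassword", "278yT6u")
# ./bin/streamsets stagelib-cli jks-credentialstore add -i devjks -n OracleDBPassword -c '****'
```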
The stagelib-cli jks-credentialstore command also includes delete and list subcommands that you use to manage the secrets defined in the keystore file. For information on using these commands, see jks-credentialstore Command.
Step 4. Call Secrets from the Pipeline
Use the credential:get() function in pipeline stage properties to retrieve secrets from the Java keystore.
Use the credential function in any stage property that displays the key icon next to it. The credential:get() function uses the following arguments:
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Name of the secret to retrieve from the credential store.
For example, the following expression retrieves the OracleDBPassword secret from the devjks credential store. It also allows any user belonging to the devops group to access the secret when validating, previewing, or running the pipeline:
${credential:get("devjks", "devops@MyCompany", "OracleDBPassword")}
jks-credentialstore Command
The stagelib-cli jks-credentialstore command provides subcommands to add, list, and delete secrets in the Java keystore credential store.
Previously, the jks-cs command provided the same subcommands to add, list, and delete secrets in the Java keystore credential store. However, the jks-cs command is now deprecated and will be removed in a future release.
Use the following subcommands with the stagelib-cli jks-credentialstore command:
- add
- Adds a secret to the Java keystore credential store.
- delete
- Deletes a secret from the Java keystore credential store.
- list
- Lists the names of all secrets defined in the Java keystore credential store. The command does not list the values.
Microsoft Azure Key Vault
Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, you must complete several prerequisites in Azure so that Data Collector can access the Azure Key Vault as an application.
After completing the prerequisites, install the Azure Key Vault Credential Store stage library and define the configuration properties used to connect to Azure Key Vault. Then, use credential functions in pipeline stage properties to retrieve stored values.
Prerequisites
Before Data Collector can connect to the Microsoft Azure Key Vault credential store system, complete the following prerequisites within Azure:
- Register Data Collector with Azure Active Directory
- Use the Azure portal to register Data Collector as an application in Azure Active Directory. When an application such as Data Collector accesses keys or secrets in an Azure key vault, the application must use an authentication token from Azure Active Directory.
- Authorize Data Collector to use keys in the Azure key vault
- Use the Azure portal to authorize Data Collector to use the keys, or secrets, in the Azure key vault. Azure Key Vault requires that applications be authorized to access each key vault.
Step 1. Install the Credential Store Stage Library
By default, a full Data Collector installation includes the Azure Key Vault Credential Store stage library. The core installation does not include the library.
To verify that a Data Collector has the Azure Key Vault Credential Store stage library installed, click the Package Manager icon to display the list of installed stage libraries. If the library is not installed, install the library before configuring the Azure Key Vault credential store.
Step 2. Configure the Credential Store Properties
To enable Data Collector to connect to the Azure Key Vault credential store, configure the Azure Key Vault properties in the $SDC_CONF/credential-stores.properties file.
- Uncomment the credentialStores property and specify the credential store ID to use. Use only alphabetic characters for the credential store ID.
By default, the property lists a default credential store ID for each type of credential store: aws for AWS Secrets Manager, azure for Azure Key Vault, and so on. When using one credential store of any type, it's simplest to use the default value.
To use just a single Azure Key Vault credential store, set the value to azure:
credentialStores=azure
To enable multiple credential stores, specify a comma-separated list of credential store IDs. For example, to use a Java keystore and an Azure Key Vault credential store, set the value to jks,azure. To use multiple Azure Key Vault credential stores, specify separate IDs for each, such as azureDev,azureProd.
- Uncomment and configure the following properties as needed.
If you specified a custom credential store ID, update the names of the following properties, and then configure them as needed. When using the default credential store ID, azure, leave the property names intact and configure the properties.
To use multiple Azure Key Vault credential stores, make a copy of the properties for each credential store. Then, update the credential store ID in each set of property names before defining the properties. For an example, see Enabling Credential Stores.
Important: Instead of entering sensitive data such as passwords in clear text in the configuration file, you can protect the sensitive data by storing the data in an external location and then using functions to retrieve the data.
Azure Key Vault Property Description
credentialStore.<cstore ID>.def Required. Defines the implementation of the Azure Key Vault credential store. Do not change the default value.
credentialStore.<cstore ID>.config.credential.refresh.millis Optional. Number of milliseconds that Data Collector locally caches a credential. When the time expires, Data Collector retrieves the credential from Azure Key Vault again.
credentialStore.<cstore ID>.config.credential.retry.millis Optional. Number of milliseconds that Data Collector waits before retrying the retrieval of a credential from Azure Key Vault after an error.
credentialStore.<cstore ID>.config.vault.url Required. URL to the key vault created in Azure Key Vault. Use the following format: https://<key vault name>.vault.azure.net/
credentialStore.<cstore ID>.config.credential.method Required. Authentication method that Data Collector uses to authenticate with Azure Key Vault: clientKeys to use client key authentication, or managedIdentity to use managed identity authentication. To use managed identity authentication in Data Collector, you must set up a managed identity in Azure. For information on setting up a managed identity in Azure, see the Microsoft documentation. Default is clientKeys.
credentialStore.<cstore ID>.config.client.id Required to use client key authentication. Application ID assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in the prerequisites.
credentialStore.<cstore ID>.config.client.key Required to use client key authentication. Authentication key assigned to this Data Collector when you registered Data Collector as an application in Azure Active Directory, as described in the prerequisites.
credentialStore.<cstore ID>.config.enforceEntryGroup Optional. Requires Data Collector to verify that the user who previews, validates, or starts the pipeline belongs to a group that is permitted to access the secret. When set to true, each secret must have a corresponding <secret key name>-groups secret key in the same secret that contains a comma-separated list of the groups permitted to access the secret. For more information, see Group Access to Secrets. Default is false.
- Restart Data Collector to enable the changes.
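The refresh and retry properties describe a cache with a time limit and a retry delay. The following Python sketch is a hypothetical model of that behavior, not Data Collector code. The fetch callable stands in for a call to Azure Key Vault. The bounded max_attempts is an assumption, because the documentation does not say how many retries occur. The injectable clock exists only so that the sketch can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class CachedCredential:
    """Hypothetical model of refresh.millis and retry.millis, not Data Collector code."""

    def __init__(self, fetch: Callable[[], str], refresh_ms: int, retry_ms: int,
                 max_attempts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                # stands in for a call to Azure Key Vault
        self._refresh_s = refresh_ms / 1000
        self._retry_s = retry_ms / 1000
        self._max_attempts = max_attempts  # assumption: retries are bounded
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        # Serve the locally cached value until refresh.millis has elapsed.
        if self._value is not None and self._clock() - self._fetched_at < self._refresh_s:
            return self._value
        for attempt in range(1, self._max_attempts + 1):
            try:
                self._value = self._fetch()
                self._fetched_at = self._clock()
                return self._value
            except Exception:
                if attempt == self._max_attempts:
                    raise
                # Wait retry.millis before trying again after an error.
                time.sleep(self._retry_s)
        raise RuntimeError("max_attempts must be at least 1")


fake_now = [0.0]
calls = []


def fetch_secret() -> str:
    calls.append(fake_now[0])
    return "placeholder-secret"


cred = CachedCredential(fetch_secret, refresh_ms=30_000, retry_ms=0,
                        clock=lambda: fake_now[0])
cred.get()
fake_now[0] = 10.0
cred.get()          # within 30 seconds: served from the local cache
fake_now[0] = 31.0
cred.get()          # cache expired: fetched again
print(len(calls))   # 2
```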
Step 3. Call Secrets from the Pipeline
Use the credential:get() or credential:getWithOptions() function in pipeline stage properties to retrieve keys or secrets from Azure Key Vault.
Use the credential functions in any stage property that displays the key icon next to it. The functions use the following arguments:
- cstoreId - Unique ID of the credential store to use. Use the ID specified in the $SDC_CONF/credential-stores.properties file. For more information, see Enabling Credential Stores.
- userGroup - Group that a user must belong to in order to access the secret. Only users that have execute permission on the pipeline and that belong to this group can validate, preview, or run the pipeline that retrieves the secret.
If working with Control Hub, specify the group using the required naming convention: <group ID>@<organization ID>.
To grant access to all users, specify the default all group when working only with Data Collector. When working with Control Hub and Data Collector version 3.16.0 or later, you can specify the default group using all or all@<organization ID>. StreamSets recommends using all so that you do not need to modify credential functions when migrating pipelines from Data Collector to Control Hub.
Note: When working with Control Hub and a Data Collector version earlier than 3.16.0, you must use the default all@<organization ID> group.
- name - Name of the key or secret to retrieve from Azure Key Vault.
- storeOptions - Used only by the credential:getWithOptions() function. Additional options to communicate with the credential store. For Azure Key Vault, you can use the following options:
  - url - Overrides the credentialStore.azure.config.vault.url property in the $SDC_CONF/credential-stores.properties file.
  - retry - Overrides the credentialStore.azure.config.credential.retry.millis property in the $SDC_CONF/credential-stores.properties file.
  - refresh - Overrides the credentialStore.azure.config.credential.refresh.millis property in the $SDC_CONF/credential-stores.properties file.
  - credentialType=certificate - Instructs Azure Key Vault to retrieve the stored PEM certificate as a certificate rather than as a secret. Use to retrieve a PEM certificate stored in Azure Key Vault when you configure a stage to use a remote keystore or truststore for SSL/TLS encryption.
Use the following format to specify options: "<option1>=<value>,<option2>=<value>"
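For example, the following expression retrieves the SQLpassword secret from the azure credential store. The expression allows any user belonging to the devops group to access the credential when validating, previewing, or running the pipeline:
${credential:get("azure", "devops@MyCompany", "SQLpassword")}
The following expression retrieves the same secret but overrides the retry interval, so that Data Collector waits 3,000 milliseconds before retrying after an error:
${credential:getWithOptions("azure", "devops@MyCompany", "SQLpassword", "retry=3000")}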
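To check a storeOptions string before you put it in a pipeline, you can split it in the same "<option1>=<value>,<option2>=<value>" form that the storeOptions description above uses. The following Python sketch is a hypothetical helper, and Data Collector's own parser may handle whitespace or edge cases differently. It assumes that values never contain commas, and it accepts only the option names listed above for Azure Key Vault.

```python
# Option names listed for Azure Key Vault in the storeOptions description.
AZURE_STORE_OPTIONS = {"url", "retry", "refresh", "credentialType"}


def parse_store_options(options: str) -> dict:
    """Split an "<option1>=<value>,<option2>=<value>" string (hypothetical helper).

    Assumption: values never contain commas, because this sketch splits on
    every comma. Each item is split at its first "=" only.
    """
    parsed = {}
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"{item!r} is not in <option>=<value> form")
        if key not in AZURE_STORE_OPTIONS:
            raise ValueError(f"unknown Azure Key Vault store option: {key!r}")
        parsed[key] = value.strip()
    return parsed


print(parse_store_options("retry=3000"))
# {'retry': '3000'}
print(parse_store_options("credentialType=certificate,refresh=60000"))
# {'credentialType': 'certificate', 'refresh': '60000'}
```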