User Authentication

Data Collector can authenticate user accounts in several ways.

If you have an enterprise account, you typically use Control Hub authentication to access Data Collector.

If you are not using Control Hub to access Data Collector, you can configure Data Collector to use LDAP authentication or file-based authentication. Best practice is to use LDAP authentication, particularly for a production deployment of Data Collector. By default, Data Collector uses file-based authentication.

Data Collector provides several roles that determine the actions that users can perform. The steps you use to assign roles to user accounts vary, based on whether you are using LDAP or file-based authentication.

Configuring LDAP Authentication

If your organization uses LDAP and you want multiple users to access Data Collector, you can configure Data Collector to use LDAP authentication. After you configure LDAP authentication, users log in to Data Collector using their LDAP username and password.

To configure LDAP authentication, perform the following tasks:
  1. Configure LDAP connection information.
  2. Optionally, configure secure connections to the LDAP server.
  3. Map LDAP groups to Data Collector roles.
  4. Optionally, configure multiple LDAP servers.
  5. If you use MapR stages, enable LDAP authentication for MapR.

Step 1. Configure LDAP Connection Information

To enable LDAP authentication, configure LDAP connection information in the Data Collector configuration files, sdc.properties and ldap-login.conf, located in the $SDC_CONF directory.

  1. In the Data Collector configuration file, $SDC_CONF/sdc.properties, enable LDAP authentication by setting the http.authentication.login.module property to ldap.
  2. In the $SDC_CONF/sdc.properties file, define the HTTP authentication type by setting the http.authentication property to basic, digest, or form.

    The HTTP authentication type determines how passwords are transferred from the browser to Data Collector over HTTP. Digest authentication sends a hash of the user credentials rather than the credentials themselves. Basic and form authentication transfer user credentials without encryption. To ensure that user credentials are safely transmitted for basic and form authentication, configure Data Collector to use HTTPS and configure secure connections to LDAP using LDAPS or StartTLS.

    For a Microsoft Active Directory server, use basic or form authentication.

  3. In the $SDC_CONF/ldap-login.conf file, configure the connection information for the LDAP server.
    In the file, configure the following properties:
    LDAP Property           Description
    debug                   Enables LDAP debugging. Default is false.
    useLdaps                Secures LDAP connections using the LDAPS (LDAP over SSL) protocol. Default is false.
                            To use LDAPS, you must complete additional steps. See Step 2. Configure Secure Connections to LDAP (Optional).
    useStartTLS             Secures LDAP connections using the StartTLS protocol. Default is false.
                            To use StartTLS, you must complete additional steps. See Step 2. Configure Secure Connections to LDAP (Optional).
                            Note: StartTLS and LDAPS cannot be used at the same time. If both useStartTLS and useLdaps are set to true, useStartTLS takes precedence.
    contextFactory          LDAP initial context factory. Default is com.sun.jndi.ldap.LdapCtxFactory.
    hostname                LDAP server host name.
    port                    LDAP server port.
                            To use unencrypted connections or connections secured with StartTLS, enter the LDAP port number, typically 389. To use connections secured with LDAPS, enter the port number for secure connections, typically 636.
    bindDn                  Root distinguished name (DN) used to query the directory server. This user must have privileges to search the directory.
    bindPassword            Password for the root distinguished name. For additional security, save the password in the $SDC_CONF/ldap-bind-password.txt file without additional characters, spaces, or line breaks. As a best practice, the file should have owner-only permissions.
                            Default is @ldap-bind-password.txt@, which points to the ldap-bind-password.txt file for the password.
    forceBindingLogin       Determines if binding login checks are performed. Use one of the following options:
                            • false - Data Collector performs the authentication based on information received from the LDAP server.
                              When set to false, the bindDn user must have permission to access the details, password, and group information for users. Use for digest authentication.
                            • true - Data Collector passes the user credentials to the LDAP server for authentication.
                              When set to true, the bindDn user must have permission to access the group information for users. Use for a Microsoft Active Directory server, for basic or form authentication, or when the password stored in the LDAP server is encrypted. Do not use for digest authentication.
                            Default is false.
    userBaseDn              Base DN under which user accounts are located.
    userIdAttribute         Name of the user ID attribute. Default is uid.
    userPasswordAttribute   Name of the attribute where the user password is stored. Default is userPassword.
                            For a Microsoft Active Directory server, use an empty string.
    userObjectClass         Name of the user object class. Default is inetOrgPerson.
    userFilter              Name of the user attribute used to log in to Data Collector. For example, LDAP users can log in using a username, uid, or email address. Default is uid={user}.
    roleBaseDn              Base DN to search for role membership.
    roleNameAttribute       Name of the attribute for role names. Default is roleName.
    roleMemberAttribute     Name of the role attribute for user names. Default is Member.
    roleObjectClass         Role object class. Default is groupOfNames.
    roleFilter              Specifies whether LDAP group members are determined by DN or uid. Enter one of the following values:
                            • "member={dn}" - Use when LDAP groups list members by full DN.
                            • "memberUid={user}" - Use when LDAP groups list members by uid.
                            Default is "member={dn}", which determines members by DN.
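
As a rough illustration of the file edits described above, the following sketch writes the two sdc.properties settings and the bind password file into a scratch directory. The scratch directory, the choice of form authentication, and the sample password are assumptions made for illustration. In a real installation, you edit the existing files in $SDC_CONF rather than creating new ones.

```shell
# Sketch only: works in a scratch directory instead of the real $SDC_CONF.
conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT

# Steps 1 and 2: select the LDAP login module and an HTTP authentication type.
# "form" is an arbitrary choice here; basic and digest are the other options.
cat > "$conf_dir/sdc.properties" <<'EOF'
http.authentication.login.module=ldap
http.authentication=form
EOF

# bindPassword: store the password with no trailing newline or spaces.
# printf '%s' writes exactly the characters given; echo would add a newline.
sample_password='password'   # placeholder value, not a real credential
printf '%s' "$sample_password" > "$conf_dir/ldap-bind-password.txt"

# Owner-only permissions, as recommended for the password file.
chmod 600 "$conf_dir/ldap-bind-password.txt"

# Print the byte count to confirm that nothing extra was written.
wc -c < "$conf_dir/ldap-bind-password.txt"
```

The byte count should equal the length of the password. A larger number usually means that an editor or echo appended a line break.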

Example for OpenLDAP

Let's look at an example ldap-login.conf file and see how Data Collector uses the LDAP connection information to authenticate LDAP users.

The following example shows an ldap-login.conf file for an OpenLDAP server:

ldap {
  com.streamsets.datacollector.http.LdapLoginModule required
  debug="true"
  useLdaps="true"
  useStartTLS="false"
  contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
  hostname="server1"
  port="636"
  bindDn="cn=admin,dc=example,dc=net"
  bindPassword="@ldap-bind-password.txt@"
  forceBindingLogin="false"
  userBaseDn="ou=users,dc=example,dc=net"
  userIdAttribute="uid"
  userPasswordAttribute="userPassword"
  userObjectClass="inetOrgPerson"
  userFilter="uid={user}"
  roleBaseDn="ou=groups,dc=example,dc=net"
  roleNameAttribute="cn"
  roleMemberAttribute="member"
  roleObjectClass="groupOfNames"
  roleFilter="member={dn}";
}; 

When an LDAP user logs in to Data Collector, Data Collector uses the connection information in the ldap-login.conf file to authenticate the user. Data Collector completes the following steps to authenticate the LDAP user:

  1. When forceBindingLogin is set to false, checks if the user account is registered in the configured LDAP server by sending the following query to the LDAP server:
    ldapsearch -LLL -H ldaps://<hostname>:<port> -x -D <bindDn> -w <bindPassword> -b <userBaseDn> "(&(objectClass=<userObjectClass>)(<userIdAttribute>=username))"

    For example, let's use the sample ldap-login.conf file configured above, assume that the password defined in ldap-bind-password.txt is "password", and assume that a user logs in to Data Collector with the username of jdoe. Data Collector sends the following query to the LDAP server:

    ldapsearch -LLL -H ldaps://server1:636 -x -D "cn=admin,dc=example,dc=net" -w password -b "ou=users,dc=example,dc=net" "(&(objectClass=inetOrgPerson)(uid=jdoe))"

    If the user account doesn't exist, Data Collector fails the authentication. If the user account exists, Data Collector continues with the next authentication step.

    Note: When forceBindingLogin is set to true, Data Collector does not send this query to the LDAP server. Instead, Data Collector passes the user credentials to the LDAP server for authentication. If the LDAP server successfully authenticates the user account, Data Collector continues with the next authentication step.
  2. Checks which LDAP group the user account belongs to by sending the following query to the LDAP server:
    ldapsearch -LLL -H ldaps://<hostname>:<port> -x -D <bindDn> -w <bindPassword> -b <roleBaseDn> "(&(objectClass=<roleObjectClass>)(member={dn}))"

    For example, using the sample ldap-login.conf file configured above, Data Collector sends the following query to the LDAP server:

    ldapsearch -LLL -H ldaps://server1:636 -x -D "cn=admin,dc=example,dc=net" -w password -b "ou=groups,dc=example,dc=net" "(&(objectClass=groupOfNames)(member=cn=jdoe,ou=users,dc=example,dc=net))"

    The LDAP server returns the names of the LDAP groups that the user belongs to. Data Collector uses the group names to determine the Data Collector roles mapped to the LDAP groups, as explained in Step 3. Map LDAP Groups to Data Collector Roles.
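
If you want to reproduce the two lookups by hand when troubleshooting, it can help to assemble the search filters from the ldap-login.conf values first. The sketch below performs only that string assembly. The function names user_filter and group_filter are hypothetical helpers, not part of Data Collector, and the values come from the sample OpenLDAP file above.

```shell
# Hypothetical helpers that rebuild the search filters from the documented
# query templates. They only print strings; nothing is sent to an LDAP server.

# user_filter <userObjectClass> <userIdAttribute> <username>
user_filter() {
  printf '(&(objectClass=%s)(%s=%s))\n' "$1" "$2" "$3"
}

# group_filter <roleObjectClass> <roleFilter> <user DN>
# Substitutes the user DN for the {dn} placeholder in roleFilter.
group_filter() {
  placeholder='{dn}'
  case $2 in
    *"$placeholder"*)
      prefix=${2%%"$placeholder"*}   # text before {dn}
      suffix=${2#*"$placeholder"}    # text after {dn}
      role_filter=$prefix$3$suffix
      ;;
    *)
      role_filter=$2                 # no placeholder, use the filter as is
      ;;
  esac
  printf '(&(objectClass=%s)(%s))\n' "$1" "$role_filter"
}

user_filter inetOrgPerson uid jdoe
group_filter groupOfNames 'member={dn}' 'cn=jdoe,ou=users,dc=example,dc=net'
```

You can pass each printed filter as the last argument of an ldapsearch command like the ones shown in the steps above.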

Example for Active Directory

The following example shows an ldap-login.conf file for a Microsoft Active Directory server. Data Collector completes the same steps to authenticate LDAP users in Active Directory as in OpenLDAP.

ldap {
     com.streamsets.datacollector.http.LdapLoginModule required
     debug="true"
     useLdaps="true"
     useStartTLS="false"
     contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
     hostname="*******"
     port="636"
     bindDn="********"
     bindPassword="@ldap-bind-password.txt@"
     forceBindingLogin="true"
     userBaseDn="ou=Department,dc=Company,dc=net"
     userIdAttribute="sAMAccountName"
     userPasswordAttribute=""
     userObjectClass="person"
     userFilter="sAMAccountName={user}"
     roleBaseDn="ou=Department,dc=Company,dc=net"
     roleNameAttribute="cn"
     roleMemberAttribute="member"
     roleObjectClass="group"
     roleFilter="member={dn}";
};

Step 2. Configure Secure Connections to LDAP (Optional)

You can optionally configure Data Collector to use one of the following methods to make secure connections to the LDAP server:
LDAP over SSL (LDAPS)
    LDAPS uses SSL to encrypt LDAP connections. LDAPS uses the ldaps:// scheme.
StartTLS
    StartTLS can wrap an unencrypted connection with TLS during the connection process. This allows the same port to handle both unencrypted and encrypted connections. StartTLS uses the ldap:// scheme.
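
The relationship between the two flags and the connection URL can be sketched as a small shell function. The name ldap_url is a hypothetical helper used only for illustration. The precedence it encodes follows the rule stated in this document: when both properties are true, useStartTLS wins.

```shell
# ldap_url <useLdaps> <useStartTLS> <hostname> <port>
# Hypothetical helper: prints the URL that matches the chosen method.
ldap_url() {
  if [ "$2" = "true" ]; then
    # StartTLS upgrades a plain connection, so the scheme stays ldap://.
    # This branch comes first because useStartTLS takes precedence.
    printf 'ldap://%s:%s\n' "$3" "$4"
  elif [ "$1" = "true" ]; then
    # LDAPS is encrypted from the first byte and uses its own scheme.
    printf 'ldaps://%s:%s\n' "$3" "$4"
  else
    # Neither property is set: unencrypted LDAP.
    printf 'ldap://%s:%s\n' "$3" "$4"
  fi
}

ldap_url true false server1 636
ldap_url false true server1 389
```

When you test a StartTLS server manually with the OpenLDAP ldapsearch client, pass the ldap:// URL with the -H option and add the -ZZ option so that the client requires a successful StartTLS upgrade.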

For either encryption method, if the LDAP server certificate is signed by a private Certificate Authority (CA) or not trusted by the default Java truststore, you must create a custom truststore file or modify a copy of the default Java truststore file to add the CA to the file. Then configure Data Collector to use the modified truststore file.

Use the same procedure to configure either secure method.

  1. In the $SDC_CONF/ldap-login.conf file, set either the useLdaps or useStartTLS property to true.
    By default, both properties are false and so Data Collector makes unencrypted connections to the LDAP server. If you set both properties to true, useStartTLS takes precedence.
  2. Set the port property in the ldap-login.conf file as required, based on the method that you enabled:
    • useLdaps - Use the port number for secure connections, typically 636.
    • useStartTLS - Use the LDAP port number, typically 389.
  3. If the LDAP server certificate is signed by a private CA or not trusted by the default Java truststore, create a custom truststore file or modify a copy of the default Java truststore file to add the CA to the file. Then configure Data Collector to use the modified truststore file.

    By default, Data Collector uses the Java truststore file located in $JAVA_HOME/jre/lib/security/cacerts. If your certificate is signed by a CA that is included in the default Java truststore file, you do not need to create a truststore file and can skip this step.

    In these steps, we show how to modify the default truststore file to add an additional CA to the list of trusted CAs. If you prefer to create a custom truststore file, see the keytool documentation.
    Note: If you've already configured Data Collector to use a custom truststore file to enable HTTPS, then simply add this additional CA to the same modified truststore file that you created in Step 2. Create a Truststore File.
    1. Use the following command to set the JAVA_HOME environment variable:
      export JAVA_HOME=<Java home directory>
    2. Use the following command to set the SDC_CONF environment variable:
      export SDC_CONF=<Data Collector configuration directory>
      For example, for an RPM installation use:
      export SDC_CONF=/etc/sdc
    3. Use the following command to copy the default Java truststore file to the Data Collector configuration directory:
      cp "${JAVA_HOME}/jre/lib/security/cacerts" "${SDC_CONF}/truststore.jks"
    4. Use the following keytool command to import the CA certificate into the truststore file:
      keytool -import -file <LDAP certificate> -trustcacerts -noprompt -alias <LDAP alias> -storepass <password> -keystore "${SDC_CONF}/truststore.jks"
      For example:
      keytool -import -file myLDAPServer.pem -trustcacerts -noprompt -alias MyLDAPServer -storepass changeit -keystore "${SDC_CONF}/truststore.jks"
    5. Define the following options in the SDC_JAVA_OPTS environment variable:
      • javax.net.ssl.trustStore - Path to the truststore file on the Data Collector machine.
      • javax.net.ssl.trustStorePassword - Truststore password.

      Modify environment variables using the method required by your installation type.

      For example, define the options as follows:
      export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Djavax.net.ssl.trustStore=/etc/sdc/truststore.jks -Djavax.net.ssl.trustStorePassword=mypassword -Xmx1024m -Xms1024m -server -XX:-OmitStackTraceInFastThrow"

      Or to avoid saving the password in the export command, save the password in a text file and then define the truststore password option as follows: -Djavax.net.ssl.trustStorePassword=$(cat passwordfile.txt)

      Then ensure that the password file is readable only by the user executing the export command.

  4. Restart Data Collector to enable the changes.
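
The password file variant mentioned in the procedure above can be sketched as follows. The scratch directory, the file name truststore-password.txt, and the password changeit are assumptions for illustration. Adjust the paths to match your installation.

```shell
# Sketch only: a scratch directory stands in for the Data Collector host.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT

# Create the password file with owner-only permissions from the start.
# The subshell keeps the stricter umask from affecting later commands.
password_file="$work_dir/truststore-password.txt"   # hypothetical file name
( umask 077 && printf '%s' 'changeit' > "$password_file" )

# Read the password at export time so that it is not typed into the command.
truststore="/etc/sdc/truststore.jks"
SDC_JAVA_OPTS="${SDC_JAVA_OPTS:-} -Djavax.net.ssl.trustStore=${truststore}"
SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Djavax.net.ssl.trustStorePassword=$(cat "$password_file")"
export SDC_JAVA_OPTS

# Optional check, commented out because it needs a JDK and a real truststore:
# keytool -list -alias MyLDAPServer -storepass changeit -keystore "$truststore"
```

Keeping the password in a file avoids storing it in shell history or startup scripts, but the expanded value is still part of the SDC_JAVA_OPTS environment variable once the export runs.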

Step 3. Map LDAP Groups to Data Collector Roles

Data Collector roles determine the tasks that a user can perform. You map LDAP groups to Data Collector roles. An authenticated user account that belongs to a mapped LDAP group can complete the tasks determined by the roles mapped to that group.

After you map LDAP groups to Data Collector roles, you can assign pipeline permissions to the groups. Pipeline permissions determine the pipeline access that each user has. For example, say you have an LDAP Developer group for all pipeline developers. When you configure the Data Collector LDAP properties, you assign the Creator role to the Developer group so they can create new pipelines. To allow the group to edit existing pipelines, you configure the permissions for each pipeline and assign read and write permission to the Developer group. For more information, see Roles and Permissions.

To map LDAP groups to Data Collector roles, in the Data Collector configuration file, $SDC_CONF/sdc.properties, configure the http.authentication.ldap.role.mapping property.

Data Collector provides the following roles:

Role      Description
admin     Perform any Data Collector task. Can perform all tasks listed below, as well as activate Data Collector, restart and shut down Data Collector, and view Data Collector metrics. Enable Control Hub. Install libraries using Package Manager. Generate support bundles.
manager   Start and stop pipelines, monitor pipelines, configure and reset alerts. Take, review, and manage snapshots.
creator   Create and configure pipelines and alerts, preview data, and monitor the pipeline. Import pipelines.
guest     View pipelines and alerts, and general monitoring information. Export a pipeline.

You can map multiple roles to the same group or vice versa. Use a semicolon to separate LDAP groups and commas to separate Data Collector roles, as follows:
<ldap group>:<SDC role>,<additional SDC role>,<additional SDC role>;<ldap group>:<SDC role>,<additional SDC role>...

When you have finished mapping LDAP groups to roles, restart Data Collector to enable the changes to the configuration file.

The following example maps the DEV LDAP group to the creator role, the OPS LDAP group to the manager role, and the SUPER LDAP group to both creator and manager:
DEV:creator;OPS:manager;SUPER:creator,manager
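
To check a mapping before you restart Data Collector, you can split it the same way the syntax describes: semicolons between groups, a colon between a group and its roles. The function roles_for_group below is a hypothetical helper for that check, not a Data Collector command.

```shell
# In sdc.properties, the mapping from the example above would appear as:
#   http.authentication.ldap.role.mapping=DEV:creator;OPS:manager;SUPER:creator,manager

# roles_for_group <mapping> <ldap group>
# Prints the comma-separated roles mapped to the group, or nothing if the
# group is not listed in the mapping.
roles_for_group() {
  printf '%s\n' "$1" | tr ';' '\n' | while IFS=: read -r group roles; do
    if [ "$group" = "$2" ]; then
      printf '%s\n' "$roles"
    fi
  done
}

mapping='DEV:creator;OPS:manager;SUPER:creator,manager'
roles_for_group "$mapping" SUPER
```

For the SUPER group, the function prints creator,manager. An empty result means that the group has no mapping entry.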

Step 4. Configure Multiple LDAP Servers (Optional)

If your organization has multiple LDAP servers, you can configure Data Collector to connect to each of the servers.

The steps that you complete to configure multiple LDAP servers depend on the following installation types:

Installation from the tarball, RPM package, or Docker Hub
    To configure multiple LDAP servers, configure the connection information for the additional LDAP servers in the $SDC_CONF/ldap-login.conf file. Then, restart Data Collector to enable the changes.
Cloudera Manager installation
    For a Cloudera Manager installation, configure the connection information for the additional LDAP servers in an advanced configuration snippet (or safety valve) within Cloudera Manager.
    In Cloudera Manager, select the StreamSets service, then click Configuration. Select Use Safety Valve to Edit LDAP Information, and then configure the connection information for all of the LDAP servers in the Data Collector Advanced Configuration Snippet for ldap.login.conf safety valve.
    When you configure multiple LDAP servers in the safety valve, Cloudera Manager ignores all values entered for the ldap.* properties in the Configuration tab.

Use the following guidelines when configuring multiple LDAP servers:

  • The LDAP user account used to log in to Data Collector must be registered in at least one of the configured LDAP servers to be authenticated.
  • If the LDAP user accounts belong to different LDAP groups in each LDAP server, include all of the group names when you map LDAP groups to Data Collector roles in the $SDC_CONF/sdc.properties file.
  • If the additional LDAP servers use different passwords for bindDn (the root distinguished name for the connection), then define the passwords directly in the bindPassword property.
Note: When you configure multiple LDAP servers, Data Collector attempts to connect to each server in the order listed in the ldap-login.conf file. If Data Collector successfully authenticates the user account in one of the LDAP servers, Data Collector still continues to authenticate with the remaining LDAP servers. This can cause the Data Collector log file to include login failure error messages. You can ignore these error messages.

The following example shows an ldap-login.conf file configured to connect to two OpenLDAP servers, server1 and server2. Each server uses the same password for bindDn:

ldap {
  com.streamsets.datacollector.http.LdapLoginModule required
  debug="true"
  useLdaps="false"
  useStartTLS="false"
  contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
  hostname="server1" 
  port="389"                 
  bindDn="*******"
  bindPassword="@ldap-bind-password.txt@"  
  forceBindingLogin="true"
  userBaseDn="ou=People,dc=example,dc=org"
  userIdAttribute="uid"
  userPasswordAttribute="userPassword"
  userObjectClass="inetOrgPerson"
  userFilter="uid={user}"
  roleBaseDn="ou=Groups,dc=example,dc=org"
  roleNameAttribute="cn"
  roleMemberAttribute="member"
  roleObjectClass="groupOfNames"
  roleFilter="member={dn}";


  com.streamsets.datacollector.http.LdapLoginModule required
  debug="true"
  useLdaps="false"
  useStartTLS="false"
  contextFactory="com.sun.jndi.ldap.LdapCtxFactory"
  hostname="server2" 
  port="389"                
  bindDn="*******"
  bindPassword="@ldap-bind-password.txt@"  
  forceBindingLogin="true"
  userBaseDn="ou=People,dc=example,dc=org"
  userIdAttribute="uid"
  userPasswordAttribute="userPassword"
  userObjectClass="inetOrgPerson"
  userFilter="uid={user}"
  roleBaseDn="ou=Groups,dc=example,dc=org"
  roleNameAttribute="cn"
  roleMemberAttribute="member"
  roleObjectClass="groupOfNames"
  roleFilter="member={dn}";
};
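
Because Data Collector tries the servers in the order listed, it can be useful to print that order from the file. The following sketch does so with sed. The function name list_ldap_hosts is hypothetical, and the sample file is a trimmed stand-in for a real ldap-login.conf that keeps only enough properties to show the idea.

```shell
# list_ldap_hosts <path to ldap-login.conf>
# Hypothetical helper: prints each hostname value in file order.
list_ldap_hosts() {
  sed -n 's/^[[:space:]]*hostname="\([^"]*\)".*$/\1/p' "$1"
}

# Trimmed sample file with two login module blocks.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
ldap {
  com.streamsets.datacollector.http.LdapLoginModule required
  hostname="server1"
  port="389";

  com.streamsets.datacollector.http.LdapLoginModule required
  hostname="server2"
  port="389";
};
EOF

list_ldap_hosts "$sample"
```

For the sample file, the function prints server1 and then server2, one per line.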

Step 5. Enable LDAP Authentication for MapR Stages

To use MapR stages with a Data Collector configured to use LDAP authentication, you must perform an additional step after configuring LDAP authentication.

The MapR distribution for Hadoop uses the Java Authentication and Authorization Service (JAAS) to control security features. The $MAPR_HOME/conf/mapr.login.conf file specifies configuration parameters for JAAS.

Data Collector expects LDAP configuration to be in the JAAS configuration file $SDC_CONF/ldap-login.conf and overrides the java.security.auth.login.config system property to point to this file. As a result, the JAAS settings in the mapr.login.conf file are not loaded.

To resolve this conflict, copy the contents of the $MAPR_HOME/conf/mapr.login.conf file into the $SDC_CONF/ldap-login.conf file after you configure the LDAP connection information in the ldap-login.conf file.

For example, use the following command to append the contents of the mapr.login.conf file to the end of the ldap-login.conf file:
cat $MAPR_HOME/conf/mapr.login.conf >> $SDC_CONF/ldap-login.conf
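
Because the append modifies ldap-login.conf in place, you might prefer to keep a backup first. The sketch below shows that pattern on scratch copies. The file contents are placeholders, not real MapR or Data Collector configuration, and the .bak suffix is an arbitrary choice.

```shell
# Sketch only: scratch files stand in for the real configuration files.
work_dir=$(mktemp -d)
trap 'rm -rf "$work_dir"' EXIT
printf 'ldap {\n  hostname="server1";\n};\n' > "$work_dir/ldap-login.conf"
printf 'placeholder_mapr_entry {\n};\n' > "$work_dir/mapr.login.conf"

# Keep a copy so that the append can be undone, then append as shown above.
cp -p "$work_dir/ldap-login.conf" "$work_dir/ldap-login.conf.bak"
cat "$work_dir/mapr.login.conf" >> "$work_dir/ldap-login.conf"

# The ldap entry should still come first, followed by the appended entries.
cat "$work_dir/ldap-login.conf"
```

Run the append only once. Running it again adds a second copy of the MapR entries to the file.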

Configuring File-Based Authentication

If your organization does not use LDAP and you want to enable multiple users to access Data Collector, you can configure Data Collector to use file-based authentication.

To configure file-based authentication, perform the following tasks:

  1. Configure authentication properties.
  2. Configure Data Collector users, groups, and roles.

Users can change their password after logging in to Data Collector.

Step 1. Configure Authentication Properties

Configure authentication properties in the Data Collector configuration file, $SDC_CONF/sdc.properties.

When you use file-based authentication, you can use the basic, digest, or form authentication type.

  1. In the Data Collector configuration file, $SDC_CONF/sdc.properties, enable file-based authentication by setting the http.authentication.login.module property to file.
  2. In the $SDC_CONF/sdc.properties file, define the HTTP authentication type by setting the http.authentication property to basic, digest, or form.
  3. Specify whether Data Collector checks the permissions for the associated realm.properties file for the type of authentication that you use. Set the http.realm.file.permission.check property to one of the following values:
    • true to ensure that the realm.properties file allows access only to the owner.
    • false to skip the permission check.
    You'll use the realm.properties file in the next step, when you configure Data Collector users and roles.
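
Taken together, the three settings might look like the following sketch. It writes them to a scratch directory rather than the real $SDC_CONF, and the choice of form authentication is an assumption. With the permission check enabled, the matching realm file needs owner-only permissions, which the last lines illustrate.

```shell
# Sketch only: works in a scratch directory instead of the real $SDC_CONF.
conf_dir=$(mktemp -d)
trap 'rm -rf "$conf_dir"' EXIT

# Steps 1 to 3: file-based login module, authentication type, permission check.
cat > "$conf_dir/sdc.properties" <<'EOF'
http.authentication.login.module=file
http.authentication=form
http.realm.file.permission.check=true
EOF

# The realm file name follows the authentication type: form-realm.properties.
# An empty file is created here only to demonstrate the permissions.
: > "$conf_dir/form-realm.properties"
chmod 600 "$conf_dir/form-realm.properties"

# The mode column should show access for the owner only.
ls -l "$conf_dir/form-realm.properties"
```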

Step 2. Configure Users, Groups, and Roles

For file-based authentication, you configure the users that can log in to Data Collector. You assign roles to each user account and you can optionally create and assign groups to the user accounts.

Configure users, groups, and roles in the properties file for the type of authentication that you use: $SDC_CONF/<authentication>-realm.properties.

Data Collector roles determine the tasks that a user can perform. You can also create groups and assign the groups to related user accounts. Use groups to easily assign pipeline permissions to groups of users. Pipeline permissions determine the pipeline access that each user has.

For example, say you use file-based authentication and want to create an Ops group to manage pipelines. To handle this, when you configure users in the authentication properties file, you grant the Manager role and add the Ops group for each operations user. Then, you edit each pipeline they need to manage, assigning read and execute permission to the Ops group. For more information, see Roles and Permissions.

Data Collector provides several default user accounts and groups. You can change or remove these default user accounts and groups. For increased security, change the passwords for the default user accounts.
Note: Data Collector installed through a cloud service provider marketplace includes only the default admin user account and no default groups.
For file-based authentication, Data Collector provides the following default user accounts with corresponding roles:
User Login          Role      Tasks
admin / admin       Admin     Perform any Data Collector task. Can perform all tasks listed below, as well as activate Data Collector, restart and shut down Data Collector, and view Data Collector metrics. Enable Control Hub. Install libraries using Package Manager. Generate support bundles.
manager / manager   Manager   Start and stop pipelines, monitor pipelines, configure and reset alerts. Take, review, and manage snapshots.
creator / creator   Creator   Create and configure pipelines and alerts, preview data, and monitor the pipeline. Import pipelines.
guest / guest       Guest     View pipelines and alerts, and general monitoring information. Export a pipeline.

For file-based authentication, Data Collector also provides a default all group that includes every user, as well as dev and test groups. The following default user accounts are available for the dev and test groups:

User Login      Roles                 Group
user1 / user1   Manager and Creator   dev
user2 / user2   Manager and Creator   dev
user3 / user3   Manager and Creator   test
user4 / user4   Manager and Creator   test

Configure users and groups in the properties file for the type of authentication that you use. For example, if you use basic authentication, use the basic-realm.properties file.

To hash login information, you can use an md5 program such as md5 on Mac OS X or md5sum on Linux. For example, you might use the following command to hash a password so that the password is not displayed in the prompt:
read -s pw && echo -n "$pw" | md5

For basic and form authentication, hash the password alone. For example, when the above command prompts you for the password, enter only the password.

For digest authentication, hash the combination of <user name>:<realm>:<password>, where <realm> is based on the authentication type, such as digest-realm. For example, when the above command prompts you for the password, enter:

<user name>:<realm>:<password>

as follows:

jdoe:digest-realm:JdoePass

  1. To configure users and groups, modify the properties file for the type of authentication that you use.
    The file name is $SDC_CONF/<authentication>-realm.properties.
  2. For each new user, add a user definition using the following format:
    <user name>: MD5:<md5-text>, user, <role> [, <additional role>, <additional role>...] [, group:<group>, group:<additional group>...]
    Note: Assign one or more roles to each user. Be sure to include user in every user definition.
    For example, the following line defines a user named jsmith who is assigned the Creator role and the Development group:
    jsmith: MD5:6d0258c2440a7d19e916292b231e3190,user,creator,group:Development
  3. To make the new users available, restart Data Collector.
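
The hashing rules and the user definition format above can be combined into a small script that prints a ready-to-paste line. This is a sketch under assumptions: the helper names md5_hex, realm_entry, and digest_realm_entry are hypothetical, and the sample password is a placeholder. Passing a password as an argument is convenient for a demonstration but exposes it to the shell history, so prefer the read -s approach for real accounts.

```shell
# md5_hex: prints the MD5 hex digest of standard input.
# Uses md5sum where available (Linux), otherwise md5 (macOS).
md5_hex() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum | cut -d ' ' -f 1
  else
    md5 -q
  fi
}

# realm_entry <user> <password> <roles and groups>
# For basic or form authentication, where the password alone is hashed.
realm_entry() {
  hash=$(printf '%s' "$2" | md5_hex)
  printf '%s: MD5:%s,user,%s\n' "$1" "$hash" "$3"
}

# digest_realm_entry <user> <realm> <password> <roles and groups>
# For digest authentication, where <user name>:<realm>:<password> is hashed.
digest_realm_entry() {
  hash=$(printf '%s:%s:%s' "$1" "$2" "$3" | md5_hex)
  printf '%s: MD5:%s,user,%s\n' "$1" "$hash" "$4"
}

# Example: a form or basic entry for jsmith with a placeholder password.
realm_entry jsmith 'password' 'creator,group:Development'
```

Append the printed line to the realm properties file for your authentication type, then restart Data Collector as described in the last step.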

Changing Your Password

When Data Collector is configured for file-based authentication, you can use the Data Collector UI to change your password.

Note: When Data Collector is installed through Cloudera Manager, you cannot use the Data Collector UI to change your password. Data Collector configuration properties, including file-based authentication, are administered through Cloudera Manager.
  1. Click the User icon, and then click Change Password.
  2. Enter your current and new password, verify the new password, and then click Save.

    Your changed password takes effect the next time you log in to Data Collector.