MapR Prerequisites
MapR is now HPE Ezmeral Data Fabric. At times, this documentation uses "MapR" to refer to both MapR and HPE Ezmeral Data Fabric. For information about supported versions, see Supported Systems and Versions.
Due to licensing restrictions, StreamSets cannot distribute MapR libraries with Data Collector. As a result, you must perform additional steps to enable the Data Collector machine to connect to MapR. Data Collector does not display MapR stages in stage library lists nor the MapR Streams statistics aggregator in the pipeline properties until you perform these prerequisites. Install Data Collector on a node in the MapR cluster or on a client machine.
The MapR prerequisites include installing all necessary client libraries on the Data Collector machine. If you use a core installation of Data Collector, you must install the MapR stage libraries. If you use a common installation of Data Collector, you install MapR stage libraries when Data Collector does not include the version that you want to use. Then, you run the command to set up MapR.
If the MapR cluster is enabled with built-in security, you also must configure Data Collector to connect to a secure MapR cluster and ensure that a valid ticket exists for the Data Collector user.
Supported Versions
- MapR 6.0.0 with optional EEP 4.x
- MapR 6.0.1 with optional EEP 5.x
- MapR 6.1.x with optional EEP 6.x
Different MapR and HPE Ezmeral Data Fabric versions require different versions of Java. For more information, see Java Versions and Available Features.
To view the complete list of supported EEPs, see the HPE Ezmeral Data Fabric documentation.
Step 1. Install Client Libraries as Needed
You can install Data Collector on a node in the MapR cluster or on a client machine. A client machine is one that is outside the cluster or on your local machine. When you install Data Collector on a client machine, the MapR client package must be installed on the machine.
If you install Data Collector on a node in the MapR cluster, or on a client machine that has the MapR client package installed, you can skip this step.
- MapR client library - Typically named mapr-client_<version>.<ext>. You can download the files for your operating system here: http://archive.mapr.com/releases/
- Kafka client library - Typically named mapr-kafka-<version>.<ext>. You can download the files for your operating system here: http://package.mapr.com/releases/MEP/
Set the library path to the MapR lib directory, for example:
    export LD_LIBRARY_PATH="${MAPR_HOME}/lib"
On macOS, you might use:
    export JAVA_LIBRARY_PATH="${MAPR_HOME}/lib"
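The platform-dependent library path setting can be sketched as a small shell snippet. This is only an illustration: the helper name mapr_lib_var is hypothetical, and the fallback to /opt/mapr assumes the usual MapR home directory when MAPR_HOME is not already set.

```shell
#!/bin/sh
# Hypothetical helper: pick the library path variable name for an OS name
# as reported by `uname -s`.
mapr_lib_var() {
  case "$1" in
    Darwin) echo "JAVA_LIBRARY_PATH" ;;  # macOS
    *)      echo "LD_LIBRARY_PATH" ;;    # Linux and other systems
  esac
}

# Assumption: fall back to the usual MapR home directory when MAPR_HOME is unset.
MAPR_HOME="${MAPR_HOME:-/opt/mapr}"
LIB_VAR="$(mapr_lib_var "$(uname -s)")"

# Set and export the chosen variable, then show the result.
export "${LIB_VAR}=${MAPR_HOME}/lib"
echo "${LIB_VAR}=${MAPR_HOME}/lib"
```

Add the resulting export line to the environment of the user that starts Data Collector so that the setting persists across sessions.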
Step 2. Install MapR Stage Libraries
Install MapR stage libraries if you use a core installation of Data Collector, which does not include MapR stage libraries.
If you use a common installation of Data Collector and you need a different MapR version that is not included with the installation, you must also install MapR stage libraries. If you use a full installation of Data Collector or if you use Data Collector on a cloud service provider, your Data Collector installation includes all stage libraries, so you can skip this step.
- streamsets-datacollector-mapr_7_0-lib
- streamsets-datacollector-mapr_7_0-mep8-lib
For more information about installing additional stage libraries, see Install Additional Stage Libraries.
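For a tarball installation, stage libraries can typically be installed from the command line with the stagelibs command. The sketch below only assembles and prints that command for the two libraries listed above, without running it. The join_libs helper is hypothetical, and you should confirm the exact syntax for your installation type in Install Additional Stage Libraries.

```shell
#!/bin/sh
# Hypothetical helper: join the given library names with commas.
join_libs() {
  (IFS=,; echo "$*")
}

LIBS="$(join_libs \
  streamsets-datacollector-mapr_7_0-lib \
  streamsets-datacollector-mapr_7_0-mep8-lib)"

# Command to run from the Data Collector home directory (printed, not executed).
INSTALL_CMD="bin/streamsets stagelibs -install=${LIBS}"
echo "${INSTALL_CMD}"
```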
Step 3. Run the Command to Set Up MapR
After you install all required MapR client libraries and install MapR stage libraries, run the setup-mapr command on every Data Collector machine. This command modifies configuration files and creates the required symbolic links to enable Data Collector to work with MapR. You can run the command in interactive or non-interactive mode.
In interactive mode, the command prompts you for the MapR version and home directory. In non-interactive mode, you define the MapR version and home directory in environment variables before running the command.
In both modes, the command checks if the MapR distribution of Spark is installed in the specified MapR cluster. If a supported version is installed, the command also installs the MapR Spark stage library for you.
Running the Command in Interactive Mode
When you run the setup-mapr command in interactive mode, the command prompts you for the MapR version and home directory.
- Set the following environment variables:
  | Environment Variable | Description |
  | --- | --- |
  | SDC_HOME | Data Collector home directory. Note: The default home directory for an RPM installation is /opt/streamsets-datacollector. The tarball home directory is the location where you extracted the file. |
  | SDC_CONF | Data Collector configuration directory. |
  | MAPR_MEP_VERSION | EEP version. Enter a single-digit EEP version number: 4, 5, 6, or 8. |
  Use the following command to set an environment variable:
      export <environment variable>=<value>
  For example, use the following commands if you used the default home and configuration directories for an RPM installation, and use EEP 8:
      export SDC_HOME=/opt/streamsets-datacollector
      export SDC_CONF=/etc/sdc
      export MAPR_MEP_VERSION=8
- Use the following command from the $SDC_HOME directory to set up MapR:
      bin/streamsets setup-mapr
- When prompted, enter the MapR version.
Enter the full three-digit version: 6.0.0, 6.0.1, 6.1.0, or 7.0.0.
- When prompted, enter the absolute path to the MapR home directory, usually /opt/mapr.
- Restart Data Collector and verify that MapR stages appear in stage library lists.
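Before starting the interactive command, it can help to confirm that the directories named in the environment variables exist. The following is a minimal sketch of such a check, not part of the setup-mapr command itself. The check_sdc_dirs helper is hypothetical, and the default paths assume an RPM installation.

```shell
#!/bin/sh
# Hypothetical pre-flight check: returns 0 when the launcher script and the
# configuration directory exist, 1 otherwise.
check_sdc_dirs() {
  sdc_home="$1"
  sdc_conf="$2"
  if [ ! -x "${sdc_home}/bin/streamsets" ]; then
    echo "Missing or non-executable ${sdc_home}/bin/streamsets" >&2
    return 1
  fi
  if [ ! -d "${sdc_conf}" ]; then
    echo "Missing configuration directory ${sdc_conf}" >&2
    return 1
  fi
  return 0
}

# Assumption: default to the RPM installation paths when the variables are unset.
if check_sdc_dirs "${SDC_HOME:-/opt/streamsets-datacollector}" "${SDC_CONF:-/etc/sdc}"; then
  echo "Ready to run: bin/streamsets setup-mapr"
else
  echo "Fix the directories reported above before running setup-mapr"
fi
```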
Running the Command in Non-Interactive Mode
When you run the setup-mapr command in non-interactive mode, you define the MapR version and home directory in environment variables before running the command.
- Set the following environment variables:
  | Environment Variable | Description |
  | --- | --- |
  | SDC_HOME | Data Collector home directory. Note: The default home directory for an RPM installation is /opt/streamsets-datacollector. The tarball home directory is the location where you extracted the file. |
  | SDC_CONF | Data Collector configuration directory. |
  | MAPR_HOME | MapR home directory, usually /opt/mapr. |
  | MAPR_VERSION | MapR version. Enter the full three-digit version: 6.0.0, 6.0.1, 6.1.0, or 7.0.0. |
  | MAPR_MEP_VERSION | EEP version. Enter a single-digit EEP version number: 4, 5, 6, or 8. |
  Use the following command to set an environment variable:
      export <environment variable>=<value>
  For example, use the following commands if you used the default home and configuration directories for an RPM installation, the default MapR home directory, MapR 7.0.0, and EEP 8:
      export SDC_HOME=/opt/streamsets-datacollector
      export SDC_CONF=/etc/sdc
      export MAPR_HOME=/opt/mapr
      export MAPR_VERSION=7.0.0
      export MAPR_MEP_VERSION=8
- Use the following command from the $SDC_HOME directory to set up MapR:
      bin/streamsets setup-mapr
- Restart Data Collector and verify that MapR stages appear in stage library lists.
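Because non-interactive mode takes its input from environment variables, a script can validate the values before calling setup-mapr. The sketch below checks MAPR_VERSION and MAPR_MEP_VERSION against the values listed in this section. The valid_mapr_version and valid_eep_version helpers are hypothetical, and the defaults of 7.0.0 and 8 are only example values.

```shell
#!/bin/sh
# Hypothetical validators that use the version values listed in this section.
valid_mapr_version() {
  case "$1" in
    6.0.0|6.0.1|6.1.0|7.0.0) return 0 ;;
    *) return 1 ;;
  esac
}

valid_eep_version() {
  case "$1" in
    4|5|6|8) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: example defaults, used only when the variables are unset.
MAPR_VERSION="${MAPR_VERSION:-7.0.0}"
MAPR_MEP_VERSION="${MAPR_MEP_VERSION:-8}"

if valid_mapr_version "${MAPR_VERSION}" && valid_eep_version "${MAPR_MEP_VERSION}"; then
  echo "OK: MapR ${MAPR_VERSION}, EEP ${MAPR_MEP_VERSION}"
else
  echo "Unsupported MAPR_VERSION or MAPR_MEP_VERSION" >&2
fi
```

A check like this catches a two-digit value such as 6.1 before the setup command runs.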
Step 4. Configure Data Collector for Secure MapR Clusters
When a MapR cluster has built-in security enabled, you must configure Data Collector to connect to a secure MapR cluster.
Modify the SDC_JAVA_OPTS environment variable to add the -Dmaprlogin.password.enabled configuration property.
Modify the environment variable in the required file based on how you start Data Collector. For more information about the required file to edit, see Modifying Environment Variables.
- Manual start - Uncomment the following line in the sdc-env.sh file:
      #export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Dmaprlogin.password.enabled=true"
- Service start on operating systems that use the SysV init system - On CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, uncomment the following line in the sdcd-env.sh file:
      #export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Dmaprlogin.password.enabled=true"
- Service start on operating systems that use the systemd init system - On CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, add the following line to the file that overrides the default settings in the sdc.service file:
      Environment=SDC_JAVA_OPTS=-Dmaprlogin.password.enabled=true
  Override the default values in the sdc.service file using the same procedure that you use to override unit configuration files on a systemd init system. For an example, see "Example 2. Overriding vendor settings" in the systemd.unit manpage.
  After overriding the default values, use the following command to reload the systemd manager configuration:
      systemctl daemon-reload
After modifying the environment variables, restart Data Collector to enable the changes.
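The two kinds of changes described in this step, uncommenting a line in an environment file and adding a systemd override, can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the drop-in file name override.conf is an arbitrary choice, and the demonstration writes to a temporary directory rather than to the real sdc-env.sh or /etc/systemd/system locations.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the maprlogin line in an environment file.
# The file is rewritten in place through a temporary copy so that its
# permissions are preserved.
enable_mapr_login_opt() {
  env_file="$1"
  sed 's|^#\(export SDC_JAVA_OPTS=.*-Dmaprlogin\.password\.enabled=true.*\)$|\1|' \
    "${env_file}" > "${env_file}.tmp" \
    && cat "${env_file}.tmp" > "${env_file}" \
    && rm -f "${env_file}.tmp"
}

# Hypothetical helper: write a systemd drop-in for sdc.service under the given
# unit directory (on a real system, typically /etc/systemd/system).
write_sdc_override() {
  unit_dir="$1"
  mkdir -p "${unit_dir}/sdc.service.d"
  printf '%s\n' '[Service]' \
    'Environment=SDC_JAVA_OPTS=-Dmaprlogin.password.enabled=true' \
    > "${unit_dir}/sdc.service.d/override.conf"
}

# Demonstration in a temporary directory, so no system files are touched.
DEMO_DIR="$(mktemp -d)"
printf '%s\n' '#export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Dmaprlogin.password.enabled=true"' \
  > "${DEMO_DIR}/sdc-env.sh"
enable_mapr_login_opt "${DEMO_DIR}/sdc-env.sh"
write_sdc_override "${DEMO_DIR}"
cat "${DEMO_DIR}/sdc-env.sh" "${DEMO_DIR}/sdc.service.d/override.conf"
```

On a real systemd host, run systemctl daemon-reload after writing the drop-in and then restart Data Collector.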
Step 5. Run Data Collector as a MapR Ticket User
To connect to a secure MapR cluster with built-in security enabled, ensure that a valid user, tenant, or service ticket exists for the Data Collector user in MapR. To generate tickets, see the MapR documentation.
To run MapR commands in a secure cluster, Data Collector must run as the user account granted access in the MapR ticket.
For example, if you ran the following MapR command to generate the service ticket for applications running outside of the cluster, then Data Collector must run as the myappuser user account:
    maprlogin generateticket -type service -out /tmp/longlived_ticket -duration 30:0:0 -renewal 90:0:0
Configure Data Collector to run as the required user account based on how you start Data Collector:
- Manual start - When Data Collector starts manually, it runs as the system user account logged into the command prompt when you use the following launch command from the $SDC_DIST directory:
      bin/streamsets dc
  You can start Data Collector for a secure MapR cluster in either of the following ways:
  - Log into the command prompt as the user account granted access in the MapR ticket, then use the following command from the $SDC_DIST directory:
        bin/streamsets dc
  - Impersonate the required user account by using the following launch command from the $SDC_DIST directory, where <user> is the user account granted access in the MapR ticket:
        sudo -u <user> bin/streamsets dc
    For example:
        sudo -u myappuser /opt/streamsets-datacollector-5.11.0/bin/streamsets dc
- Service start - When Data Collector starts as a service, it runs as the system user account and group defined in environment variables. The default system user and group are named sdc. To use the default sdc system user and group, generate a new MapR user or service ticket for the sdc user account, as described in the MapR documentation.