MapR Prerequisites
MapR is now HPE Ezmeral Data Fabric. At times, this documentation uses "MapR" to refer to both MapR and HPE Ezmeral Data Fabric. For information about supported versions, see Supported Systems and Versions.
You can use MapR stages only in a self-managed tarball deployment. You cannot use MapR stages with self-managed Docker deployments or with Control Hub-managed deployments, such as Amazon EC2 deployments. You can install Data Collector engine instances on a node in the MapR cluster or on a client machine.
Due to licensing restrictions, StreamSets cannot distribute MapR libraries with Data Collector. As a result, you must perform additional steps to enable the Data Collector machine to connect to MapR. Data Collector does not display MapR stages in stage library lists until you perform these prerequisites.
- On the Data Collector machine, install MapR client libraries as needed.
- In Control Hub, configure the deployment.
- On each Data Collector machine, run a script to set up MapR.
- If using a secure MapR cluster, configure MapR security.
- Start Data Collector engines.
Supported Versions
- MapR 6.1.x with optional EEP 6.x
Different MapR and HPE Ezmeral Data Fabric versions require different versions of Java. For more information, see Java Versions and Available Features.
To view the complete list of supported EEPs, see the HPE Ezmeral Data Fabric documentation.
Step 1. Install Client Libraries as Needed
You can install Data Collector on a node in the MapR cluster or on a client machine. A client machine is one that is outside the cluster or on your local machine. When you install Data Collector on a client machine, the MapR client package must be installed on the machine.
If you install Data Collector on a node in the MapR cluster, or on a client machine that has the MapR client package installed, you can skip this step.
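Otherwise, install the MapR client package on the machine, then point the library path at the MapR lib directory. For example, you might use:
export LD_LIBRARY_PATH="${MAPR_HOME}/lib"
On macOS, you might use:
export JAVA_LIBRARY_PATH="${MAPR_HOME}/lib"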
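As a minimal sketch, the two variants above can be combined into one snippet that picks the variable by operating system. The helper name mapr_lib_var and the /opt/mapr fallback for MAPR_HOME are assumptions made for illustration; they are not part of Data Collector or MapR.

```shell
#!/bin/sh
# Hypothetical helper: print the library-path variable name for an OS name.
mapr_lib_var() {
  case "$1" in
    Darwin) echo "JAVA_LIBRARY_PATH" ;;  # macOS variant shown above
    *)      echo "LD_LIBRARY_PATH" ;;    # default variant shown above
  esac
}

# Assumption: fall back to the typical MapR home if MAPR_HOME is unset.
MAPR_HOME="${MAPR_HOME:-/opt/mapr}"

# Export the chosen variable so that it points at the MapR lib directory.
export "$(mapr_lib_var "$(uname -s)")=${MAPR_HOME}/lib"
```

Placing a snippet like this in the shell profile of the account that runs Data Collector keeps the setting in effect across sessions.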
Step 2. Configure the Deployment
You need to configure a deployment to use MapR. When you configure a deployment, you add MapR stage libraries, remove the stage libraries from a blacklist, add a security policy, and specify Java options as needed. If the MapR cluster is enabled with built-in security, then you need to add a Java option to enable Data Collector to connect to a secure MapR cluster.
- In Control Hub, create or edit a deployment. If creating a new deployment, set the Deployment Type property to Self Managed, then click Save and Next.
- In the Configure Engine step, click Stage Libraries. Add the MapR stage libraries to include in the deployment, then click OK to save your changes.
When installing MapR stage libraries, you must install both the MapR stage library and the Ecosystem Pack (EEP) stage library for your supported version of MapR. For example, if using HPE Ezmeral Data Fabric version 7.0.x, you must install both of the following stage libraries:
- streamsets-datacollector-mapr_7_0-lib
- streamsets-datacollector-mapr_7_0-mep8-lib
For detailed steps on adding stage libraries to a new self-managed deployment, see "Configure the Engine" in the Control Hub documentation.
For detailed steps on adding stage libraries to an existing deployment, see Updating Stage Libraries in the Control Hub documentation.
- In the Configure Engine step, click Advanced Configuration, then click Data Collector Configuration. Find and edit the system.stagelibs.blacklist property, then remove the MapR stage libraries that you added to the deployment.
- At the top of the Engine Configuration window, click Security Policy, and then add the following text to the end of the section:
//MapR codebase
grant codebase "file://<MAPR_HOME>-" {
  permission java.security.AllPermission;
};
where <MAPR_HOME> is the MapR home path, typically /opt/mapr.
- At the top of the Engine Configuration window, click Java Configuration. In the Java Options property, add the following properties as needed.
- When using MapR 7.0.x or later, add the following property:
-Dmapr.library.flatclass -Dsecurity.provider=BCFIPS
- When connecting to a MapR cluster with built-in security enabled, add the following property:
-Dmaprlogin.password.enabled=true
- To save all of the engine configuration changes, click Save.
- If you are configuring a new self-managed deployment, in the Configure Install Type step, choose a tarball installation.
- Start or launch the engines:
- If you edited an existing active deployment, click Save and Next until you reach the Review step, then click Restart Engines to restart running engines. If you have engines that are not running, start those engines manually so they receive updates from the deployment.
- If you configured a new deployment, configure the rest of the deployment. Then, in the Review and Launch step, click Start & Generate Script. Run the script on every machine where you want the Data Collector engine to run. Install Data Collector on MapR cluster nodes or client machines.
For more information about launching a self-managed Data Collector tarball, see the Control Hub documentation.
Important: All existing and new engines will fail to start and generate errors about missing classes. This is expected because the prerequisite tasks are not yet complete. Ignore start engine errors and continue with the next prerequisite task.
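To recap the deployment settings from this step, the following sketch prints the stage library names and Java options for a given MapR version. The helper names mapr_stage_libs and mapr_java_opts are hypothetical, and the assumption that other versions follow the naming pattern of the 7.0.x example is not confirmed by this document; check the stage library list in Control Hub for the exact names.

```shell
#!/bin/sh
# Hypothetical helper: print the two stage library names for a MapR version
# (major.minor, for example 7.0) and an EEP major version (for example 8).
# Assumption: other versions follow the naming pattern of the 7.0.x example.
mapr_stage_libs() {
  ver=$(printf '%s' "$1" | tr '.' '_')
  echo "streamsets-datacollector-mapr_${ver}-lib"
  echo "streamsets-datacollector-mapr_${ver}-mep${2}-lib"
}

# Hypothetical helper: print the Java options for a MapR major version and a
# flag ("true" or "false") that says whether built-in security is enabled.
mapr_java_opts() {
  opts=""
  if [ "$1" -ge 7 ]; then
    opts="-Dmapr.library.flatclass -Dsecurity.provider=BCFIPS"
  fi
  if [ "$2" = "true" ]; then
    # Append with a separating space only when opts is already non-empty.
    opts="${opts:+$opts }-Dmaprlogin.password.enabled=true"
  fi
  echo "$opts"
}

mapr_stage_libs 7.0 8
mapr_java_opts 7 true
```

The printed values are meant to be pasted into the Stage Libraries and Java Options settings by hand; the sketch does not modify the deployment.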
Step 3. Run the Command to Set Up MapR
After you install all required MapR client libraries and configure a deployment to work with MapR, run the setup-mapr command on every Data Collector machine. This command modifies configuration files and creates the required symbolic links to enable Data Collector to work with MapR. You can run the command in interactive or non-interactive mode.
In interactive mode, the command prompts you for the MapR version and home directory. In non-interactive mode, you define the MapR version and home directory in environment variables before running the command.
In both modes, the command checks if the MapR distribution of Spark is installed in the specified MapR cluster. If a supported version is installed, the command also installs the MapR Spark stage library for you.
Running the Command in Interactive Mode
When you run the setup-mapr command in interactive mode, the command prompts you for the MapR version and home directory.
- Set the following environment variables:
| Environment Variable | Description |
| --- | --- |
| SDC_HOME | Data Collector home directory. |
| SDC_CONF | Data Collector configuration directory. |
| MAPR_MEP_VERSION | EEP version. Enter a single digit EEP version number: 4, 5, 6, or 8. |
Use the following command to set an environment variable:
export <environment variable>=<value>
For example, use the following commands:
export SDC_HOME=/streamsets-datacollector-6.0.0
export SDC_CONF=/streamsets-datacollector-6.0.0/etc
export MAPR_MEP_VERSION=8
- Use the following command from the engine installation directory to set up MapR:
bin/streamsets setup-mapr
- When prompted, enter the MapR version.
Enter the full three-digit version: 6.0.0, 6.0.1, 6.1.0, or 7.0.0.
- When prompted, enter the absolute path to the MapR home directory, usually /opt/mapr.
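If you are unsure which three-digit version to enter at the prompt, a sketch like the following might help. It assumes that the MapR installation records its build string in a MapRBuildVersion file under the MapR home directory and that the string starts with the three-digit version; both are assumptions to verify on your system. The helper name three_digit_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reduce a build string of the form 7.0.0.<build>.GA
# to the three-digit version that the prompt expects.
three_digit_version() {
  printf '%s\n' "$1" | cut -d. -f1-3
}

# Assumption: the client or cluster node records its build in this file.
build_file="${MAPR_HOME:-/opt/mapr}/MapRBuildVersion"
if [ -r "$build_file" ]; then
  three_digit_version "$(cat "$build_file")"
else
  echo "No build file at $build_file; check the MapR version another way." >&2
fi
```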
Running the Command in Non-Interactive Mode
When you run the setup-mapr command in non-interactive mode, you define the MapR version and home directory in environment variables before running the command.
- Set the following environment variables:
| Environment Variable | Description |
| --- | --- |
| SDC_HOME | Data Collector home directory. |
| SDC_CONF | Data Collector configuration directory. |
| MAPR_HOME | MapR home directory, usually /opt/mapr. |
| MAPR_VERSION | MapR version. Enter the full three-digit version: 6.0.0, 6.0.1, 6.1.0, or 7.0.0. |
| MAPR_MEP_VERSION | EEP version. Enter a single digit EEP version number: 4, 5, 6, or 8. |
Use the following command to set an environment variable:
export <environment variable>=<value>
For example, use the following commands:
export SDC_HOME=/streamsets-datacollector-6.0.0
export SDC_CONF=/streamsets-datacollector-6.0.0/etc
export MAPR_HOME=/opt/mapr
export MAPR_VERSION=7.0.0
export MAPR_MEP_VERSION=8
- Use the following command from the engine installation directory to set up MapR:
bin/streamsets setup-mapr
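The non-interactive steps can be wrapped in a small script that checks the value formats before calling the command. This is a sketch only: the helper names are hypothetical, the values are the example values from above, and it assumes that the engine installation directory is the same as SDC_HOME.

```shell
#!/bin/sh
# Hypothetical checks for the value formats described above.
is_three_digit_version() {
  case "$1" in
    [0-9].[0-9].[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}
is_single_digit() {
  case "$1" in
    [0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example values copied from the commands above; adjust them to your install.
export SDC_HOME=/streamsets-datacollector-6.0.0
export SDC_CONF=/streamsets-datacollector-6.0.0/etc
export MAPR_HOME=/opt/mapr
export MAPR_VERSION=7.0.0
export MAPR_MEP_VERSION=8

if ! is_three_digit_version "$MAPR_VERSION"; then
  echo "MAPR_VERSION must be a full three-digit version such as 7.0.0" >&2
elif ! is_single_digit "$MAPR_MEP_VERSION"; then
  echo "MAPR_MEP_VERSION must be a single digit" >&2
elif [ -x "$SDC_HOME/bin/streamsets" ]; then
  # Assumption: SDC_HOME is the engine installation directory.
  (cd "$SDC_HOME" && bin/streamsets setup-mapr)
else
  echo "No launcher found under $SDC_HOME; skipping setup-mapr." >&2
fi
```

The format checks only catch malformed values; they do not confirm that the version is one of the supported versions listed above.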
Step 4. Configure MapR in Secure Clusters
To connect to a secure MapR cluster with built-in security enabled, you must complete additional steps based on your MapR version.
To run MapR commands in a secure cluster, Data Collector must run as the user account granted access in the MapR ticket.
For example, if you ran the following MapR command to generate the service ticket for applications running outside of the cluster, then Data Collector must run as the myappuser user account:
maprlogin generateticket -user myappuser -type service -out /tmp/longlived_ticket -duration 30:0:0 -renewal 90:0:0
- For MapR 7, complete the following steps:
- For all MapR versions, ensure that a valid user, tenant, or service ticket exists for the Data Collector user in MapR. To generate tickets, see the MapR documentation.
Note: If the MapR ticket that Data Collector uses allows impersonation, then you can configure MapR stages in Data Collector to use Hadoop impersonation mode.
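Before starting an engine against a secure cluster, it can be useful to confirm that the current shell runs as the account named in the ticket. The following is a minimal sketch; the helper name runs_as is hypothetical, and myappuser is the account from the example above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the current user is the expected one.
runs_as() {
  [ "$(id -un)" = "$1" ]
}

# Assumption: myappuser is the account granted access in the MapR ticket.
ticket_user="myappuser"
if runs_as "$ticket_user"; then
  echo "Running as $ticket_user"
else
  echo "Current user is $(id -un), not $ticket_user" >&2
fi
```

This check only compares account names. It does not verify that the ticket itself exists or is still valid.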
Step 5. Start Engine Instances
After you install MapR client libraries as needed, configure a deployment, and run the setup-mapr command, you can start all engine instances for the deployment. After you start engine instances, verify that MapR stages are available in stage library lists.
- Unsecure MapR cluster
To start an engine for an unsecure MapR cluster, run the following command from the engine installation directory:
bin/streamsets dc
- Secure MapR cluster
You can start an engine for a secure MapR cluster in either of the following ways:
- Log into the command prompt as the user account granted access in the MapR ticket, then use the following command from the engine installation directory:
bin/streamsets dc
- Impersonate the required user account by using the following launch command from the engine installation directory, where <user> is the user account granted access in the MapR ticket:
sudo -u <user> bin/streamsets dc
For example:
sudo -u myappuser /opt/streamsets-datacollector-6.0.0/bin/streamsets dc
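The two ways of starting an engine for a secure cluster can be sketched as a helper that prints the matching launch command. The helper name launch_command is hypothetical, and the sketch only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the launch command for a given ticket user.
# If the current user already is that account, no impersonation is needed.
launch_command() {
  if [ "$(id -un)" = "$1" ]; then
    echo "bin/streamsets dc"
  else
    echo "sudo -u $1 bin/streamsets dc"
  fi
}

# myappuser is the example account used above.
launch_command myappuser
```

Run the printed command from the engine installation directory.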