Administration

Providing an Activation Code

Users with an enterprise account need to provide an activation code only when using a Docker Data Collector image that is not registered with Control Hub.

Other users do not need to provide an activation code.

To request an activation code, submit a request through the StreamSets Support portal.

After you receive an email with the activation code, log in to Data Collector. On the registration page, click Enter a Code, then paste the code into the Activation window and click Activate.

Users with the Admin role can view activation details by clicking Administration > Activation.

Viewing Data Collector Configuration Properties

To view Data Collector configuration properties, click Administration > Configuration.

For details about the configuration properties or to edit the configuration file, see Configuring Data Collector.

Viewing Data Collector Directories

You can view the directories that Data Collector uses. You might check these directories to locate a file or to increase the available space for a directory.

Data Collector directories are defined in environment variables. For more information, see Data Collector Environment Configuration.

To view Data Collector directories, click Administration > SDC Directories.

The following table describes the Data Collector directories that display:

Directory | Includes | Environment Variable
Runtime | Base directory for Data Collector executables and related files. | SDC_DIST
Configuration | The Data Collector configuration file, sdc.properties, related realm properties files, keystore files, and the log4j2 properties file. | SDC_CONF
Data | Pipeline configuration and run details. | SDC_DATA
Log | Data Collector log file, sdc.log. | SDC_LOG
Resources | Directory for runtime resource files. | SDC_RESOURCES
SDC Libraries Extra Directory | Directory to store external libraries. | STREAMSETS_LIBRARIES_EXTRA_DIR
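These directory variables can be read from a script with standard environment-variable lookups; a minimal sketch (unset variables resolve to None):

```python
import os

def sdc_directories() -> dict:
    """Map each Data Collector directory to the value of its
    environment variable; None when the variable is not set."""
    names = {
        "runtime": "SDC_DIST",
        "configuration": "SDC_CONF",
        "data": "SDC_DATA",
        "log": "SDC_LOG",
        "resources": "SDC_RESOURCES",
        "libraries_extra": "STREAMSETS_LIBRARIES_EXTRA_DIR",
    }
    return {key: os.environ.get(var) for key, var in names.items()}
```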

Viewing Data Collector Metrics

You can view metrics about Data Collector, such as the CPU usage or the number of pipeline runners in the thread pool.

  1. To view Data Collector metrics, click Administration > SDC Metrics.
    The Data Collector Metrics page displays all metrics by default.
  2. To modify the metrics that display on the page, click the More icon, and then click Settings.
  3. Remove any metric charts that you don't want to display, and then click Save.

Viewing Data Collector Logs

You can view and download log data. When you download log data, you can select the file to download.

  1. To view log data for the Data Collector, click Administration > Logs.
    The Data Collector UI displays roughly 50,000 characters of the most recent log information.
  2. To stop the automatic refresh of log data, click Stop Auto Refresh.
    Or, click Start Auto Refresh to view the latest data.
  3. To view earlier events, click Load Previous Logs.
  4. To download the latest log file, click Download. To download a specific log file, click Download > <file name>.
    The most recent information is in the file with the highest number.
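A script that needs the file with the newest entries can apply that numbering rule directly; a minimal sketch (the file names used here are hypothetical rotated sdc.log files):

```python
import re

def newest_rotated_log(filenames: list) -> str:
    """Return the downloaded log file holding the most recent
    information: the one with the highest numeric suffix."""
    def suffix(name: str) -> int:
        match = re.search(r"\.(\d+)$", name)
        return int(match.group(1)) if match else -1
    return max(filenames, key=suffix)
```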

Data Collector Log Format

Data Collector uses the Apache Log4j library to write log data. Each log entry includes a timestamp and message along with additional information relevant for the message.

In the Data Collector UI, each log entry has the following information:
  • Timestamp
  • Pipeline
  • Severity
  • Message
  • Category
  • User
  • Runner
  • Thread


In the downloaded log file, the log entry has the same information, presented in a different order, plus the stage that encountered the message. The downloaded file shows the information in the following order:
  • Timestamp
  • User
  • Pipeline
  • Runner
  • Thread
  • Stage
  • Severity
  • Category
  • Message
For example:
2019-03-19 09:34:26,236 [user:admin] [pipeline:Test/TestPipeline65f67dde-faad-426d-ac47-8a2cd707f224] [runner:] [thread:webserver-430] [stage:] INFO  StandaloneAndClusterRunnerProviderImpl - Pipeline execution mode is: STANDALONE 

For this message, the stage and runner are not relevant, and therefore not included in the log entry.

The information included in the downloaded file is set by the appender.streamsets.layout.pattern property in the log configuration file, $SDC_CONF/sdc-log4j2.properties. The default configuration sets this property to:

%d{ISO8601} [user:%X{s-user}] [pipeline:%X{s-entity}] [runner:%X{s-runner}] [thread:%t] [stage:%X{s-stage}] %-5p %c{1} - %m%n
To customize the log format, see the Log4j documentation. Data Collector provides the following custom objects:
  • %X{s-entity} - Pipeline name and ID
  • %X{s-runner} - Runner ID
  • %X{s-stage} - Stage name
  • %X{s-user} - User who initiated the operation
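A downloaded log file that uses the default layout can be parsed back into these fields, for example before loading entries into another tool. A minimal sketch (the regular expression below is an assumption derived from the default pattern shown above, not something the product ships):

```python
import re
from typing import Optional

# Mirrors the default layout pattern:
# %d{ISO8601} [user:..] [pipeline:..] [runner:..] [thread:..] [stage:..] LEVEL category - message
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+) "
    r"\[user:(?P<user>[^\]]*)\] "
    r"\[pipeline:(?P<pipeline>[^\]]*)\] "
    r"\[runner:(?P<runner>[^\]]*)\] "
    r"\[thread:(?P<thread>[^\]]*)\] "
    r"\[stage:(?P<stage>[^\]]*)\] "
    r"(?P<severity>\w+)\s+(?P<category>\S+) - (?P<message>.*)$"
)

def parse_log_line(line: str) -> Optional[dict]:
    """Return the fields of one sdc.log entry, or None when the line
    does not match, such as a stack-trace continuation line."""
    match = LOG_LINE.match(line.rstrip())
    return match.groupdict() if match else None
```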

Modifying the Log Level

If the Data Collector logs do not provide enough troubleshooting information, you can modify the log level to display messages at another severity level.

By default, Data Collector logs messages at the INFO severity level. You can configure the following log levels:
  • TRACE
  • DEBUG
  • INFO (Default)
  • WARN
  • ERROR
  • FATAL
  1. Click Administration > Logs.
  2. Click Log Config.
    Data Collector displays the contents of the log configuration file, $SDC_CONF/sdc-log4j2.properties.
  3. Change the default value of INFO for the following line in the file:
    logger.l1.level=INFO

    For example, to set the log level to DEBUG, modify the line as follows:

    logger.l1.level=DEBUG
  4. Click Save.
The changes that you make to the log level take effect immediately - you do not need to restart Data Collector. You can also change the log level by directly editing the log configuration file, $SDC_CONF/sdc-log4j2.properties.
    Note: For a Cloudera Manager installation, use Cloudera Manager to modify the log level. In Cloudera Manager, select the StreamSets service, then click Configuration. Click Category > Logs, and then modify the value of the Data Collector Logging Threshold property.

When you’ve finished troubleshooting, set the log level back to INFO to avoid having verbose log files.

Shutting Down Data Collector

You can shut down and then manually launch Data Collector to apply changes to the Data Collector configuration file, environment configuration file, or user logins.

Use one of the following methods to shut down Data Collector:

User interface
To use the Data Collector UI for shutdown:
  1. Click Administration > Shut Down.
  2. When a confirmation dialog box appears, click Yes.
Command line when started as a service
To use the command line for shutdown when Data Collector is started as a service, use the required command for your operating system:
  • For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use: service sdc stop

  • For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use: systemctl stop sdc

Command line when started manually
To use the command line for shutdown when Data Collector is started manually, use the Data Collector process ID in the following command:
kill -15 <process ID>

Restarting Data Collector

You can restart Data Collector to apply changes to the Data Collector configuration file, environment configuration file, or user logins. During the restart process, Data Collector shuts down and then automatically restarts.
Choose the restart method based on how you initially started Data Collector:
  • Started manually

    If you changed or added an environment variable in the sdc-env.sh file, then you must restart Data Collector from the command prompt. Press Ctrl+C to shut down Data Collector and then enter bin/streamsets dc to restart Data Collector.

    If you did not change or add an environment variable, then you can restart Data Collector from the command prompt or from the user interface. To restart from the user interface, click Administration > Restart, expand StreamSets Data Collector was started manually, and then click Restart Data Collector.

  • Started as a service
    Run the appropriate command for your operating system:
    • For CentOS 6, Oracle Linux 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
      service sdc start
    • For CentOS 7, Oracle Linux 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
      systemctl start sdc
  • Started from Cloudera Manager

    Use Cloudera Manager to restart Data Collector. For information about how to restart a service through Cloudera Manager, see the Cloudera documentation.

  • Started from Docker

    Run the following Docker command:

    docker restart <containerID>

The restart process can take a few moments to complete. Refresh the browser to log in again.

Viewing Users and Groups

If you use file-based authentication, you can view all user accounts granted access to this Data Collector instance, including the roles and groups assigned to each user.

To view users and groups, click Administration > Users and Groups. Data Collector displays a read-only view of the users, groups, and roles.

You configure users, groups, and roles for file-based authentication in the associated realm.properties file located in the Data Collector configuration directory, $SDC_CONF. For more information, see Configuring File-Based Authentication.

Note: If the Data Collector is registered with StreamSets Control Hub and you click Administration > Users and Groups, the Data Collector logs you into Control Hub and displays the Users view within Control Hub. Registered Data Collectors use Control Hub user authorization. For more information, see Register Data Collector with Control Hub.

Managing Usage Statistics Collection

You can help to improve Data Collector by allowing StreamSets to collect usage statistics about Data Collector system performance and the features that you use. This telemetry data helps StreamSets to improve product performance and to make feature development decisions.

You can configure whether to allow usage statistics collection.

  1. Click Administration > Usage Statistics.
  2. Select the Share usage data with StreamSets checkbox to enable usage statistics collection.

    Clear the checkbox if you prefer not to share usage statistics.

  3. Click Save.

Support Bundles

You can use Data Collector to generate a support bundle. A support bundle is a ZIP file that includes Data Collector logs, environment and configuration information, pipeline JSON files, resource files, and other details to help troubleshoot issues. You upload the generated file to a StreamSets Support ticket, and the Support team can use the information to help resolve your ticket. Alternatively, you can send the file to another StreamSets community member.

Data Collector uses several generators to create a support bundle. Each generator bundles different types of information. You can choose to use all or some of the generators.

Each generator automatically redacts all passwords entered in pipelines, configuration files, or resource files. The generators replace all passwords with the text REDACTED in the generated files. You can customize the generators to redact other sensitive information, such as machine names or user names.

Before uploading a generated ZIP file to a support ticket, we recommend verifying that the file does not include any sensitive information that you do not want to share.

Generators

Data Collector can use the following generators to create a support bundle:

Generator Description
SDC Info Includes the following information:
  • Data Collector configuration files.
  • Permissions granted to users on Data Collector directories.
  • Data Collector environment configuration file.
  • Data Collector version and system properties for the machine where Data Collector is installed.
  • Data Collector runtime information including pipeline metrics and a thread dump.
Pipelines Includes the following JSON files for each pipeline:
  • history.json
  • info.json
  • offset.json
  • pipeline.json

By default, all Data Collector pipelines are included in the bundle.

Blob Store Includes the internal blob store containing information provided by Control Hub.
Logs Includes the most recent content of the following log files:
  • Garbage collector log - gc.log
  • Data Collector log - sdc.log

In addition, Data Collector always generates the following files when you create a support bundle:

  • metadata.properties - ID and version of the Data Collector that generated the bundle.
  • generators.properties - List of generators used for the bundle.

Generating a Support Bundle

When you generate a support bundle, you choose the information to include in the bundle. Only users with the Admin role can generate support bundles.

You can download the bundle, and then verify its contents and upload it to a StreamSets Support ticket.

  1. Click the Help icon, and then click Support Bundle.
  2. Select the generators that you want to use.
  3. Click Download.

    Data Collector generates the support bundle and saves it to a ZIP file in your default downloads directory.

    You can manually upload the file to a StreamSets Support ticket.

    Before sharing the file, verify that the file does not include sensitive information that you do not want to share. For example, you might want to remove the pipelines not associated with your support ticket. By default, the bundle includes all Data Collector pipelines.

Customizing Generators

By default, the generators redact all passwords entered in pipelines, configuration files, or resource files. You can customize the generators to redact other sensitive information, such as machine names or user names.

To customize the generators, modify the support bundle redactor file, $SDC_CONF/support-bundle-redactor.json. The file contains rules that the generators use to redact sensitive information. Each rule contains the following information:

  • description - Description of the rule.
  • trigger - String constant that triggers a redaction. If a line contains this trigger string, then the redaction continues by applying the regular expression specified in the search property.
  • search - Regular expression that defines the sub-string to redact.
  • replace - String to replace the redacted information with.
You can add additional rules that the generators use to redact information. For example, to customize the generators to redact the names of all machines in the StreamSets domain, add the following rule to the file:
{
  "description": "Custom domain names",
  "trigger": ".streamsets.com",
  "search": "[a-z_-]+.streamsets.com",
  "replace": "REDACTED.streamsets.com"
}
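The way the rule properties interact can be illustrated with a short sketch that treats the trigger as a plain substring test and the search/replace pair as a regular-expression substitution, as described above. This is an illustration of the rule format, not the product's actual redaction code:

```python
import json
import re

def redact(text: str, rules: list) -> str:
    """Apply redaction rules line by line: when a line contains a
    rule's trigger substring, replace every match of the search
    regex with the replace string."""
    redacted_lines = []
    for line in text.splitlines():
        for rule in rules:
            if rule["trigger"] in line:
                line = re.sub(rule["search"], rule["replace"], line)
        redacted_lines.append(line)
    return "\n".join(redacted_lines)

# The custom rule from the example above.
RULES = json.loads("""[{
    "description": "Custom domain names",
    "trigger": ".streamsets.com",
    "search": "[a-z_-]+.streamsets.com",
    "replace": "REDACTED.streamsets.com"
}]""")
```

With this rule, redact("host node-a.streamsets.com is up", RULES) returns "host REDACTED.streamsets.com is up".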

Health Inspector

The Data Collector Health Inspector provides a snapshot of how Data Collector is functioning. When you run Health Inspector, it performs checks for common misconfigurations and errors. You can use the Health Inspector to quickly check the health of your Data Collector.

Health Inspector provides only Data Collector-level details. For pipeline-level details, monitor the pipeline or review the Data Collector log.

The Health Inspector provides the following categories of information:
  • Data Collector configuration - Displays the settings for certain Data Collector configuration properties, such as the maximum number of pipeline errors allowed in production.
  • Java Virtual Machine (JVM) process - Displays the settings for certain JVM configuration properties, such as the maximum amount of memory allotted to the JVM. Also generates related usage statistics, such as the percentage of the JVM memory currently used by Data Collector.
  • Machine - Displays important details about available resources on the Data Collector machine, such as the available space in the runtime directory.
  • Networking - Verifies that the internet is accessible by pinging the StreamSets website.

Viewing the Health Inspector

Data Collector generates Health Inspector details each time you open the Health Inspector page.

  1. To view the Data Collector Health Inspector, click the Help icon, and then click Health Inspector.
  2. To view all available information, click the Expand All link.

    Green indicates that values are within expected range. Red indicates that values fall beyond the expected range.

    Some details, such as JVM Child Processes, provide additional information. To view that information, click Show Output.
  3. To refresh a category of information, click the Rerun link for the category.
  4. To refresh all Health Inspector details, navigate away from the page, and then return.

REST Response

You can view REST response JSON data for different aspects of the Data Collector, such as pipeline configuration information or monitoring details.

You can use the REST response information to provide Data Collector details to a REST-based monitoring system. Or you might use the information in conjunction with the Data Collector REST API.

You can access the following REST response data:
  • Pipeline Configuration - Provides information about the pipeline and each stage in the pipeline.
  • Pipeline Rules - Provides information about metric and data rules and alerts.
  • Definitions - Provides information about all available Data Collector stages.
  • Preview Data - Provides information about the preview data moving through the pipeline. Also includes monitoring information that is not used in preview.
  • Pipeline Monitoring - Provides monitoring information for the pipeline.
  • Pipeline Status - Provides the current status of the pipeline.
  • Data Collector Metrics - Provides metrics about Data Collector.
  • Thread Dump - Lists all active Java threads used by Data Collector.

Viewing REST Response Data

You can view REST response data from the location where the relevant information displays. For example, you can view Data Collector Metrics REST response data from the Data Collector Metrics page.

You can view REST response data from the following locations:
Edit mode
From the Properties panel, you can use the More icon to view the following REST response data:
  • Pipeline Configuration
  • Pipeline Rules
  • Pipeline Status
  • Definitions
Preview mode
From the Preview panel, you can use the More icon to view the Preview Data REST response data.
Monitor mode
From the Monitor panel, you can use the More icon to view the following REST response data:
  • Pipeline Monitoring
  • Pipeline Configuration
  • Pipeline Rules
  • Pipeline Status
  • Definitions
Data Collector Metrics page
From the Data Collector Metrics page, Administration > SDC Metrics, you can use the More icon to view the following REST response data:
  • Data Collector Metrics
  • Thread Dump

Disabling the REST Response Menu

You can configure the Data Collector to disable the display of REST responses.

  1. To disable the REST Response menus, click the Help icon, and then click Settings.
  2. In the Settings window, select Hide the REST Response Menu.

Command Line Interface

Data Collector provides a command line interface that includes a basic cli command. Use the command to perform some of the same actions that you can complete from the Data Collector UI. Data Collector must be running before you can use the cli command.

You can use the following commands with the basic cli command:
help
Provides information about each command or subcommand.
manager
Provides the following subcommands:
  • start - Starts a pipeline.
  • status - Returns the status of a pipeline.
  • stop - Stops a pipeline.
  • reset-origin - Resets the origin when possible.
  • get-committed-offsets - Returns the last-saved offset for pipeline failover.
  • update-committed-offsets - Updates the last-saved offset for pipeline failover.
store
Provides the following subcommands:
  • import - Imports a pipeline.
  • list - Lists information for all available pipelines.
system
Provides the following subcommands:
  • enableDPM - Register the Data Collector with StreamSets Control Hub.
  • disableDPM - Unregister the Data Collector from Control Hub.

Java Configuration Options for the Cli Command

Use the SDC_CLI_JAVA_OPTS environment variable to modify Java configuration options for the cli command.

For example, to set the -Djavax.net.ssl.trustStore option for the cli command when using Data Collector with HTTPS, run the following command:

export SDC_CLI_JAVA_OPTS="-Djavax.net.ssl.trustStore=<path to truststore file> ${SDC_CLI_JAVA_OPTS}"

Using the Cli Command

Call the cli command from the $SDC_DIST directory.

Use the following command as the base for all cli commands:
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
<command> <subcommand> [<args>] 

The usage of the basic command options depends on whether or not the Data Collector is registered with Control Hub.

Not Registered with Control Hub

The following table describes the options for the basic command when the Data Collector is not registered with Control Hub:
Option Description
-U <sdcURL>

or

--url <sdcURL>
Required. URL of the Data Collector.

The default URL is http://localhost:18630/.

-a <sdcAuthType>

or

--auth-type <sdcAuthType>
Optional. HTTP authentication type used by the Data Collector.
-u <sdcUser>

or

--user <sdcUser>

Optional. User name to use to log in. The roles assigned to the user account determine the tasks that you can perform.

If you omit this option, the Data Collector allows admin access.

-p <sdcPassword>

or

--password <sdcPassword>

Optional. Required when you enter a user name. Password for the user account.
-D <dpmURL>

or

--dpmURL <dpmURL>
Not applicable. Do not use when the Data Collector is not registered with Control Hub.
<command> Required. Command to perform.
<subcommand> Required for all commands except help. Subcommand to perform.
<args> Optional. Include arguments and options as needed.

Registered with Control Hub

The following table describes the options for the basic command when the Data Collector is registered with Control Hub:
Option Description
-U <sdcURL>

or

--url <sdcURL>
Required. URL of the Data Collector.

The default URL is http://localhost:18630/.

-a <sdcAuthType>

or

--auth-type <sdcAuthType>
Required. Authentication type used by the Data Collector. Set to dpm.

If you omit this option, Data Collector uses the Form authentication type, which causes the command to fail.

-u <sdcUser>

or

--user <sdcUser>

Required. User account to log in. Enter your Control Hub user ID using the following format:
<ID>@<organization ID>

The roles assigned to the Control Hub user account determine the tasks that you can perform.

If you omit this option, Data Collector uses the admin user account, which causes the command to fail.

-p <sdcPassword>

or

--password <sdcPassword>

Required. Enter the password for your Control Hub user account.
-D <dpmURL>

or

--dpmURL <dpmURL>
Required. Set to: https://cloud.streamsets.com.
<command> Required. Command to perform.
<subcommand> Required for all commands except help. Subcommand to perform.
<args> Optional. Include arguments and options as needed.

Help Command

Use the help command to view additional information for the specified command.

For additional information for each command, including the available arguments, use the help command as follows:
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
help <command> [<subcommand>]
For example, the following command displays the details for the manager command. Use the same command options when the Data Collector is registered or is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 help manager

Manager Command

The manager command provides subcommands to start and stop a pipeline, view the status of all pipelines, and reset the origin for a pipeline. It can also be used to get the last-saved offset and to update the last-saved offset for a pipeline.

The manager command returns the pipeline status object after it successfully completes the specified subcommand. The following is a sample of the pipeline status object:
{
  "user" : "admin",
  "name" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "pipelineID" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "rev" : "0",
  "status" : "STOPPING",
  "message" : null,
  "timeStamp" : 1447116703147,
  "attributes" : { },
  "executionMode" : "STANDALONE",
  "metrics" : null,
  "retryAttempt" : 0,
  "nextRetryTimeStamp" : 0
}

Note that the timestamp is a Long value in epoch milliseconds.
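Scripts that consume this output can convert the timeStamp value to a readable datetime; a minimal sketch (the function name is illustrative):

```python
import json
from datetime import datetime, timezone

def parse_status(payload: str) -> dict:
    """Parse a manager command status object and convert the
    epoch-millisecond timeStamp to an aware datetime."""
    status = json.loads(payload)
    status["timeStamp"] = datetime.fromtimestamp(
        status["timeStamp"] / 1000, tz=timezone.utc
    )
    return status
```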

You can use the following manager subcommands:

start
Starts a pipeline. Returns the pipeline status when successful.
The start subcommand uses the following syntax:
manager start \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] \
[(-R <runtimeParametersString> | --runtimeParameters <runtimeParametersString>)]
Start Option Description
-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline to start.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use to start an older version of the pipeline.

By default, the Data Collector starts the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot start the pipeline.

Use to debug the problem or pass to StreamSets for help.

-R <runtimeParametersString>

or

--runtimeParameters <runtimeParametersString>

Optional. Runtime parameter values to start the pipeline with. Overrides the parameter default values defined for the pipeline.
Enter the runtime parameters using the following format:
'{"<runtime parameter1>": "<value1>", "<runtime parameter2>": "<value2>"}'
For example:
'{"RootDir": "/error", "JDBCConnection": "jdbc:mysql://localhost:3306/customers"}'
For example, the following command starts the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command starts the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command starts the first version of the same pipeline when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager start -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
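When scripting the start subcommand, building the command as an argument list and serializing the runtime parameters with a JSON library avoids shell-quoting mistakes. A minimal sketch (the function name is illustrative, and it assumes a Data Collector that is not registered with Control Hub):

```python
import json

def start_command(sdc_url: str, pipeline_id: str, runtime_parameters: dict) -> list:
    """Build the argv list for `bin/streamsets cli manager start`.
    Serializing the -R value with json.dumps keeps it valid JSON."""
    return [
        "bin/streamsets", "cli",
        "-U", sdc_url,
        "manager", "start",
        "-n", pipeline_id,
        "-R", json.dumps(runtime_parameters),
    ]
```

Pass the resulting list to subprocess.run() from the $SDC_DIST directory.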
stop
Stops a pipeline. Returns the pipeline status when successful.
The stop subcommand uses the following syntax:
manager stop \
[--forceStop] \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
Stop Option Description
--forceStop Optional. Forces the pipeline to stop immediately.

In some situations, a pipeline can remain in a Stopping state for up to five minutes. For example, if a scripting processor in the pipeline includes code with a timed wait or an infinite loop, Data Collector waits for five minutes before forcing the pipeline to stop.

-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline to stop.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use to stop an older version of the pipeline.

By default, the Data Collector stops the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot stop the pipeline.

Use to debug the problem or pass to StreamSets for help.

For example, the following command stops the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager stop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command stops the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager stop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command forces the first version of the same pipeline to stop immediately when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager stop --forceStop -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
status
Returns the status of a pipeline. Returns the pipeline status when successful.
The status subcommand uses the following syntax:
manager status \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
Status Option Description
-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use for older versions of the pipeline.

By default, the Data Collector returns information for the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot return the status of the pipeline.

Use to debug the problem or pass to StreamSets for help.

For example, the following command returns the status of the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command returns the status of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command returns the status of the first version of the same pipeline when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager status -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db -r 1
reset-origin
Resets the origin of a pipeline. Use only for origins that can be reset; some pipeline origins cannot be reset. Returns the pipeline status when successful.
The reset-origin subcommand uses the following syntax:
manager reset-origin \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack]
Reset Origin Option Description
-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline to reset the origin.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use to reset the origin for an older version of the pipeline.

By default, the Data Collector resets the origin for the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot reset the origin.

Use to debug the problem or pass to StreamSets for help.

For example, the following command resets the origin of the pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager reset-origin -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
The following command resets the origin of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager reset-origin -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db
get-committed-offsets
Returns the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
Pipeline offsets are managed by Data Collector. There's no need to get or replace the last-saved offset unless implementing pipeline failover using an external storage system.
When implementing pipeline failover, use this subcommand to store the last-saved offset to a file. When necessary, you can use the update-committed-offsets command to update the pipeline offset with the contents of the file.
The get-committed-offsets subcommand uses the following syntax:
manager get-committed-offsets \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
Get Offset Option Description
-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use to get the last-saved offset for an older version of the pipeline.

By default, the Data Collector uses the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot retrieve the last-saved offset.

Use to debug the problem or pass to StreamSets for help.

For example, the following command returns the last-saved offset for a pipeline with an ID of MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager get-committed-offsets \
 -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
The following command returns the last-saved offset of the same pipeline when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager get-committed-offsets -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
update-committed-offsets
Updates the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
Pipeline offsets are managed by Data Collector. There's no need to update the last-saved offset unless performing pipeline failover from a file that contains the last-saved offset stored by using get-committed-offsets.
Change the last-saved offset with great caution and only when the pipeline is not running.
The update-committed-offsets subcommand uses the following syntax:
manager update-committed-offsets \
(-f <fileName> | --file <fileName>) \
(-n <pipelineID> | --name <pipelineID>) \
[(-r <pipelineRev> | --revision <pipelineRev>)] \
[--stack] 
Update Offset Option Description
-f <fileName>

or

--file <fileName>

Required. Relative or absolute path to the file that contains the last-saved offset.

The file should contain only the last-saved offset retrieved by using the get-committed-offsets subcommand.

-n <pipelineID>

or

--name <pipelineID>

Required. ID of the pipeline.

Data Collector generates the ID when the pipeline is created. Data Collector uses the alphanumeric characters entered for the pipeline title as a prefix for the generated pipeline ID.

-r <pipelineRev>

or

--revision <pipelineRev>

Optional. The revision of the pipeline. Use to update the last-saved offset for an older version of the pipeline.

By default, the Data Collector uses the most recent version.

--stack Optional. Returns additional information when the Data Collector cannot update the last-saved offset.

Use to debug the problem or pass to StreamSets for help.

For example, the following command updates the last-saved offset for a pipeline using the offset in the specified file when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 manager update-committed-offsets \
-f /sdc/offsetfiles/mypipeline/offset.txt -n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
The following command updates the last-saved offset for the same pipeline using the offset in the specified file when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com manager update-committed-offsets -f /sdc/offsetfiles/mypipeline/offset.txt \
-n MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc
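Together, the two offset subcommands support the failover hand-off described above: save the offset on the active Data Collector, then restore it on the standby. The script below is a hypothetical sketch only. The URL, pipeline ID, and offset file path are placeholders, it assumes the get-committed-offsets output redirected to a file is usable as-is by update-committed-offsets, and the CLI calls are commented out so the sketch is inert.

```shell
# Hypothetical failover sketch; URL, pipeline ID, and file path are placeholders.
SDC_URL="http://localhost:18630"
PIPELINE_ID="MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e236lc"
OFFSET_FILE="/tmp/${PIPELINE_ID}.offset"

# 1. Periodically save the last-committed offset from the active Data Collector,
#    assuming the subcommand output redirected to a file is usable as-is:
# bin/streamsets cli -U "$SDC_URL" manager get-committed-offsets -n "$PIPELINE_ID" > "$OFFSET_FILE"

# 2. On failover, with the pipeline stopped on the standby, restore the offset:
# bin/streamsets cli -U "$SDC_URL" manager update-committed-offsets -f "$OFFSET_FILE" -n "$PIPELINE_ID"
```

Remember that the offset must only be replaced while the pipeline is not running.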

Store Command

The store command provides subcommands to view a list of all pipelines and to import a pipeline.

You can use the following subcommands with the store command:
list
Lists all available pipelines. The list subcommand uses the following syntax:
store list
Returns the following information for each pipeline:
 {
  "name" : "<pipeline ID>",
  "pipelineId" : "<pipeline ID>",
  "title" : "<pipeline title>",
  "description" : "< >",
  "created" : <created time>,
  "lastModified" : <last modified time>,
  "creator" : "admin",
  "lastModifier" : "admin",
  "lastRev" : "0",
  "uuid" : "<internal ID used for optimistic locking>",
  "valid" : true,
  "metadata" : {
    "labels" : [ ],
    "dpm.pipeline.id" : "<Control Hub pipeline ID>:<organization name>",
    "dpm.pipeline.version" : "<published pipeline version>"
  }
},
For example, the following command lists all pipelines associated with the Data Collector when it is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 store list
The following command lists all pipelines associated with the Data Collector when it is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com store list
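Because store list returns standard JSON, its output can be piped into any JSON-aware tool. As a minimal sketch, the python3 one-liner below prints each pipeline's ID and title; the sample file stands in for real CLI output and contains only a trimmed-down record with hypothetical values.

```shell
# Sketch: extract pipeline IDs and titles from "store list" output.
# The sample file below stands in for real CLI output; in practice, pipe
# "bin/streamsets cli ... store list" straight into the python3 one-liner.
cat > /tmp/store_list_sample.json <<'EOF'
[ {
  "name" : "FilesToHDFSabc123",
  "pipelineId" : "FilesToHDFSabc123",
  "title" : "Files to HDFS",
  "valid" : true
} ]
EOF
python3 -c 'import json, sys
for p in json.load(sys.stdin):
    print(p["pipelineId"], p["title"], sep="\t")' < /tmp/store_list_sample.json
```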
import
Imports a pipeline. Use to import a pipeline JSON file, typically exported from a Data Collector. Returns a message when the import is successful.
The import subcommand uses the following syntax:
store import \
(-n <pipelineTitle> | --name <pipelineTitle>) \
[--stack] \
[(-f <fileName> | --file <fileName>)]
Import Option Description
-n <pipelineTitle>

or

--name <pipelineTitle>

Required. Title for the imported pipeline.

If the title includes spaces, surround the title in quotation marks.

--stack Optional. Returns additional information when the Data Collector cannot import the pipeline.

Use to debug the problem or pass to StreamSets for help.

-f <fileName>

or

--file <fileName>

Optional. The location and name of the file to import.

Enter a path relative to the Data Collector installation directory.

For example, the following command creates a pipeline with the title "Files to HDFS" based on the files2hdfs.json file when the Data Collector is not registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 store import -n "Files to HDFS" -f ../../exported_pipelines/files2hdfs.json
The following command creates a pipeline with the title "Files to HDFS" based on the files2hdfs.json file when the Data Collector is registered with Control Hub:
bin/streamsets cli -U http://localhost:18630 -a dpm -u user1@mycompany -p MyPassword \
--dpmURL https://cloud.streamsets.com store import -n "Files to HDFS" -f ../../exported_pipelines/files2hdfs.json
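When many exported pipelines need to be loaded at once, the store import subcommand can be wrapped in a loop. The sketch below is hypothetical: the directory name is a placeholder, each pipeline title is simply derived from its file name, and the command is echoed rather than executed. Remove the echo to run it against a real Data Collector.

```shell
# Hypothetical bulk-import sketch: import every exported pipeline JSON file
# in a directory, using each file name as the pipeline title.
EXPORT_DIR="exported_pipelines"
for f in "$EXPORT_DIR"/*.json; do
  [ -e "$f" ] || continue                 # skip when the directory is empty
  title=$(basename "$f" .json)
  # Echoed for safety; remove "echo" to actually import:
  echo bin/streamsets cli -U http://localhost:18630 store import -n "$title" -f "$f"
done
```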

System Command

The system command provides subcommands to register and unregister the Data Collector with Control Hub.

You can use the following subcommands with the system command:

enableDPM
Registers the Data Collector with Control Hub. For a description of the syntax, see Registering from the Command Line Interface.
disableDPM
Unregisters the Data Collector from Control Hub. For a description of the syntax, see Unregistering from the Command Line Interface.