Administration

Providing an Activation Code

The activation code determines the maximum number of Spark executors allowed for each pipeline.

Users with an enterprise account need to provide an activation code only in the following cases:
  • When using a Docker Transformer image that is not registered with Control Hub.
  • When you want to use more executors than previously licensed to your account.

Users with an enterprise account can submit a request for an activation code through the StreamSets Support portal. Users without an enterprise account do not need to provide an activation code.

After you receive an email with the activation code, log in to Transformer. On the registration page, click Enter a Code, paste the code into the Activation window, and then click Activate.

Users with the Admin role can view activation details by clicking Administration > Activation.

Updating the Activation Code

Users with an enterprise account might need to update the activation code when the current code expires or when they request an updated code to increase the maximum number of Spark executors.

You can request an updated activation code through the StreamSets Support portal. You need the Admin role to update the activation code.

  1. After receiving the email with the activation code, copy the activation code from the email.
  2. Click Administration > Activation.

    The Activation window lists the Transformer product ID and user that the current activation code is licensed to. It also lists the expiration date for the current code and the maximum number of Spark executors that can be used to run each pipeline.

  3. Click Update Activation Code.
  4. Paste the activation code into the Activation Code text box and then click Activate.

Viewing Transformer Configuration Properties

To view Transformer configuration properties, click Administration > Configuration.

To edit the properties, edit the Transformer configuration file, $TRANSFORMER_CONF/transformer.properties.
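
For example, you might edit the file from a shell on the Transformer machine as follows (a minimal sketch that assumes $TRANSFORMER_CONF is set in your shell; keep a backup, and restart Transformer to apply the changes):

  cd $TRANSFORMER_CONF
  cp transformer.properties transformer.properties.bak   # keep a backup copy
  vi transformer.properties                              # edit and save the properties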

Viewing Transformer Directories

You can view the directories that Transformer uses. For example, you might check which directories are in use so that you can access a file in one of them or increase the amount of available space for a directory.

Transformer directories are defined in environment variables. For more information, see Transformer Directories.

To view Transformer directories, click Administration > Transformer Directories.

The following list describes each Transformer directory that displays, along with the environment variable that defines it:

Runtime (TRANSFORMER_DIST)
Base directory for Transformer executables and related files.
Configuration (TRANSFORMER_CONF)
The Transformer configuration file, transformer.properties, and related realm properties files and keystore files. Also includes the Log4j properties file.
Data (TRANSFORMER_DATA)
Pipeline configuration and run details.
Log (TRANSFORMER_LOG)
The Transformer log file, transformer.log.
Resources (TRANSFORMER_RESOURCES)
Directory for runtime resource files.
DT Libraries Extra Directory (STREAMSETS_LIBRARIES_EXTRA_DIR)
Directory to store external libraries.
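
For example, to check where a directory is located and how much space remains on its file system, you can run the following from a shell on the Transformer machine (a minimal sketch; it assumes that the Transformer environment variables, such as TRANSFORMER_DATA, are set in your shell session):

  echo $TRANSFORMER_DATA     # print the path of the data directory
  df -h $TRANSFORMER_DATA    # show available space on the file system that holds it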

Viewing Transformer Metrics

You can view metrics about Transformer, such as CPU and heap memory usage.

Note: Transformer metrics do not include metrics about running pipelines because Spark handles all pipeline processing.
  1. To view Transformer metrics, click Administration > Transformer Metrics.
    The Transformer Metrics page displays all metrics by default.
  2. To modify the metrics that display on the page, click the More icon, and then click Settings.
  3. Remove any metric charts that you don't want to display, and then click Save.

Log Files

Transformer provides access to the following log files:
Transformer log
The Transformer log, $TRANSFORMER_LOG/transformer.log, provides information about the Transformer application, such as start-up messages, user logins, or pipeline display in the canvas. You can open the log file on the Transformer machine, or you can view the contents of the log file from the Transformer UI, as described in Viewing the Transformer Log.
The Transformer log can also include some information about local pipelines or cluster pipelines run on Hadoop YARN in client deployment mode. For these types of pipelines, the Spark driver program is launched on the local Transformer machine. As a result, some pipeline processing messages are included in the Transformer log.
Spark driver log
A Spark driver log provides information about how Spark runs, previews, and validates pipelines.
By default, messages in the Spark driver log are logged at the ERROR severity level. To modify the log level, change the Log Level property on the Cluster tab for the pipeline.
You can view and download the Spark driver log from the Transformer UI for the following types of pipelines:
  • Local pipelines
  • Cluster pipelines run in Spark standalone mode
  • Cluster pipelines run on Amazon EMR
  • Cluster pipelines run on Hadoop YARN in client deployment mode
For local pipelines or cluster pipelines run on Hadoop YARN in client deployment mode, you can also open the Spark driver log file written to the following location on the Transformer machine for each pipeline: $TRANSFORMER_DATA/runInfo/<pipelineID>/run<timestamp>/driver-all.log

For all other cluster pipelines, the Spark driver program is launched remotely on one of the worker nodes inside the cluster. To view the Spark driver logs for these pipelines, access the Spark web UI for the application launched for the pipeline. Transformer provides easy access to the Spark web UI for many cluster types.
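
For example, you can follow the Transformer log, or read the driver log written for a specific run, from a shell on the Transformer machine (a sketch; replace the <pipelineID> and <timestamp> placeholders with actual values, and note that the environment variables must be set in your shell):

  tail -f $TRANSFORMER_LOG/transformer.log                                           # follow the Transformer log
  tail -n 100 $TRANSFORMER_DATA/runInfo/<pipelineID>/run<timestamp>/driver-all.log   # read the last lines of a driver log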

Viewing the Transformer Log

You can view and download Transformer log data from the Transformer UI. When you download log data, you can select the file to download.

  1. To view log data for Transformer, click Administration > Logs.
    The Transformer UI displays roughly 50,000 characters of the most recent log information.
  2. To stop the automatic refresh of log data, click Stop Auto Refresh.
    Or, click Start Auto Refresh to view the latest data.
  3. To view earlier events, click Load Previous Logs.
  4. To download the latest log file, click Download. To download a specific log file, click Download > <file name>.
    The most recent information is in the file with the highest number.

Log Format

Transformer uses the Apache Log4j library to write log data. Each log entry includes a timestamp and message along with additional information relevant for the message.

In the Transformer UI, each log entry has the following information:
  • Timestamp
  • Pipeline
  • Severity
  • Message
  • Category
  • User
  • Runner
  • Thread
In the downloaded log file, each log entry has the same information in a different order. For the following pipeline types, the entry also includes the stage that encountered the message:
  • Local pipelines
  • Cluster pipelines run on Hadoop YARN in client deployment mode
Note: The downloaded log file does not include stage information for other types of cluster pipelines.

The information included in the downloaded file is set by the appender.streamsets.layout.pattern property in the log configuration file, $TRANSFORMER_CONF/transformer-log4j2.properties.

To customize the log format, see the Log4j documentation. Transformer provides the following custom objects:
  • %X{s-entity} - Local pipeline name and ID
  • %X{s-runner} - Runner ID
  • %X{s-stage} - Stage name
  • %X{s-user} - User who initiated the operation
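
For example, a pattern along the following lines writes the timestamp, user, pipeline, runner, stage, thread, severity, category, and message for each entry (an illustrative pattern, not necessarily the shipped default; adjust it to suit your needs):

  appender.streamsets.layout.pattern = %d{ISO8601} [user:%X{s-user}] [pipeline:%X{s-entity}] [runner:%X{s-runner}] [stage:%X{s-stage}] [thread:%t] %-5p %c{1} - %m%n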

Modifying the Log Level

If the Transformer log does not provide enough troubleshooting information, you can modify the log level to display messages at another severity level.

By default, Transformer logs messages at the INFO severity level. You can configure the following log levels:
  • TRACE
  • DEBUG
  • INFO (Default)
  • WARN
  • ERROR
  • FATAL
  1. Click Administration > Logs.
  2. Click Log Config.
    Transformer displays the contents of the log configuration file, $TRANSFORMER_CONF/transformer-log4j2.properties.
  3. Change the default value of INFO for the following line in the file:
    logger.l1.level=INFO

    For example, to set the log level to DEBUG, modify the line as follows:

    logger.l1.level=DEBUG
  4. Click Save.
    The changes that you make to the log level take effect immediately - you do not need to restart Transformer. You can also change the log level by directly editing the log configuration file, $TRANSFORMER_CONF/transformer-log4j2.properties.

When you’ve finished troubleshooting, set the log level back to INFO to avoid having verbose log files.

Viewing the Spark Driver Log

Certain pipeline types provide access to the Spark driver log. For a list, see Spark driver log.

  1. To view the Spark driver log for the current pipeline run, click the Summary tab in the monitoring panel, and then click Driver Logs in the Runtime Statistics section.

    Or, to view the Spark driver log for a previous pipeline run, click the History tab in the monitoring panel, and then click Driver Logs in the Summary column.

    The Transformer UI displays the most recent driver log information.

  2. Click Refresh to view the latest data.
  3. To view earlier data, click Load Previous Logs.
  4. To download the latest log data, click Download.

Shutting Down Transformer

You can shut down and then manually launch Transformer to apply changes to the Transformer configuration file, environment configuration file, or user logins.

Use one of the following methods to shut down Transformer:

User interface
To use the Transformer user interface (UI) for shutdown:
  1. Click Administration > Shut Down.
  2. When a confirmation dialog box appears, click Yes.
Command line when started as a service
To use the command line for shutdown when Transformer is started as a service, use the appropriate command for your operating system:
  • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
    service transformer stop
  • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
    systemctl stop transformer

Command line when started manually
To use the command line for shutdown when Transformer is started manually, run the following command using the process ID displayed in the command prompt when you started Transformer:
kill -15 <process ID>
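
If you no longer have the process ID at hand, you can look it up before running the command (a sketch; the grep pattern is an assumption and might need adjusting for your installation):

  ps -ef | grep -i "streamsets transformer"   # the process ID is in the second column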

Restarting Transformer

You can restart Transformer to apply changes to the Transformer configuration file, environment configuration file, or user logins. During the restart process, Transformer shuts down and then automatically restarts.

Choose the restart method based on how you initially started Transformer:
Started manually
If you changed or added an environment variable in the transformer-env.sh file, then you must restart Transformer from the command prompt. Press Ctrl+C to shut down Transformer and then enter bin/streamsets transformer to restart Transformer.

If you did not change or add an environment variable, then you can restart Transformer from the command prompt or from the user interface. To restart from the user interface, click Administration > Restart, expand StreamSets Transformer was started manually, and then click Restart Transformer.

Started as a service
Run the appropriate command for your operating system:
  • For CentOS 6, Oracle Linux 6, or Red Hat Enterprise Linux 6, use:
    service transformer start
  • For CentOS 7, Oracle Linux 7, or Red Hat Enterprise Linux 7, use:
    systemctl start transformer
Started from Docker
Run the following Docker command:
docker restart <containerID>

The restart process can take a few moments to complete. Refresh the browser to log in again.
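
To confirm that Transformer is running again, you can check its status from the command line (a sketch; the first command applies to service installations on systemd-based systems, the second to Docker):

  systemctl status transformer   # check the service status
  docker ps                      # confirm that the Transformer container is running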

Opting Out of Usage Statistics Collection

You can help to improve Transformer by allowing StreamSets to collect usage statistics about Transformer system performance and features that you use. This information helps StreamSets to improve product performance and to make product development decisions.

If desired, you can opt out of usage statistics collection.

  1. Click Administration > Usage Statistics.
  2. Clear the Share usage data with StreamSets checkbox.
  3. Click Save.