Release Notes

October 2024

The following Control Hub releases occurred in October 2024.

October 16

This release includes an enhancement and several fixed issues.

Enhancement

Sequences
You can add a finish condition to a sequence step to define when the job stops running.

Fixed Issues

  • Some Control Hub REST APIs can cause unexpected behavior because the APIs no longer return the unused parentJobId and systemJobId fields. For backwards compatibility, the APIs now continue to return these fields.
  • When network connection issues occur while a deployment is installing stage libraries on associated engines, the engines can fail to restart.
  • If you select a later authoring engine version for a draft pipeline that includes properties with empty values, Control Hub might generate the following error during pipeline preview: Cannot read properties of null

  • When you select a later authoring engine version for a draft pipeline and you choose to publish the draft pipeline when the engine version used to edit the pipeline is no longer available, Control Hub generates the following error: HTTP Status: 404 (Not Found)

September 2024

The following Control Hub releases occurred in September 2024.

September 25

This release includes an enhancement.

Enhancement

IBM ID
Users can log in with an IBM ID, in addition to the existing options.

September 18 and 20

This release occurred on September 18, 2024 for the eu01.hub.streamsets.com instance. The release occurred on September 20, 2024 for all other instances.

Enhancements

Sequences
You can use the following new properties to search for sequences:
  • Job Name - Search for sequences that include the specified job name.
  • Created By - Search for sequences created by the specified user email address.
  • Last Modified By - Search for sequences last modified by the specified user email address.
Data Collector pipeline failover
When a Data Collector engine gracefully shuts down, Control Hub typically considers the engine unresponsive within one minute. As a result, if a currently active job is enabled for pipeline failover, Control Hub usually restarts the pipeline on another available engine within one minute.
Previously, Control Hub always waited for the maximum engine heartbeat interval to expire, 5 minutes by default, before considering the engine unresponsive.

Behavior Change

Deprecated parameters in Control Hub REST API
The withWrapper parameter used in the /jobrunner/rest/v1/saql/jobs/search/{jobId}/runs REST API has been deprecated and will be removed in a future release.
With this release, Control Hub ignores this parameter and always returns a paginated wrapper.
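
For scripts that call this API, one way to prepare for the removal is to stop sending withWrapper. The following Python sketch builds the request URL and drops the parameter if a caller still passes it. The base URL, the job ID, and the offset query parameter are hypothetical placeholders, not values taken from these release notes.

```python
from urllib.parse import quote, urlencode

# Path taken from the release note; {jobId} is filled in per request.
RUNS_PATH = "/jobrunner/rest/v1/saql/jobs/search/{jobId}/runs"


def build_runs_url(base_url, job_id, params=None):
    """Build the job runs search URL without the deprecated withWrapper parameter."""
    # Drop withWrapper: Control Hub ignores it and always returns a paginated wrapper.
    query = {k: v for k, v in (params or {}).items() if k != "withWrapper"}
    path = RUNS_PATH.format(jobId=quote(job_id, safe=""))
    url = base_url.rstrip("/") + path
    return url + "?" + urlencode(query) if query else url


# Hypothetical base URL, job ID, and query parameter, for illustration only.
print(build_runs_url("https://example.invalid", "abc:org",
                     {"withWrapper": "false", "offset": 0}))
```

Because the response is always wrapped, client code should read results from the paginated wrapper rather than expect a bare list.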

Fixed Issues

  • When you delete a pipeline, Control Hub does not automatically delete the draft run created for that pipeline.

    To manually delete a draft run for a previously deleted pipeline, click Run > Draft Runs in the Navigation panel, select the draft run and then click the Delete icon.

  • When you import a pipeline with a commit message longer than 255 characters, Control Hub displays an incorrect error message.
  • The generated installation script for a Transformer Docker image for a self-managed deployment always uses the default port 19630, even when you configure a different port number in the Transformer configuration properties.

August 2024

The following Control Hub releases occurred in August 2024.

August 16, 2024

This release includes several enhancements, behavior changes, and fixed issues.

Enhancements

Subscription parameters
You can use the JOB_TAG parameter for a subscription triggered by a job status change.
For example, you can configure a subscription to send an email only when jobs that have a Production job tag change status, instead of when every job changes status.
Connections
In the table of connections in the Connections view, the Owner column has been renamed to Creator to clarify that the column displays the user that created the connection.

Behavior Changes

Engine credential store properties for existing deployments
Existing deployments that define sensitive data such as a password for credential store properties in the engine advanced configuration now display the sensitive values as REDACTED.
Previously, existing deployments that defined credential store properties displayed sensitive values in plain text until you edited and saved the deployment.
Deprecated parameters in Control Hub REST API
The following parameters used in several jobrunner and pipelinestore REST APIs have been deprecated and will be removed in a future release:
  • system
  • systemjob
  • includeSystemJobStatus

Previously, setting these parameters to true returned no pipelines or jobs because system pipelines and system jobs do not exist in platform Control Hub. With this release, Control Hub ignores these parameters and returns all pipelines or jobs meeting the remainder of the API call parameters.
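
To find scripts that still pass the deprecated parameters, a small check such as the following Python sketch might help. The function name and the sample query parameters (including len) are illustrative assumptions. Only the three deprecated parameter names come from these release notes, and the sketch assumes exact, case-sensitive name matching.

```python
# Deprecated parameter names listed in the release note.
DEPRECATED_PARAMS = ("system", "systemjob", "includeSystemJobStatus")


def split_deprecated(params):
    """Return (kept, dropped): parameters to keep sending and deprecated names found."""
    kept = {k: v for k, v in params.items() if k not in DEPRECATED_PARAMS}
    dropped = sorted(k for k in params if k in DEPRECATED_PARAMS)
    return kept, dropped


# Hypothetical query parameters from an existing script.
kept, dropped = split_deprecated(
    {"system": "true", "len": 50, "includeSystemJobStatus": "false"}
)
print("keep sending:", kept)
print("remove from script:", dropped)
```

Note that a call that previously set one of these parameters to true returned nothing, and now returns every pipeline or job that matches the remaining parameters, so downstream code may need to handle non-empty results.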

Fixed Issues

  • A subscription triggered by a job status change event that does not use the JOB_OWNER parameter in the email action or event condition fails to send an email when the job owner no longer exists.
  • Runtime parameters do not display in the job instance details.
  • The advanced organization properties might display uneditable IP Auth Rules properties configured with test values.
  • When configuring an engine to use a proxy server, the http.nonProxyHosts property defined on the Proxy tab does not always take effect.

July 2024

The following StreamSets platform releases occurred in July 2024.

July 24, 2024

This release includes enhancements and several fixed issues.

Enhancements

Sequences
You can view the history and errors for all runs of a sequence. When viewing all history or errors, you can search for historical log messages or errors by date. You can also delete log messages and errors for a sequence.
Previously, you could view the history and errors only for the last run of the sequence.
Product rename
Following the IBM acquisition of StreamSets, platform Control Hub is part of what is now known as IBM StreamSets.
Control Hub remains the central point of control for all of your data pipelines. As such, it is a critical component of the IBM StreamSets products. For more information, see What is Control Hub?

Fixed Issues

  • Users with the Job Operator and Pipeline User roles receive an HTTP 403 forbidden error when accessing the Job Instances and Job Templates views.
  • After installing a stage library from the pipeline canvas and then restarting the engine, the stage library panel still displays the newly installed stage as disabled.
  • Importing pipelines using the Control Hub REST API /v3/importer/{schVersion}/pipeline does not set the pipelineCommitID in the rules.

July 11, 2024

This release includes a fixed issue.

Fixed Issue

  • If you are required to verify your email when you accept an email invitation to join an existing organization, you are prompted to create a new organization rather than join the existing organization.

June 2024

The following StreamSets platform releases occurred in June 2024.

June 26, 2024

This release includes several enhancements, behavior changes, and fixed issues.

Enhancements

Pipeline and fragment import
When you import a pipeline or fragment, you select the file to import and then the import dialog box automatically detects the file type based on your selection, either an archive ZIP file or a JSON file.
Previously, you had to manually select the appropriate file type.
Kubernetes environments
Kubernetes environments support a new KUBERNETES_2024_06_14 feature version which adds startup and liveness probes to all deployments belonging to the environment.
The probes ensure that Control Hub correctly displays the deployment state as a deployment starts and ensure that the Kubernetes pod is restarted if a StreamSets engine freezes.
Self-managed deployments
Self-managed deployments include the following enhancements:
  • Engine installation type - The deployment wizard for self-managed deployments has been simplified. Instead of displaying a separate Install Type step, the wizard now has you choose the engine installation type, either a Docker image or a tarball file, in the Review & Launch step.
  • Proxy server configuration - For self-managed deployments using a Docker image installation of Data Collector 5.11.0 or later, Transformer 5.8.0 or later, or Transformer for Snowflake 5.2.0 or later, you can more easily configure the engine to use a man-in-the-middle proxy server. When configuring the deployment, you can include the custom certificate required by the server in the engine advanced configuration properties.
Engine Java version defined for deployments
The engine Java version defined for deployments includes the following enhancements:
  • Java version for self-managed deployments using a Docker image - When you configure a self-managed deployment using a Docker image for Data Collector 5.10.0 or later, you now define the Java version in the engine advanced configuration properties.

    Previously, you defined the Java version in the Review & Launch step of the deployment wizard when creating the deployment, or in the Install Engine Script dialog box when retrieving the installation script for an existing deployment.

  • Amazon EC2 and GCE deployments - When you configure an Amazon EC2 or GCE deployment for Data Collector 5.11.0 or later, you can define the Java version for the engine to run on.

    In most cases, you can use the default Java version. Some Data Collector stage libraries and use cases require specific Java versions.

  • Default Data Collector Java version - When you configure a Control Hub-managed deployment or a self-managed deployment using a Docker image for Data Collector 5.11.0 or later, Control Hub now deploys and installs Java 17 by default.

    For earlier Data Collector versions, Control Hub deployed and installed Java 8 by default.

Behavior Changes

Table columns reset to defaults
With this release, all resized, reordered, and hidden table columns are reset to the defaults.
You must customize the table columns after the release to match your previous customizations.
Engine log file location for Amazon EC2 or GCE deployments of Data Collector 5.11.0 or later
With this release, the engine log file for Amazon EC2 or GCE deployments of Data Collector 5.11.0 or later is located in the /logs directory.
For earlier Data Collector versions, the engine log file for these deployment types is located in the /var/log/sdc directory.

Fixed Issues

  • The pagination buttons in the Scheduled Tasks view are greyed out even when there are more scheduled tasks to display.
  • The pipeline canvas fails to check in a pipeline when an input field is focused before you click the Check In icon.

May 2024

The following StreamSets platform releases occurred in May 2024.

May 22, 2024

This release includes several enhancements and fixed issues.

Enhancements

Connection upgrades
When you edit a connection and choose an authoring Data Collector of a later version that includes changes to the connection properties, you can now choose to upgrade the connection to use the changed properties. Alternatively, you can choose to create a copy of the connection and then upgrade the copy to use the changed properties.
Previously, to see changes to a connection, you had to create a new connection using the later authoring Data Collector version.
Sequences
Sequences include the following enhancements:
  • When you view the details of a sequence, you can click the name of a job to navigate to the job details.
  • The Enable Sequence and Disable Sequence icons now include a label to clearly indicate the purpose of the icon.
Banner notifications
You might now see notification messages in the top banner that provide important information about the StreamSets platform. After reading the messages and taking any appropriate action, you can dismiss them.

Fixed Issues

  • Existing job tags do not display when you edit a job from the Job Instances list.

April 2024

The following StreamSets platform releases occurred in April 2024.

April 26, 2024

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements

Sequences
You can create a sequence to run a collection of jobs in a specified order based on conditions.
A sequence can include jobs that run on different types of StreamSets engines. For example, you can create a sequence that first runs a Data Collector job to load data to a data warehouse, and then runs a Transformer job to transform that data.
You can add a start condition to a sequence to schedule a time when the sequence starts. You can also configure a condition between steps that determines if the next step automatically starts when a job in the previous step encounters an error.
Note: Sequences are designated a Technology Preview feature. They are not meant for use in production.
Search
You can use the following new properties to search for pipelines or fragments:
  • Stages - Search for pipelines or fragments that include the selected stage. For example, you might search for all pipelines that include an Amazon S3 stage.
  • Stage Libraries - Search for pipelines or fragments that use stages included in the selected stage library. For example, you might search for all fragments that use stages included in the Apache Kafka 3.1.0 stage library.
Kubernetes deployments
When you configure a Kubernetes deployment for Data Collector 5.10 or later, you can define the Java version for the engine to run on.
In most cases, you can use the default Java version. Some Data Collector stage libraries and use cases require specific Java versions.
Engine credential store properties defined for deployments
When you enter sensitive data such as a password for credential store properties in the engine advanced configuration, Control Hub displays the sensitive values as REDACTED after you save the deployment.
Previously, Control Hub displayed sensitive values in plain text after you saved the deployment.
At this time, existing deployments that define credential store properties continue to display sensitive values in plain text until you edit and save the deployment. A future release will automatically update existing deployments to redact sensitive values.
Stage library mode defined for deployments
For advanced use cases, you can now configure a deployment to use the user-provided stage library mode where you provide the stage library files during the engine installation. For example, you might use the user-provided stage library mode if your organization requires that the stage library files be scanned for security purposes before they are installed on the engine machines.
Before you can use the user-provided stage library mode, your organization administrator must modify the organization configuration properties to display the Stage Library Mode property for deployments.
StreamSets strongly recommends using the default managed stage library mode. The managed stage library mode allows you to select stage libraries as part of the deployment, and then StreamSets automatically synchronizes the files to your engines.

Fixed Issues

  • A deployment configured to use a proxy server ignores the http.nonProxyHosts property when it includes a CIDR block.
  • You cannot export a pipeline that includes a connection specified as a parameter.

March 2024

The following StreamSets platform releases occurred in March 2024.

March 20, 2024

This release includes an enhancement and several fixed issues.

Enhancement

Deployment Java version
When you configure an Azure VM deployment or a self-managed deployment using a Docker image for Data Collector 5.10 or later, you can define the Java version for the engine to run on. In most cases, you can use the default Java version. Some Data Collector stage libraries and use cases require specific Java versions.
Additional deployment types will support selecting a Java version in a future release.

Fixed Issues

  • Clicking the Last page icon in the Scheduled Tasks view displays the next page of results instead of the last page.
  • Clicking Show Diff for a cloned deployment of a Transformer engine does not display the differences between the original and the cloned deployment.
  • Scheduled tasks sporadically fail to start, sometimes displaying a Connection is closed error in the run history.
  • A subscription that sends notifications when the job status changes to red incorrectly triggers for Transformer for Snowflake jobs.

February 2024

The following StreamSets platform releases occurred in February 2024.

February 21, 2024

This release includes an enhancement and a fixed issue.

Enhancement

Job run history
When you view the run history of a job and expand the details of a run, Control Hub displays a last-saved offset only when the offset contains information. When the offset is empty, no offset is shown.
Previously, Control Hub displayed empty offsets as follows: Offset:{ "version" : 2, "offsets" : { } }
Note: This enhancement has been disabled due to unexpected consequences.

Fixed Issue

  • When you search for jobs by Engine Label, auto-completion loads indefinitely.

February 7, 2024

This release includes an enhancement and a fixed issue.

Enhancement

SAML identity providers
When configuring SAML authentication, the Microsoft Azure Active Directory (Azure AD) identity provider has been renamed the Microsoft Entra ID identity provider to match the Microsoft rebranding.

Fixed Issue

  • When you sign in to the StreamSets platform with email, Google, or Microsoft after having previously signed in with a different option, you might encounter the following error: The requested action is invalid.

January 2024

The following StreamSets platform releases occurred in January 2024.

January 24, 2024

This release includes an enhancement and several fixed issues.

Enhancement

AWS environments
In addition to manually configuring the AWS credentials that Control Hub uses to access and provision resources in your AWS account, you can have StreamSets automatically configure a cross-account role for you.
When you use an automatic cross-account role, Control Hub prompts you to log in to your AWS account and create a CloudFormation stack. The CloudFormation stack then automatically generates the required credential resources, including IAM policies, a cross-account role, and a default instance profile.
Previously, you could only manually configure the AWS credentials.

Fixed Issues

  • External resources do not work when stored in a subfolder in a private Google Cloud Storage bucket.
  • Sorting scheduled tasks by the next execution time sorts the tasks in the current page only rather than all of the tasks.
  • When Data Collector 5.7.0 and later or Transformer 5.6.0 and later are configured to use a proxy server, wildcard domains specified in the http.nonProxyHosts property are not correctly evaluated.

January 4, 2024

This release includes the following fixes:
  • Pipelines intermittently do not display in the pipeline canvas.
  • Jobs intermittently generate an error as you attempt to save the job, and then do not save.

2023

December 2023

The following StreamSets platform releases occurred in December 2023.

December 20, 2023

This release includes several enhancements and fixed issues.

Enhancements
Search
Search includes the following enhancements:
  • You can use the new Pipeline ID property to search for pipelines or fragments by the pipeline ID.
  • The ID property available for pipelines and fragments has been renamed to Commit ID to clarify that you use the property to search for pipelines or fragments by the commit ID.

    Existing saved searches for pipelines or fragments that use the previous ID property name continue to work and find pipelines or fragments by the commit ID.

Environments
When you create an environment, you select the feature version to use for that environment and for all deployments created for the environment.
A feature version allows you to decide when an existing environment and all of its deployments gain access to newer features that might require additional permissions in your cloud service provider account or that might require a restart of engines belonging to the deployments.
AWS environments support a new AWS_2023_12_15 feature version which includes support for Amazon EC2 spot instances and AWS launch templates.
At this time, all other environment types support an initial feature version.
Deployments
Deployments include the following enhancements:
  • Cloning deployments - Clone a deployment to quickly create another deployment with similar configurations. You can select the same engine version or a later engine version than used in the original deployment.

    You might clone a deployment as the first step in the engine upgrade process. When you edit a cloned deployment that uses a later engine version than the original deployment, you can view and compare the engine configuration differences between the original and cloned deployments.

  • Amazon EC2 deployments - You can configure an Amazon EC2 deployment to provision EC2 spot instances, in addition to on-demand instances. Requires the AWS_2023_12_15 environment feature version.

    Amazon EC2 deployments that use the initial feature version provision EC2 on-demand instances only.

  • Azure VM deployments - An Azure VM deployment for Data Collector 5.8.0 and later deploys the Data Collector engine as a tarball rather than a Docker image.

    Azure VM deployments for earlier Data Collector versions and for Transformer continue to deploy those engines as a Docker image.

Organization security
The primary organization administrator can designate another user assigned the Organization Administrator role to be the primary organization administrator.
Fixed Issues
  • You cannot export a draft pipeline or fragment using the Export option that removes plain text credentials.

  • You cannot import a draft pipeline that includes a fragment.

November 2023

The following StreamSets platform releases occurred in November 2023.

November 15, 2023

This release includes several enhancements, a behavior change, and fixed issues.

Enhancements
Preview
Preview includes the following enhancements:
  • Keyboard shortcut - You can use Command+Return on Mac and Ctrl+Enter on Windows machines to start previewing pipelines and fragments. Control Hub uses the existing preview configuration for the preview.
  • Table view - When you configure preview to display field types, the field type now shows in Table view next to the field name, as it does in List view.
Environments
When you view environment details in the Environments view, you can view the list of deployments that belong to the environment. To view details about a specific deployment, click the deployment name.
Deployments
Deployments include the following enhancements:
  • All deployment types - When you view deployment details in the Deployments view, you can click the name of the parent environment to navigate to the environment details.
  • Kubernetes deployments - You now configure advanced mode for a Kubernetes deployment in the main deployment wizard rather than in a separate dialog box.
Scheduled tasks
When creating a scheduled task that runs on a minute basis, you can specify which minute to start the job at. For example, if you create a scheduled task that runs every 15 minutes, you can schedule the task to start at 3 minutes after every hour, such that the task starts at 2:03, 2:18, 2:33, and 2:48.
Previously, all scheduled tasks that ran on a minute basis always started at the top of the hour. For example, all scheduled tasks that ran every 15 minutes ran at 2:00, 2:15, 2:30, and 2:45.
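
The start times in the example above follow from simple arithmetic: begin at the chosen minute and add the interval until the hour ends. A minimal Python sketch, assuming the start minute is smaller than the interval and that the schedule repeats every hour; the function names are illustrative and are not part of Control Hub:

```python
def start_minutes(interval_minutes, start_minute=0):
    """Minutes past the hour at which a task with the given interval starts."""
    return list(range(start_minute, 60, interval_minutes))


def start_times(hour, interval_minutes, start_minute=0):
    """Format the start times within one hour as H:MM strings."""
    return [f"{hour}:{m:02d}"
            for m in start_minutes(interval_minutes, start_minute)]


# Every 15 minutes, starting 3 minutes after the hour.
print(start_times(2, 15, 3))   # ['2:03', '2:18', '2:33', '2:48']
# Previous behavior: always starting at the top of the hour.
print(start_times(2, 15))      # ['2:00', '2:15', '2:30', '2:45']
```
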
Organization default system limits
StreamSets has added a default system limit for the number of groups that can exist in an organization. The limit is 1,000 groups.
Behavior Change
Pipeline and fragment version history

With this release, StreamSets has simplified pipeline and fragment version history. You can now access the version history of a pipeline or fragment from the pipeline canvas only.

Previously, you could access different views of the version history - one from the pipeline canvas and another from the Pipelines or Fragments view.

Fixed Issues
  • When users without the Organization Administrator role try to share an object that they have read access to, the Sharing Settings dialog box displays empty user and group names.
  • While editing a connection with an engine of a more recent version than the one the connection was created with, the wizard never moves to the second step for some connection types.
  • When a self-managed deployment defines a percentage JVM memory strategy for engines, the following Docker engine installation types on Linux with cgroup v2 enabled incorrectly allocate a percentage of the available memory on the host machine as the Java heap size, instead of the available memory in the Docker container:
    • Data Collector
    • Transformer prebuilt with Scala 2.11
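
The difference matters whenever the container memory limit is smaller than the host memory. The following Python sketch shows the arithmetic with hypothetical figures: a 64 GiB host, an 8 GiB container limit, and a 50% memory strategy. The function is illustrative only and is not how the engine computes its heap size.

```python
GIB = 1024 ** 3


def heap_size_bytes(percentage, available_memory_bytes):
    """Heap size as a whole-number percentage of the available memory."""
    return available_memory_bytes * percentage // 100


host_memory = 64 * GIB        # hypothetical host machine memory
container_memory = 8 * GIB    # hypothetical Docker container memory limit

# Incorrect basis (the fixed issue): percentage of host memory.
print(heap_size_bytes(50, host_memory) // GIB, "GiB")       # 32 GiB
# Intended basis: percentage of the memory available in the container.
print(heap_size_bytes(50, container_memory) // GIB, "GiB")  # 4 GiB
```

In this hypothetical case, a heap sized from the host memory would be four times larger than the entire container limit.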

November 1, 2023

This release includes the following fix:
  • When trying to join an existing StreamSets organization using a Microsoft corporate email account, users are guided to create a new organization instead.

October 2023

The following StreamSets platform releases occurred in October 2023.

October 27, 2023

This release includes several enhancements and fixed issues.

Enhancements
Search
When performing a basic string search, you can select the operator does not contain to find objects that do not contain a specified string. For example, the following basic search condition finds all pipelines that do not contain the string ADLS2 in the pipeline name:
Name does not contain ADLS2
Previously, you had to use advanced mode to define a condition using the not equal to operator (!=) to search for objects that did not contain a specified string.
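
Conceptually, the operator keeps every object whose property value does not include the search string. A minimal Python sketch of that filter, using hypothetical pipeline names and assuming case-sensitive matching, which might differ from how Control Hub compares strings:

```python
def does_not_contain(names, text):
    """Return the names that do not contain the given string."""
    return [name for name in names if text not in name]


# Hypothetical pipeline names, for illustration only.
pipelines = ["ADLS2 to Snowflake", "Kafka to S3", "Orders to ADLS2"]
print(does_not_contain(pipelines, "ADLS2"))  # ['Kafka to S3']
```
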
Monitoring jobs
When viewing the engine log for an active job, you can select another engine that you want to view the log for.
Editing environments and deployments
When you edit an existing environment or deployment, you can easily scroll to the configuration properties that you want to change.
Previously, you had to click through each step in the environment or deployment wizard to find the properties that you wanted to change.
Kubernetes deployments
When using advanced mode to configure a Kubernetes deployment, you can download the YAML file, edit the file in a text editor, and then upload the edited file.
Fixed Issues
  • If you use bulk edit mode to define runtime parameters for a job instance or job template and change the value of a runtime parameter to a number, you cannot edit the job instance or job template.
  • Control Hub incorrectly retrieves the latest offset for a single instance job, causing an outdated or incorrect offset to display.
  • Sorting scheduled tasks by the Next Execution Time property does not change the order in which the tasks are displayed.
  • When you change the owner of a pipeline, the new owner is not granted permission on the pipeline.

October 6, 2023

This release includes the following fixes:

  • When editing a scheduled task, the task incorrectly changes from daily to hourly and vice versa.
  • A warning message displays when you change the default value of the Column Separator, Quote Character, or Escape Character property for a pipeline stage and new values are not set.

September 2023

The following StreamSets platform releases occurred in September 2023.

September 29, 2023

This release includes the following fix:
  • Fields that previously held multiple lines of code, but were intended to have a single line, do not display the code.

September 27, 2023

This release includes several enhancements, fixed issues, and a behavior change.

Enhancements
Expression language display
StreamSets expressions display strings in red and the dollar sign and curly brackets that surround the expression in purple.

Previously, expressions displayed in all black text.

Code editors
Code editors allow entering a block of code, as for a JavaScript Evaluator processor. Code editors include the following enhancements:
  • When available, you can use a right arrow icon to collapse a section, and the down arrow icon to expand the section again.
  • To toggle a code editor to full screen and back, use F11 or Ctrl+B on Windows or Command+B on Mac. You can no longer use the Esc key to toggle the display.
  • On a properties page that includes a code editor, the Tab key moves the focus to the next property.

    As a result, you can no longer use the Tab key to add indents within the code editor.

    To add or reduce indents in the code editor, use the Ctrl key with square brackets ( ] [ ) on Windows or the Command key with square brackets ( ] [ ) on Mac.

Expression completion
Usability enhancements to expression completion include the following notable updates:
  • Expression completion displays new icons for element types, as follows:
    • Square icon ( □ ) for fields
    • c icon for constants
    • f icon for functions
    • x icon for runtime parameters
  • Expression completion displays the results of all element names that contain the specified characters. Previously, it only displayed element names that started with the characters.
    Note: With this change, you can no longer use the Tab key to accept expression completion suggestions. Instead, use the Return/Enter key, or select the suggestion.
Connection Details page

The Connection Details page has been enhanced to display all non-sensitive connection details, such as the authentication type or resource URL.

Default initial offset
You can upload a default initial offset for a job where the pipeline has previously run and pipeline failover is disabled for the job.
Job template details

When you select a job template to use when creating a job, the list of job templates displays with additional details and filters to enable choosing the correct template more easily.

Quick Start pipelines
Pipelines that you create using the Quick Start button are named Quick Start Pipeline <timestamp> by default. You can update pipeline names as needed.
Webhook subscription content type
When configuring a webhook subscription, if you do not specify a content type for the webhook, Control Hub uses text/plain as the content type.
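
A service that receives these webhooks can mirror the same fallback when deciding how to parse a request. The following Python sketch assumes that a blank value counts as unspecified, which these release notes do not confirm; the function name is illustrative.

```python
DEFAULT_CONTENT_TYPE = "text/plain"


def effective_content_type(configured):
    """Return the configured content type, or text/plain when none is set."""
    # Assumption: None, empty, or whitespace-only values count as unspecified.
    if configured is None or not configured.strip():
        return DEFAULT_CONTENT_TYPE
    return configured.strip()


print(effective_content_type(None))                # text/plain
print(effective_content_type("application/json"))  # application/json
```
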
Behavior Change
Balancing and synchronizing jobs
When you balance or synchronize a Data Collector job, Control Hub now temporarily stops the job to ensure that the last-saved offset from each pipeline instance is maintained. Control Hub then reassigns the pipeline instances to Data Collectors and restarts the job.
Previously, when you balanced or synchronized a Data Collector job, the job remained running and Control Hub stopped and restarted individual pipeline instances.
Fixed Issues
  • Control Hub allows deleting a shared job associated with a scheduled task, resulting in a scheduled task that attempts to trigger actions on a job that no longer exists.
  • If a StreamSets Kubernetes agent loses network connectivity for an extended time, then recovers, Control Hub has the agent shut down instead of reconnecting to it.

    With this fix, Control Hub reconnects to the agent as long as a different StreamSets Kubernetes agent has not replaced it.

  • Metric rules and alerts based on the pipeline or stage memory consumption do not behave as expected. These rules have been removed.
  • Write to Another Pipeline was not a valid pipeline error handling option. This option has been removed.
  • External libraries, such as JDBC drivers, stored outside of the default location fail to load when Data Collector uses Java 8 with the Java Security Manager enabled.
  • When configuring a GCE deployment, the Instance Service Account property displays a maximum of 20 service accounts. If your GCP project includes more than 20 service accounts, the service account created as a Control Hub environment prerequisite might not display in the list.

August 2023

The following StreamSets platform releases occurred in August 2023.

August 23, 2023

This release includes several enhancements and fixed issues.

Enhancements
Using runtime parameters in pipelines
To use runtime parameters to define values for stage and pipeline properties, you can now select from a list of existing parameters or you can create and then use a new parameter.
Previously, you first had to define parameters on the pipeline Parameters tab. Then from the stage or pipeline property, you had to type the parameter name using the correct expression language syntax.
For details about using parameters, see the appropriate engine documentation.
Restart engines for an active deployment
You can restart all engine instances belonging to an active deployment from the Deployments view.
Previously, you had to navigate to the Engines view to restart engine instances.
Mapping jobs in a topology
When you edit a topology and click Add Job > Show All Jobs, you can search for jobs that you want to map to the topology. You can perform both basic and advanced searches.
Previously, you had to scroll through all jobs to select jobs to map to the topology.
Authoring engine timeout
An organization administrator can modify the default authoring engine timeout in the organization properties. Individual users can then override the default organization value in their browser settings within the My Account window.
Previously, an organization administrator could not modify the default organization value. The organization default was always five seconds, and individual users could modify that default value in their browser settings within the My Account window.
If you previously modified the authoring engine timeout in your browser settings, the modified value overrides any changes that an organization administrator makes to the default value.
Organization default system limits
StreamSets has increased the default system limits for the following types of objects:
  • Jobs - Increased from 500 to 10,000. The system limit for active jobs that can run concurrently is 1,000.
  • Scheduled tasks - Increased from 100 to 5,000.
Fixed Issues
  • Pipeline preview incorrectly showed multiple spaces in record values as a single space.
  • When using the Stream Selector processor, connecting the last stream when there were more than 4 streams required clicking on the very edge of the icon.
  • The Job Templates, Job Instances, and Draft Runs views display duplicate preset searches.
  • When a pipeline is using an inaccessible engine and then you switch to an accessible engine, Control Hub incorrectly indicates that the newly selected engine is not accessible until you refresh the browser.
  • When you edit a job using the context menu on the right side of the list of jobs, the Number of Instances property is always set to 0.
  • When you update an external resource archive to remove a file and then restart all engine instances in the deployment, Control Hub copies the updated archive file contents to the engine instances without first removing the deleted file. As a result, the file deleted from the archive still exists in the engine instances.

August 18, 2023

This release includes a behavior change.

Behavior Change
Update firewall outbound allowlists
The StreamSets authentication service, https://cloud.login.streamsets.com, is now hosted in another cloud service provider.
If you access the StreamSets platform from a machine that resides behind a firewall or in a system that limits access to specific DNS names and IP addresses, configure your firewall to allow outbound connectivity to all of the following DNS names and IP addresses so that you can continue to log in to StreamSets:
  • 3.133.68.10
  • 3.134.148.19
  • 3.137.15.194
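As a rough aid for reviewing an allowlist, the following Python sketch reports which of the addresses listed above are not covered by a set of firewall rules. The firewall_rules input is a hypothetical list of IP addresses or CIDR blocks that you export from your own firewall. It is not a StreamSets feature.

```python
import ipaddress

# IP addresses listed in this release note for the StreamSets authentication service.
LISTED_IPS = ["3.133.68.10", "3.134.148.19", "3.137.15.194"]


def missing_from_allowlist(firewall_rules):
    """Return the listed IPs that no rule in firewall_rules covers.

    firewall_rules: hypothetical list of IPs or CIDR blocks, such as "3.134.0.0/16".
    """
    networks = [ipaddress.ip_network(rule, strict=False) for rule in firewall_rules]
    missing = []
    for ip in LISTED_IPS:
        address = ipaddress.ip_address(ip)
        if not any(address in network for network in networks):
            missing.append(ip)
    return missing


# Example: one exact address and one broader block are allowed.
print(missing_from_allowlist(["3.133.68.10", "3.134.0.0/16"]))
```

An empty result means that every listed address is covered by at least one rule.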

July 2023

The following StreamSets platform releases occurred in July 2023.

July 28, 2023

This release includes a behavior change.

Behavior Change
Update firewall outbound allowlists
In an upcoming release, the StreamSets authentication service, https://cloud.login.streamsets.com, will be hosted in another cloud service provider.
If you access the StreamSets platform from a machine that resides behind a firewall or in a system that limits access to specific DNS names and IP addresses, configure your firewall to allow outbound connectivity to all of the following DNS names and IP addresses so that you can continue to log in to the StreamSets authentication service, cloud.login.streamsets.com:
  • Current IP address - 34.122.108.130
  • Future IP addresses:
    • 3.133.68.10
    • 3.134.148.19
    • 3.137.15.194

July 21, 2023

This release includes several enhancements and fixed issues.

Enhancements
Scheduled tasks
When you view the list of scheduled tasks, the next scheduled execution time displays for each task.
External resources
You can store an external resource archive file in a private Amazon S3 or Google Cloud Storage (GCS) bucket or in a private Azure Blob Storage or Azure Data Lake Storage Gen2 container, as long as the deployment using the archive file is of the same cloud service provider type. For example, if using an Amazon EC2 deployment, you can store the archive file in a private Amazon S3 bucket. You cannot store the archive file in a private GCS bucket.
To ensure that all engine instances managed by the deployment can access the private file, grant read access to the file to the AWS instance profile, the GCE instance service account, or the Azure managed identity associated with the provisioned instances.
Previously when you stored an external resource archive file in an Amazon S3 or GCS bucket or in an Azure container, you had to share the file publicly.
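The same-provider rule can be pictured with a small Python sketch. The note explicitly confirms only the Amazon EC2 case (a private Amazon S3 bucket is allowed, a private GCS bucket is not). The GCE and Azure VM rows are inferred from the "same cloud service provider type" wording, and the string labels are illustrative, not Control Hub identifiers.

```python
# Private storage types assumed to match each deployment type.
# Only the Amazon EC2 row is stated explicitly in the release note.
ALLOWED_PRIVATE_STORAGE = {
    "Amazon EC2": {"Amazon S3"},
    "GCE": {"Google Cloud Storage"},
    "Azure VM": {"Azure Blob Storage", "Azure Data Lake Storage Gen2"},
}


def can_use_private_archive(deployment_type, storage_type):
    """Return True if a private archive in storage_type suits the deployment type."""
    return storage_type in ALLOWED_PRIVATE_STORAGE.get(deployment_type, set())


print(can_use_private_archive("Amazon EC2", "Amazon S3"))
print(can_use_private_archive("Amazon EC2", "Google Cloud Storage"))
```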
Engine resource thresholds
A deployment defines the maximum thresholds for the following engine resources:
  • CPU load
  • Memory used
  • Number of running pipelines
All engine instances belonging to the deployment inherit the same resource threshold values.
For advanced use cases, some organizations can override the resource thresholds for individual engine instances. However, when an engine restarts, overridden values are lost and the engine inherits the resource thresholds set for the deployment.
Previously, you defined resource thresholds for each engine instance. If you previously modified the default values for resource thresholds, those modified values are retained until the engine restarts.
This enhancement can result in a behavior change for existing automated testing.
Self-managed deployments
When you stop a self-managed deployment, running engine instances belonging to that deployment are also shut down. After you restart the self-managed deployment, you manually start each engine instance from the command line.
Previously when you stopped a self-managed deployment, running engine instances belonging to that deployment were deactivated and could no longer communicate with Control Hub, but continued to run. As a result, engine instances continued to use computing resources although they could no longer process data.
Google Compute Engine (GCE) deployments
When you create a GCE deployment, you specify whether to use an automatic or user managed replication policy to store GCP Secret Manager secrets. If you specify a user managed policy, then you also select one or more locations to replicate the secrets to.
If your Google Cloud organization has disabled global resource creation, then you must configure the deployment to use a user managed replication policy.
Previously, GCE deployments supported only the automatic replication policy.
Behavior Change
Review engine threshold settings in automated testing
With this release, a deployment defines maximum thresholds for the following engine resources: CPU load, memory used, and number of running pipelines. To use different thresholds for an individual engine, you enable an override for the engine. Previously, these thresholds were defined only at an engine level.
If your organization uses the StreamSets REST API or SDK to perform automated testing that sets engine-specific values for those resources, you can address these changes in the following ways:
  • If possible, update the deployment to use the thresholds defined for the engines.
  • If engines require different settings from the deployment, then configure your testing to override the deployment thresholds, as follows:
    • When using the REST API, add overrideEngineMaxes: true to the JSON payload for the /jobrunner/rest/v1/sdc/{sdcId}/updateSdcResourceThresholds endpoint.
    • When using the SDK, add override_engine_maxes=True as a parameter to the sch.update_engine_resource_thresholds() call.
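For REST-based tests, the change might look like the following Python sketch, which only prepares the URL and JSON body and does not send anything. The endpoint path and the overrideEngineMaxes flag come from this note. The base URL, the engine ID, and the contents of the thresholds dictionary are placeholders for whatever your tests already use.

```python
import json

# Endpoint path as given in this release note.
ENDPOINT = "/jobrunner/rest/v1/sdc/{sdcId}/updateSdcResourceThresholds"


def build_threshold_update(base_url, sdc_id, thresholds):
    """Return (url, json_body) for a threshold update that overrides deployment values.

    thresholds: the payload your tests already send; its field names are not shown here.
    """
    payload = dict(thresholds)  # copy so the caller's dictionary is not modified
    payload["overrideEngineMaxes"] = True  # flag named in this release note
    url = base_url.rstrip("/") + ENDPOINT.format(sdcId=sdc_id)
    return url, json.dumps(payload)


# Placeholder values for illustration only.
url, body = build_threshold_update("https://example.invalid", "engine-1", {})
print(url)
print(body)
```

Adding the flag in one helper keeps the change in a single place if many tests update engine thresholds.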
Fixed Issues
  • When you export a draft pipeline or fragment, Control Hub does not remove plain text credentials configured directly in the pipeline or fragment.
  • A Control Hub Kubernetes deployment becomes stuck in an Activating state when you configure only the http.nonProxyHosts property for the engine proxy properties.

  • The Create Scheduled Task dialog box displays blank error messages.
  • When a deployment defines a percentage JVM memory strategy for engines, a Docker engine installation incorrectly allocates a percentage of the available memory on the host machine as the Java heap size, instead of the available memory in the Docker container.

June 2023

The following StreamSets platform releases occurred in June 2023.

June 14 and 21, 2023

This release includes several enhancements and fixed issues.

The release occurred in two phases across all Control Hub instances in the StreamSets platform. The first phase occurred on June 14, 2023, and the second phase occurred on June 21, 2023.

Enhancements
Welcome view
After you log in, Control Hub displays a Welcome view that lists your most recently edited pipelines and jobs so that you can quickly access them.
Quick Start
Click Quick Start in the top toolbar to quickly build a Data Collector or Transformer for Snowflake pipeline. A blank pipeline opens in the canvas so that you can immediately begin building the pipeline using all available stages. For a Data Collector pipeline, you must select or deploy an engine before you can preview, run, validate, or check in the pipeline.
Previously, you could use the Quick Start menu to quickly deploy a Data Collector or Transformer engine using Docker or as a shortcut to build any type of pipeline.
Edit Data Collector pipelines when engine is not accessible
When using Data Collector version 5.4.0 or later, you can edit pipelines and fragments in the pipeline canvas when the engine is not accessible. The engine must be accessible before you can preview, run, validate, or check in the pipeline or fragment.
Import pipelines and fragments
When you import a pipeline or fragment that uses connections, you can choose how to replace each connection based on whether the connection exists in the target organization:
  • Connection exists - You can choose to use the existing connection, create a placeholder connection, or remove the connection.
  • Connection does not exist - You can choose to create a placeholder connection or remove the connection.
Previously, if the connections existed, the imported pipeline or fragment automatically used the existing connections. If the connections did not exist, you could choose to import the pipeline or fragment without the connections or you could cancel the import.
Job templates
When you delete a job template that has attached job instances, Control Hub also deletes the attached job instances at the same time.
Previously, you could not delete a job template that had attached job instances. You first had to navigate to the Job Instances view and delete all attached job instances that were created from that template.
Control Hub-managed deployments
Control Hub-managed deployments include the following enhancements:
  • For Amazon EC2, Azure VM, and GCE deployments, the Engine Instances property has been renamed to Desired Instances.
  • For Amazon EC2, Azure VM, GCE, and Kubernetes deployments, you can set the Desired Instances property to 0 to temporarily prevent engine instances from running, as an alternative to stopping the deployment.

    Previously, the minimum number of instances was 1. You had to stop the deployment to temporarily prevent engine instances from running.

Groups
Groups require a unique display name in addition to a unique group ID. Previously, groups required only a unique group ID.
Alerts
When the alert text defined for a data SLA or pipeline alert exceeds 255 characters, Control Hub truncates the text to 244 characters and adds a [TRUNCATED] suffix to the text so that the triggered alert is visible when you click the Alerts icon in the top toolbar.
Previously, when the alert text exceeded 255 characters, Control Hub did not display the alert when you clicked the Alerts icon.
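The truncation rule can be sketched in Python as follows. The 255 and 244 character values and the [TRUNCATED] suffix come from this note. Appending the suffix without a separator is an assumption, as is the function name.

```python
MAX_ALERT_TEXT = 255      # longest alert text displayed unchanged
TRUNCATED_LENGTH = 244    # characters kept when the text is too long
SUFFIX = "[TRUNCATED]"    # marker added to truncated text


def display_alert_text(text):
    """Return the alert text as displayed, truncating overly long text."""
    if len(text) <= MAX_ALERT_TEXT:
        return text
    # Assumption: the suffix follows the kept characters with no separator.
    return text[:TRUNCATED_LENGTH] + SUFFIX


print(len(display_alert_text("x" * 300)))
```

Under that assumption, 244 kept characters plus the 11-character suffix total 255 characters.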
Fixed Issues
  • If you change the values of existing pipeline fragment parameters, publish a new version of the fragment, and then update a pipeline using that fragment to use the latest version, the update incorrectly changes the default values for any existing runtime parameters in the pipeline.
  • Fragment export fails when a fragment contains multiple stages of the same type that use different library versions.
  • When you select a stage in the canvas while monitoring a running Data Collector or Transformer pipeline, an “Unexpected error occurred” message temporarily displays.
  • When configuring a GCE deployment, you are required to select a service account even if the GCP environment has a default service account defined. Similarly, when configuring an Azure VM deployment, you are required to select a managed identity and resource group, even if the Azure environment has defaults defined for those objects.

May 2023

The following StreamSets platform releases occurred in May 2023.

May 12, 2023

This release includes several enhancements and a fixed issue.

Enhancements
Fragment icons
By default when you add a published fragment to a pipeline, the fragment displays as a single stage with a puzzle piece icon. You can modify the icon to represent the fragment processing logic. For example, if a fragment merges two streams of data, you might configure the fragment to use the predefined Merge icon.
You can select from a set of predefined icons, or you can upload an image to use as the icon.
Pipeline canvas toolbar
The new pipeline canvas UI displays the following icons in different colors, making it easier to locate these icons in the toolbar:
  • The Stop and Force Stop icons display in red.
  • The Check In icon displays in red when the pipeline or fragment has not passed implicit validation and cannot be checked in. The Check In icon displays in green when the pipeline or fragment has passed implicit validation and is ready to be checked in.
Fragment export
You can export and import draft fragments.
GCE deployments
You can optionally configure a GCE deployment to provision Google Cloud VM instances without external IP addresses.
Configure the deployment to provision VM instances without external IP addresses only when your GCP project does not allow externally accessible IP addresses and your Google Cloud administrator has created a Cloud NAT gateway as a GCP environment prerequisite.
Previously, a GCE deployment always provisioned VM instances with external IP addresses.
Fixed Issue
  • If the last page of results for a job search previously returned items, but no longer does, no items are shown on the page. Users must navigate to the previous page to see items.

April 2023

The following StreamSets platform releases occurred in April 2023.

April 23, 2023

This release includes a fixed issue.

Fixed Issue
  • When a deployment uses an external resource archive and the externalResources folder contains an empty user-libs folder, engines belonging to the deployment fail to start with the following error message:

    Exception in thread "main" java.lang.IllegalArgumentException: Stage libraries directory '/<installation directory>/externalResources/user-libs' does not exist

April 21, 2023

This release includes several enhancements and fixed issues.

Enhancements
First login to StreamSets
When you log in to StreamSets for the first time and you do not have access to an engine deployed by another user, a blank Data Collector pipeline opens in the canvas. You can immediately begin building the pipeline using all available stages.
When you attempt to preview, publish, validate, or start a draft run of this first pipeline, Control Hub presents a simplified process to help you quickly deploy your first engine.
Pipeline canvas
When you use the new pipeline canvas UI and select a link connecting two stages, Control Hub displays a pop-up menu that includes the following icons:
  • Insert Stage - Inserts another stage between the connected stages.
  • Delete - Deletes the selected link.
Previously, the pop-up menu included only an Insert Stage icon.
Job templates
On the Job Templates view, the Start Jobs menu option has been renamed to Create Instances to more clearly indicate that you are creating and starting job instances from a job template.
Fixed Issues
  • Users and engines randomly encounter 401 authorization errors from Control Hub.
  • Editing an existing job template fails with the following error:

    'len' parameter must be a valid integer equal to -1 or greater than zero. len = '0' : RESTAPI_05

  • Searching for a pipeline by ID fails with the following error:

    "RSQL_00:Bean 'RPipeline' property 'id' does not have a mirrored property in 'PPipeline'"

  • The pipeline canvas incorrectly enables the Draft Run menu even though the selected authoring engine is not accessible.
  • When you modify or delete a parameter from a fragment being used in a pipeline, publish the fragment, and then update the pipeline to use the latest fragment version, the parameter changes are not reflected in the pipeline.
  • If you use advanced mode to edit the dnsPolicy attribute in a Control Hub Kubernetes YAML file, your modifications do not take effect. The deployment always uses the Default DNS policy.
  • If you specify a service account name for a Control Hub Kubernetes deployment, start the deployment, and then edit the active deployment to delete the service account name, the change does not take effect and the previous service account remains associated with the Kubernetes deployment.

March 2023

The following StreamSets platform release occurred in March 2023.

March 29, 2023

This release includes several enhancements and fixed issues.

Enhancements
Search
Search includes the following enhancements:
  • Control Hub includes additional preset searches for fragments, pipelines, job templates, job instances, and draft runs that are starred by default.
  • When searching for pipelines or fragments, you can use the Latest Version property to restrict searches to the latest version of each pipeline or fragment.
Pipeline and fragment preview
Previewing pipelines and fragments includes the following enhancements:
  • When you preview a pipeline or fragment, Control Hub uses the default preview configuration and no longer displays the Preview Configuration dialog box. While running preview, you can click the Preview Configuration icon to change the configuration and then run the preview again.
  • When you preview a Transformer or Transformer for Snowflake pipeline or fragment, the data displays in table view by default.

    When you preview a Data Collector pipeline or fragment, the data continues to display in list view by default.

  • Table view displays only output data by default. You can optionally choose to display both input and output data.
  • Table view no longer displays colors for different types of data and changed data. List view continues to display colors.
  • While running preview, you can click the Expand icon to quickly expand the preview panel to view more preview data.
Pipeline validation
If pipeline validation fails due to a timeout error, you can increase the validation timeout value.
Job templates
When you create job instances from a job template, you can specify whether the job instances inherit the permissions assigned to the parent job template.
Pipeline export
You can export and import draft pipelines.
Customize table columns
You can resize, hide, and reorder the table columns displayed in all views.
Azure VM deployments
The tracking URL for an active Azure VM deployment opens the overview page of the Azure deployment created for your StreamSets deployment. The deployment overview page enables you to more quickly access information about the Azure resources provisioned for the StreamSets deployment.
Previously, the tracking URL opened the overview page of the Azure virtual machine scale set.
Fixed Issues
  • Basic searches that use the not includes operator or advanced searches that use the =out= operator incorrectly return results that do include the specified values.
  • If you view the run history of a job and the pipeline version for one of the runs has been deleted, then the run history displays an error.
  • When a pipeline fails over to another Data Collector, the input and output records that display in the Summary tab and in the run summary for a selected job run accessed from the History tab do not include the records for all Data Collectors.
  • Control Hub allows you to set the CPU Threshold Percentage for a Control Hub Kubernetes deployment to a value between 0 and 100, but 0 is not a valid value in the Kubernetes cluster.

February 2023

The following StreamSets platform releases occurred in February 2023.

February 10, 2023

This release includes several new features, enhancements, behavior changes, and a fixed issue.

New Features and Enhancements
Kubernetes integration

Control Hub provides an integration with Kubernetes. When you use Control Hub Kubernetes environments and deployments, you launch a StreamSets Kubernetes agent that runs in your Kubernetes cluster. The agent communicates with Control Hub to provision the Kubernetes resources needed to run StreamSets engines and to deploy engine instances to those resources.

Search
Search is no longer a Technology Preview functionality and is now enabled by default. Search replaces filtering and is implemented for the following object types:
  • Fragments
  • Pipelines
  • Job templates
  • Job instances
  • Draft runs
StreamSets will implement search for additional object types in future releases. If needed, you can disable search and revert to the original filtering functionality.
Search includes the following additional enhancements:
  • Control Hub stores your saved searches in the backend rather than in your current browser. If you saved searches in a previous release, this enhancement results in a behavior change.
  • You can star your most important saved searches so that you can easily find them later.
  • Control Hub includes several preset saved searches for pipelines that are starred by default. You can unstar these preset searches, but cannot edit or delete them. StreamSets will implement preset searches for additional object types in future releases.
  • You can search for draft runs that include a pipeline with a specified pipeline status or pipeline version.
Pipeline canvas
By default, Control Hub now displays the new pipeline canvas UI.
You can optionally switch to the classic UI while viewing a pipeline or fragment in the canvas.
Customize table columns
You can resize, hide, and reorder the table columns displayed in the Job Templates, Job Instances, and Draft Runs views.
In addition, you can reset table column widths to the default size after resizing a table.
Behavior Changes
Saved searches
With this release, Control Hub stores your saved searches in the backend. Previously, Control Hub stored saved searches in your current browser. Saved searches did not apply if you logged in using another browser, and saved searches were removed if you cleared the browser cache.
As a result of this change, you can no longer access previously saved searches.
In addition, if you previously configured Control Hub to persist the last search, Control Hub might unexpectedly display no found objects when you first access a view that supports search, such as the Pipelines view. To resolve this issue, perform another search or click the More icon in the search section, and then click Clear All.
Job instances with no job runs
With this release, Control Hub automatically deletes inactive job instances older than 365 days that have never been run.
Previously, Control Hub indefinitely retained inactive job instances with no job runs.
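The cleanup rule can be expressed as a simple predicate, sketched here in Python. The 365-day age, the inactive state, and the "never been run" condition come from this note. Measuring age from the creation time, and the parameter names, are assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # age limit stated in this release note


def is_purge_candidate(created_at, is_active, run_count, now):
    """Return True for an inactive, never-run job instance older than 365 days.

    Assumption: age is measured from the creation time of the job instance.
    """
    return (not is_active) and run_count == 0 and (now - created_at) > RETENTION


# Example with fixed timestamps so that the result does not depend on the clock.
now = datetime(2023, 2, 10, tzinfo=timezone.utc)
created = datetime(2021, 12, 1, tzinfo=timezone.utc)
print(is_purge_candidate(created, is_active=False, run_count=0, now=now))
```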
Fixed Issue
  • When a pipeline fails over to another Data Collector, the input and output records that display in the job history do not include the records for the original Data Collector.

February 7, 2023

This release includes a fixed issue.

Fixed Issue
  • When SCIM provisioning is enabled and Microsoft Entra ID synchronizes users with expired Control Hub invitations, these users cannot log in to Control Hub, even though the users have been correctly provisioned.

January 2023

The following StreamSets platform release occurred in January 2023.

January 20, 2023

This release includes several enhancements and fixed issues.

Enhancements
View and edit connection details from a published pipeline
When viewing a published pipeline that includes a stage using a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
Previously, only draft pipelines displayed the Edit Connection icon.
Customize table columns
You can resize, hide, and reorder the table columns displayed in the Connections view.
StreamSets will provide support to customize columns in additional views in future releases.
Job run history for job instances started from a job template
Control Hub now includes job instances started from a job template in the total count of job runs for the organization. As a result, Control Hub purges the run history for job instances started from a job template in the same way that it purges the run history for job and draft runs.
Previously, Control Hub indefinitely retained the run history for job instances started from a job template.
Search
The Technology Preview search functionality includes the following enhancements:
  • You can search for attached job instances created from job templates.
  • You can search for job instances that include a pipeline with a specified pipeline status.
  • You can search for job templates that have been archived.
  • In basic search, you can use auto-completion for the following search properties:
    • Label property for pipelines and fragments
    • Tag and Engine Label properties for job instances and job templates

    As you begin typing a value to search for, Control Hub displays a drop-down menu that lists values that match the entered characters.

Fixed Issues
  • Preview does not work when the Run Preview Through Stage property is set to a fragment.
  • The Engines view does not correctly sort the list of engines by the Memory Used column.
  • When using the new pipeline canvas UI while monitoring a job, the title of the job incorrectly displays the pipeline name instead of the job name.
  • If you create and publish a pipeline fragment using pipeline stages and you quickly enter a fragment prefix in the last step of the wizard, the newly created fragment is not added to the pipeline.
  • When you use the Technology Preview search functionality and sort pipeline search results by the Last Modified By column, the pipelines are incorrectly sorted by name.

2022

The following StreamSets platform releases occurred in 2022.

December 2022

The following StreamSets platform release occurred in December 2022.

December 16, 2022

This release includes several enhancements and fixed issues.

Enhancements
Pipeline fragments
When you create a pipeline fragment using pipeline stages and you choose to publish the new fragment, Control Hub displays the original pipeline in the canvas, automatically replacing the individual stages with the newly published fragment.
Previously, Control Hub displayed the original pipeline in the canvas with the individual stages.
Draft runs
While monitoring an engine in the Engines view, you can stop all draft runs currently running on that engine.
Search
The Technology Preview search functionality includes the following enhancements:
Subscriptions
You can configure a webhook action for a subscription to use one of the following additional authentication types to connect to the receiving system:
  • API Key
  • Bearer Token
  • OAuth 2.0
Fixed Issues
  • The Control Hub REST API incorrectly returns an HTTP 400 status code instead of a 404 status code when the specified environment or deployment does not exist.
  • When using the Technology Preview search functionality to find job templates or job instances last modified by a user, the search results incorrectly display the job templates or job instances created by that user.
  • When the Legacy Kubernetes integration is enabled, users cannot create a pipeline when they are assigned the Deployment Manager role but not the Provisioning Operator role or the Organization Administrator role.

November 2022

The following StreamSets platform release occurred in November 2022.

November 18, 2022

This release includes several enhancements and fixed issues.

Enhancements
Deployments
In the Deployments view, you can filter the list of displayed deployments by engine type.
Pipeline design
When configuring the conditions for a Stream Selector processor in a Data Collector pipeline, you can reorder the output conditions.
Search
The Technology Preview search functionality includes the following enhancements:
  • You can search by user email address to find all pipelines or fragments committed or last modified by the user or to find all job templates or job instances created or last modified by the user.
  • When using basic mode to search for job templates or job instances, you can include or exclude the v prefix. For example, you can search for either v1 or 1 when defining the Pipeline Version property.

    Previously in basic mode, you had to include the prefix. For example, you had to search for v1.
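The optional v prefix can be thought of as a normalization step, as in the following Python sketch. It illustrates the behavior described above and is not Control Hub's implementation. Handling only a lowercase v is an assumption based on the v1 example.

```python
def normalize_pipeline_version(value):
    """Return the version without an optional leading "v", so "v1" and "1" match."""
    value = value.strip()
    if value.startswith("v"):
        value = value[1:]
    return value


# Both forms of the search value reduce to the same version string.
print(normalize_pipeline_version("v1") == normalize_pipeline_version("1"))
```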

Fixed Issues
  • When you expand or collapse a fragment in the job monitoring UI, Control Hub does not update the metric diagrams to display statistics for individual stages in the expanded fragment or for the single fragment stage.
  • When the Enable WebSocket Tunneling for UI Communication property is enabled for an organization, all users should be able to configure the Browser to Engine Communication type; however, only organization administrators can do so.
  • When using the Technology Preview search functionality, searching for job instances with an INACTIVE job status excludes job instances that have never run.
  • When you use the Technology Preview search functionality and sort the search results by job instance status, job instances that have never run are shown first when sorting by ascending status, and last when sorting by descending status.
  • When you add a fragment that uses parameters to a pipeline and define no parameter name prefix, Control Hub silently fails to add the fragment to the pipeline.
  • When you configure an Azure VM deployment, Control Hub does not display existing SSH key pairs in the Key Pair Name property because the documentation to create an Azure AD custom role for the parent environment does not include the following required permission:

    Microsoft.Compute/sshPublicKeys/read

    Important: StreamSets strongly recommends updating the custom role for all existing Azure environments, as described in the Azure environment prerequisites.

October 2022

The following StreamSets platform releases occurred in October 2022.

October 28, 2022

This release includes a new feature.

New Feature
Provisioning of user accounts from Microsoft Entra ID
After enabling SAML authentication using Microsoft Entra ID (previously known as Azure AD), you can configure the provisioning of user accounts from Entra ID to StreamSets. When a user or group is created, updated, or deleted in Entra ID, the same changes are automatically made in StreamSets.
StreamSets supports System for Cross-domain Identity Management (SCIM) 2.0 for the provisioning of user accounts from an identity provider (IdP).
StreamSets will provide SCIM provisioning support for additional IdPs in future releases.

October 26, 2022

This release includes several enhancements and fixed issues.

Enhancements
Supported browsers
StreamSets Control Hub now supports the latest version of Microsoft Edge as a browser.
Search
The Technology Preview search functionality includes the following enhancements:
  • For job instance and job template search, the Pipeline Commit Label property has been renamed to Pipeline Version to more clearly indicate that the property specifies the version of the pipeline included in a job instance or job template.
  • Basic search for job templates no longer displays the Status property because job templates do not have a status.
Proxy server configuration for engines

When you set up a deployment that configures a Data Collector or Transformer engine to use a proxy server for outbound network requests, you can now include special characters in the proxy user and password values, except for an exclamation point (!), a backward slash (\), a leading number sign (#), or a leading or trailing space.

Fixed Issues
  • The Test Connection button in the connection wizard incorrectly requires a user to have the Deployment Manager role.
  • Creating a pipeline incorrectly requires a user to have the Deployment Manager role.
  • The Fields to Convert property in the Field Type Converter processor does not allow an expression that includes a comma.
  • If you configure an engine to use a proxy server and you include a space, single quotation mark, or double quotation mark in the defined proxy properties, then either the engine installation script fails or the proxy authentication fails.
  • Topology data SLAs do not trigger alerts because they do not correctly retrieve data.

October 12, 2022

This release includes a fixed issue.

Fixed Issue
  • When you delete multiple users at the same time, a user synchronization error message displays even though the users are successfully deleted.

October 6, 2022

This release includes a fixed issue.

Fixed Issue
  • When SAML authentication is enabled, only users with the Organization Administrator role can log in.

October 5, 2022

This release includes new features, enhancements, a behavior change, and fixed issues.

New Features and Enhancements
Search
You can perform both basic and advanced searches to find specific pipelines, fragments, job instances, or job templates. With basic search, you define search conditions by selecting the object properties, operators, and values that you want to search for.
With advanced search, you define search conditions by writing a search query using the StreamSets advanced query language (SAQL). Advanced search allows you to specify criteria that cannot be defined in basic search, such as defining complex conditions or changing the order of precedence for multiple conditions.
Search replaces filtering and is a Technology Preview functionality. To try the functionality, you must enable search in your browser settings. StreamSets will implement search for additional object types in future releases.
Copy field value in preview
You can quickly copy a field value from pipeline preview using the Copy Field Value to Clipboard icon.
Pipeline preview displays the new icon next to the existing Copy Field Path to Clipboard icon. The icons display for each field in the previewed records.
Previously, pipeline preview displayed only a Copy Field Path to Clipboard icon for each field.
Pipeline version history
When using the new pipeline canvas UI, the pipeline or fragment version history that opens from the canvas displays all information on the initial panel. You do not need to expand the version history to manage the versions including viewing commit messages, comparing versions, creating tags for versions, and deleting versions.
In addition, the new version history includes a link to each version and displays the minimum engine version that the pipeline or fragment version can run on.
Import pipelines and fragments without connections
When you import pipelines or fragments that use connections and those connections do not exist in the target organization, you can choose to import the objects without connections. After the import, you must edit the pipelines or fragments to define the connections.
Previously, to import pipelines or fragments that use connections, the connections had to exist in the target organization.
Subscription parameters
You can use the ERROR_MESSAGE parameter for a subscription triggered by a maximum global failover retries exhausted event.
Engine resource thresholds
The default value of the Max Memory threshold for an engine is now 100%. Previously, the default value was 80%. Existing engines retain the currently configured threshold.
Behavior Change

With this release, the sample IAM policy for credentials provided by a Control Hub AWS environment includes the following additional permission:

autoscaling:DescribeWarmPool

This permission is required when you update the number of engine instances for an active Amazon EC2 deployment belonging to an AWS environment.

If the IAM policy does not include this permission and you update the number of engine instances, the deployment transitions to an Activation Error state. When you access the tracking URL to the AWS Management Console, the Events tab displays the following error:

API: autoscaling:DescribeWarmPool User: arn:aws:sts::${ACCOUNT_ID}:assumed-role/${CROSS_ACCOUNT_ROLE}/STREAMSETS_SCH is not authorized to perform: autoscaling:DescribeWarmPool because no identity-based policy allows the autoscaling:DescribeWarmPool action

StreamSets strongly recommends updating the IAM policy for credentials for all existing StreamSets AWS environments to avoid this deployment error.

Log in to the AWS Management Console and update the IAM policy for credentials created for each StreamSets AWS environment. Add the autoscaling:DescribeWarmPool permission to the following section of the IAM policy, as shown in the last entry of the Action list:
...
{
    "Sid": "0",
    "Effect": "Allow",
    "Action": [
        "ec2:DescribeImages",
        "autoscaling:DescribeScalingActivities",
        "ec2:DescribeVpcs",
        "autoscaling:DescribeAutoScalingGroups",
        "ec2:DescribeRegions",
        "autoscaling:DescribeLaunchConfigurations",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeSubnets",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeInstances",
        "autoscaling:DescribeScheduledActions",
        "autoscaling:DescribeWarmPool"
    ],
    "Resource": "*"
},
...

You do not need to restart active StreamSets AWS environments or Amazon EC2 deployments after updating the policy. However, if an existing Amazon EC2 deployment has transitioned to an Activation Error state due to this error, then you must stop that deployment and start it again.

For more information about the required IAM policy, see Configure AWS Credentials.
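If you manage many AWS environments, you might want to check exported policy documents for the new permission before updating engine instance counts. The following Python sketch shows one way to do that, assuming the policy is available as a standard IAM policy JSON document with a Statement element. The function name policy_allows_action is hypothetical. The check is an exact string match only: it does not evaluate wildcards such as autoscaling:*, Deny statements, or conditions.

```python
import json

REQUIRED_ACTION = "autoscaling:DescribeWarmPool"


def policy_allows_action(policy_json: str, action: str = REQUIRED_ACTION) -> bool:
    """Return True if any Allow statement lists the action explicitly.

    Hypothetical helper for illustration; not an AWS or StreamSets API.
    """
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    # IAM permits a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # IAM permits a single action string instead of a list.
        if isinstance(actions, str):
            actions = [actions]
        if action in actions:
            return True
    return False
```

A result of False for a policy used by an existing AWS environment suggests that the policy still needs the update described above.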

Fixed Issues
  • The Topologies Dashboard can display an incorrect number of engines.
  • Pipelines that contain fragments with names starting with a number do not correctly display in the new pipeline canvas UI and encounter unexpected errors in the classic pipeline UI.
  • Stages that were originally in a fragment could become uneditable.

August 2022

The following StreamSets platform release occurred in August 2022.

August 26, 2022

This release includes several enhancements, a behavior change, and fixed issues.

Enhancements
Scroll zoom in pipeline canvas
You can now zoom in or out of the pipeline canvas using the mouse scroll wheel or using the trackpad. By default, scroll zoom is enabled. You can disable scroll zoom if needed.
Hide or show contextual help in wizards
When you create or edit an object, such as an environment, deployment, pipeline, or job, you can now hide or show the contextual help that displays on the right of the wizard.
Behavior Change
Filters for job templates, job instances, or draft runs
With this release, when you select the Keep Filter Persistent checkbox to retain the filter on the Job Templates, Job Instances, or Draft Runs view, Control Hub retains a unique filter for each view. Previously, Control Hub incorrectly used the same filter for all of these views.
As a result, when you first access the Job Templates, Job Instances, or Draft Runs view, no filter is applied, even if you previously persisted the filters.
Fixed Issues
  • In rare cases when editing a deployment, the External Resource Source property can be set to null which causes a null pointer exception.
  • The TRIGGERED_COUNT and TRIGGERED_ON subscription parameters do not contain the correct values.
  • The Windowing Aggregator processor does not display aggregation charts.
  • You can delete a job associated with a running scheduled task, resulting in a scheduled task that attempts to trigger actions on a job that doesn’t exist.
  • In rare cases, the configured time zone for a scheduled task is ignored, which causes the task to start and finish at undesired times.
  • Clicking Upload Offset & Start while viewing job instance details uploads the offset but doesn’t start the job.
  • Because the initialization script for an Azure VM deployment does not run before engine instances start, you cannot use the script to update DNS entries. As a result, engines fail to launch when the Azure VNet uses custom DNS servers.

July 2022

The following StreamSets platform release occurred in July 2022.

July 20, 2022

This release includes several enhancements and fixed issues.

Enhancements
Runtime parameters for pipelines
Some pipeline and stage properties conditionally display child properties. For example, if you configure an origin to use the Delimited data format, the origin displays a set of Delimited configuration properties. If you configure that origin to use the JSON data format, it displays a different set of JSON configuration properties.
However, if you use a runtime parameter to define a parent property, all child properties now display so that you can configure valid values for all dependent properties.
For example, if you convert the Data Format drop-down menu to a text box and then call the dataformat parameter using the required syntax, the origin displays all of the Delimited, JSON, and Text configuration properties.
Job monitoring toolbar
When you monitor a job, the canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes.
You can switch between the new and classic UI while monitoring a job.
Starting multiple scheduled jobs at the same time
When you start multiple jobs at the exact same time using the scheduler, the number of pipelines running on an engine can exceed the Max Running Pipeline Count configured for the engine.
If exceeding the resource threshold is not acceptable, you can enable an organization property that synchronizes the start of multiple scheduled jobs. However, be aware that enabling the property can cause scheduled jobs to take longer to start.
Fixed Issues
  • When sharing an object, you cannot search for user and group names that include the search string, only names that start with the search string.
  • If you rename a job, scheduled tasks for that job do not update the job name.
  • When you delete users or groups and then create users or groups with the same ID, permissions might behave unexpectedly.

June 2022

The following StreamSets platform releases occurred in June 2022.

June 17, 2022

This release includes an enhancement and several fixed issues.

Enhancement
New stages centered in the canvas
When you use the stage library panel to add a new stage to the pipeline canvas, the stage appears in the center of the current view of the canvas. Previously, it appeared to the right of the right-most stage in the pipeline.
Fixed Issues
  • When a deployment contains multiple engine instances and you try to share the deployment with another user or group for a second time, Control Hub fails to update the permissions due to a 500 internal server error.
  • A single faulty subscription prevents other suitable subscriptions from triggering.
  • You cannot access a draft run of a pipeline when you view the list of running pipelines on an engine from the Engines view.

June 3, 2022

This release includes an enhancement and several fixed issues.

Enhancement
View and edit connection details from stage properties
When configuring a pipeline or fragment stage that uses a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
Fixed Issues
  • The Available Pipeline Runners histogram does not display when you monitor a job for a Data Collector multithreaded pipeline.
  • The pipeline canvas cannot handle more than 100 instances of the same stage.
  • Provisioning Agents are incorrectly included in the system limit for engines.

May 2022

The following StreamSets platform releases occurred in May 2022.

May 20, 2022

This release includes several enhancements and fixed issues.

Enhancements
Compare pipeline and fragment versions
When using the new pipeline canvas UI to compare two versions of a pipeline or pipeline fragment, you can click the Open in Canvas icon to open one of the versions in the pipeline canvas.
Engine type icon in Fragments, Pipelines, and Sample Pipelines views
The Fragments view, Pipelines view, and Sample Pipelines view include an engine type icon next to each pipeline or fragment name so that you can quickly determine the pipeline or fragment type.
An engine type icon is available for each of the following engine types:
  • Data Collector
  • Transformer
  • Transformer for Snowflake
Fixed Issues
  • Credential fields in a pipeline state notification webhook do not properly show the value of entered passwords when authentication is used.
  • On Windows 10, some file archive programs cannot extract an exported Control Hub ZIP file because of the colon (:) character in the JSON file names.

May 6, 2022

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Draft runs
Pipeline test runs have been renamed to draft runs to more clearly indicate that a draft run is the execution of a draft pipeline.
Control Hub includes a new Run > Draft Runs view that you use to manage all draft runs that you have access to. You can view active and inactive draft runs, view the draft run history for a pipeline, and stop or delete draft runs as needed.
Previously, you could only manage active draft runs from the pipeline canvas.
Stage selector in the pipeline canvas
The new pipeline canvas UI includes an enhanced stage selector with a horizontal layout, making it easier to filter the stages by type.
In addition, when you add a stage with an open link, the pipeline canvas displays a dotted link connected to the Add Stage icon, clearly indicating that you need to add another stage.
When you click the Add Stage icon, the stage selector opens, allowing you to search for and select the next stage to add.
Fixed Issues
  • When authorizing Control Hub API calls, Control Hub does not consider the roles assigned by a user’s group.
  • When you change the owner of a scheduled task, the previous owner is incorrectly listed as the user that executes the task.

April 2022

The following StreamSets platform releases occurred in April 2022.

April 27, 2022

This release includes a fixed issue.

Fixed Issue
  • You cannot create new users using the Control Hub REST API.

April 22, 2022

This release includes several new features and enhancements.

New Features and Enhancements
Init scripts for cloud service provider deployments
You can define an initialization script for cloud service provider deployments, such as Amazon EC2, Azure VM, and GCE deployments. Control Hub runs the init script while provisioning a new instance in your cloud account.
Use the script to set up provisioned instances with additional software as required by your organization. For example, you might use an init script to install required certificates or packages on each provisioned instance.
Pipeline canvas toolbar
The pipeline canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes. By default, the pipeline canvas continues to display the classic toolbar.
You can switch between the new and classic UI while viewing a pipeline or fragment in the canvas.
Job status
Jobs that have not run are listed with an Inactive status. Previously, jobs that had not run were listed without a status.
Editable group ID
When you create a group, you define the group display name and can optionally edit the group ID. By default, Control Hub generates the group ID from the display name, using all lowercase characters and replacing any spaces with underscores. Users type the group ID when using credential functions in pipelines. As a result, you might want to edit the default group ID to make it easier to use with credential functions.
Previously when you created a group, you defined only the group display name. Control Hub used a randomly generated ID for the group ID which was not editable. When using credential functions, you had to type the randomly generated ID. Groups created before this release retain their existing group IDs.
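The default group ID rule described above, all lowercase characters with spaces replaced by underscores, can be sketched as follows. This Python example is illustrative only: the function name default_group_id is hypothetical, and Control Hub might apply additional rules that are not described in these release notes.

```python
def default_group_id(display_name: str) -> str:
    """Derive a default group ID from a group display name.

    Hypothetical helper for illustration; mirrors only the two rules stated
    in the release note: lowercase the name and replace spaces with underscores.
    """
    return display_name.lower().replace(" ", "_")
```

For example, a group with the display name Data Engineering Team would receive the default ID data_engineering_team under these two rules, which is easier to type in a credential function than a randomly generated ID.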

April 13, 2022

This release includes a fixed issue.

Fixed Issue
  • You cannot directly log in to the SAML configuration page as an Organization Administrator if you belong to another organization with SAML authentication disabled.

April 8, 2022

This release includes several enhancements and fixed issues.

Enhancements
Simplified pipeline and fragment wizards

The pipeline wizard and the fragment wizard have been simplified and no longer include the redundant Review & Open step.

Upgrading a pipeline to use a later authoring engine version
When you select a later authoring engine version for a pipeline, Control Hub now informs you that you are upgrading the pipeline so that it can no longer run on the earlier engine version. If you select a later engine version for a draft pipeline, you are given the choice to publish the draft pipeline first, and then create a new draft pipeline that is upgraded to run on the later engine version. That way, you retain a pipeline version that can run on the earlier engine version.
Previously, when you selected a later engine version for a pipeline, Control Hub automatically upgraded the pipeline without the upgrade warning.
Fixed Issues
  • When you disable a deployment and then access a pipeline using an authoring engine from that deployment, you are logged out.
  • Export All Published Pipelines exports only some published pipelines.
  • Pressing Ctrl+Z doesn't undo text changes after you edit text in the Control Hub UI.
  • Exporting jobs can generate a corrupt ZIP file.

March 2022

The following StreamSets platform release occurred in March 2022.

March 18, 2022

This release includes a new feature and several fixed issues.

New Feature
Transformer for Snowflake available in public preview
New and existing organizations can create and run Transformer for Snowflake pipelines.
Transformer for Snowflake is a component of the StreamSets platform that enables processing Snowflake data using Snowflake's Snowpark client libraries.
Transformer for Snowflake provides a user interface that allows designing and performing complex processing in Snowflake without having to write SQL queries or templates. It also provides easy access to Snowpark DataFrame-based processing so you can use Snowpark without having to set it up in your Snowflake account.
Important: While in public preview, do not use Transformer for Snowflake for production workloads. For help with Transformer for Snowflake, join the StreamSets Community and use the Transformer for Snowflake tag when you post your question. To provide suggestions and feedback, add a new idea at https://ibm-data-and-ai.ideas.ibm.com/?project=SSETS.
For more information about Transformer for Snowflake, see the Transformer for Snowflake documentation.
Fixed Issues
  • When you stop an Azure VM deployment, the Azure key vaults created to store secrets for the provisioned VM instances are not deleted from your Azure account.

    If you added Azure tags to Azure VM deployments that were stopped before this fix, you can use the tags to find the key vaults and then delete them from your Azure account.

  • Subscriptions do not trigger when permission enforcement is disabled for the organization.

February 2022

The following StreamSets platform releases occurred in February 2022.

February 25, 2022

This release includes several enhancements and fixed issues.

Enhancements
Install stage libraries while creating connections
When using an authoring Data Collector 4.4.x or later, you can select from all possible connection types while creating a connection, even if the corresponding stage library is not installed on the authoring Data Collector. If you select an uninstalled connection type, you can choose to update the deployment with the missing stage library that includes the connection type.
SDC_ID subscription parameter displays as Engine ID in the user interface

When you create a subscription and define a simple condition for an execution engine not responding event, the SDC_ID parameter now displays as Engine ID in the UI, instead of SDC_ID. The Engine ID label indicates that the parameter can include the ID of any engine type.

Continue to use SDC_ID as the parameter name when you use the parameter to represent StreamSets engines.

Scheduled tasks
While creating a scheduled task, you can search for the job to schedule.
Fixed Issues
  • Exporting multiple pipelines can generate a corrupt ZIP file.
  • You cannot download a snapshot captured for a pipeline test run or job run.
  • When you edit a subscription condition in the UI, the changes are not saved.
  • The job History tab inconsistently displays Data Collector URLs.

February 23, 2022

This release includes an enhancement and a fixed issue.

Enhancement
Supported SAML identity providers
StreamSets supports PingFederate as a SAML identity provider.
Fixed Issue
  • Control Hub allows you to test a SAML draft configuration when SP-initiated logins are disabled, even though testing SAML from Control Hub is only supported when SP-initiated logins are enabled. Testing when SP-initiated logins are disabled results in a 404 Not Found error.

February 11, 2022

This release includes several enhancements and fixed issues.

Enhancements
Create fragments using pipeline stages
When you create a fragment from selected stages in a pipeline, you can choose to immediately publish the fragment as long as the fragment meets the validation requirements to be published. Or, you can choose to create a draft fragment and then continue designing the pipeline.
If you immediately publish the fragment, you can then replace the individual stages in the pipeline with the newly published fragment.
GCE deployments support selecting a subnet
GCP environments support VPC networks that have an auto or custom subnet creation mode. As a result, while creating a GCE deployment, you must select the subnet within the VPC network to provision the Compute Engine instances in.
Previously, GCP environments supported only auto mode VPC networks, where the VPC automatically creates one subnet in each region. You did not have to select a subnet while creating a GCE deployment because the VPC automatically selected a suitable subnet.
Contextual help sidebar
After selecting a view in the Navigation panel, click the Help icon above the listed objects to open the contextual help sidebar. To close the contextual help, click outside of the sidebar. The contextual help sidebar has been implemented for most views. It will be implemented for additional views in future releases.
Use the contextual help to access common getting started, usage, troubleshooting, and best practice help topics and StreamSets Learning Academy videos. You can also search for topics in the StreamSets documentation.
The help links that display on each tab depend on the selected view. For example, the following image displays the getting started help topics for the Pipelines view:

Contextual help in environment, deployment, and pipeline wizards
When you create or edit an environment, deployment, or pipeline, the wizard displays contextual help on the right. The help contents depend on the step being completed in the wizard. Use the scroll bar to view the complete help contents for each step.
For example, the following image displays the contextual help provided in the first step of the pipeline wizard:

Fixed Issues
  • Restarting multiple engines at the same time causes an error.
  • When you select a later version of an authoring Data Collector while a pipeline is in read-only mode, stage libraries are not updated in the pipeline.
  • The Global Failover Retries property does not display when creating or editing a Transformer job.
  • GCP environments support only VPC networks that have automatic subnet creation enabled.
  • GCP environments do not support using a shared VPC network from a host GCP project.

February 4, 2022

This release includes an enhancement and a fixed issue.

Enhancement
Encrypting SAML assertions
When you disable SP-initiated logins for SAML authentication, you can optionally configure the IdP to encrypt the SAML assertion. If you choose not to configure the IdP to encrypt SAML assertions, then you must disable the Require Encryption on Assertion property in the draft SAML configuration for your organization.
Previously, when you disabled SP-initiated logins, you had to configure the IdP to encrypt the SAML assertion.
Fixed Issue
  • If you belong to multiple organizations that have SAML authentication enabled and you sign in to StreamSets using SSO SAML, you cannot choose which organization to log into.

January 2022

The following StreamSets platform releases occurred in January 2022.

January 31, 2022

This release includes several enhancements and fixed issues.

Enhancements
Create connections in the pipeline canvas
While building a pipeline, you can create a new connection for a stage without leaving the pipeline canvas.
Connection details
The Connection details page has been enhanced to make the available actions more visible and to more clearly display details about all pipeline and fragment versions using the connection.
Import existing pipelines or fragments as new
When you import a pipeline or fragment that already exists in the target organization, you can choose to import the object as a new pipeline or fragment.
Contextual help in the pipeline canvas
While building a pipeline, click the Help tab in the pipeline properties panel to access common getting started, pipeline usage, troubleshooting, and best practice help topics and StreamSets Learning Academy videos. You can also search for topics in the StreamSets documentation.
The help links that display on each tab depend on the pipeline type and the stages added to the pipeline. For example, the following image displays the getting started help topics for a Data Collector pipeline:

Fixed Issues
  • When viewing Transformer engines in the Engines view, clicking the More icon and then clicking Transformer Components displays the Getting Started page.
  • Pipeline export fails when a pipeline contains multiple stages of the same type that use different library versions.
  • You cannot download an engine support bundle.

January 21, 2022

This release includes a behavior change.

Behavior Change
Transformer version for new deployments
Starting with Transformer version 4.2.0 released on January 21, 2022, you can create a new deployment only for Transformer 4.2.0 or later. Earlier Transformer versions are not supported in new deployments.
Existing deployments can continue to use the earlier Transformer versions.

January 14, 2022

This release includes several enhancements, behavior changes, and fixed issues.

Enhancements
Duplicate connections
You can duplicate a connection to create a copy of an existing connection. You can then change the configuration of the copy.
Share pipeline or fragment while publishing
While using the Check In wizard to publish a pipeline or publish a fragment, you can share the pipeline or fragment with other users and groups.
Behavior Changes
Data Collector version for new deployments
Starting with Data Collector version 4.3.0 released on January 13, 2022, you can create a new deployment only for Data Collector 4.3.0 or later. Earlier Data Collector versions are not supported in new deployments.
Existing deployments can continue to use the earlier Data Collector versions.
Fixed Issues
  • A running job becomes corrupted if you import another version of the job that uses a newer pipeline version.
  • A subscription for a job status change event with a JOB_OWNER condition set to a specific email address fails to trigger because the JOB_OWNER parameter is always null.

2021

The following StreamSets platform releases occurred in 2021.

December 2021

The following StreamSets platform releases occurred in December 2021.

December 20, 2021

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Job templates
Job templates include the following enhancements:
  • Job templates display on the Job Templates view, separate from the list of job instances that display on the Job Instances view.
  • When you create job instances from a job template, the instances are attached to the parent job template by default. Attached job instances display in the parent job template’s run history and are updated when the parent job template is updated.

    You can optionally detach job instances from the parent job template when you want to use the job details and default parameter values defined in the template, but don't want subsequent changes to the template to be applied to the job instance.

  • When you create a job template for a pipeline that uses runtime parameters, you define whether each parameter functions as a dynamic parameter that can be overridden in child job instances or as a static parameter that cannot be overridden.
  • You can archive a job template when you do not want new job instances to be created from the template, but want existing job instances to continue to run.
  • Job instances created from a job template inherit all tags added to the template.
Stage library selection for deployments
When you define the engine configuration for a deployment, you can select individual stages to install or uninstall, in addition to selecting stage libraries.
When you select individual stages, you can:
  • Filter the list of available stages by type or search for stages by name.
  • Select the stage library version to install, not just the latest version.
  • View the complete list of stages included in each stage library.
In addition, the stage library selection window includes a summary view of the currently selected stage libraries by file name. For example, the Basic stage library is listed as streamsets-datacollector-basic-lib.
Engine installation script for self-managed deployments
When using a self-managed deployment, you can configure the engine installation script to run the engine as a foreground or background process.
Previously, a tarball installation script always ran the engine as a foreground process. A Docker image installation script always ran the engine as a background process.
Authoring engine selection for pipelines, fragments, and connections
When you create pipelines, fragments, or connections, the authoring engine selection window displays the current CPU and memory usage of each engine.
The resource usage can help you troubleshoot issues with inaccessible engines. For example, when an engine uses an excessive amount of available resources, the machine running the engine might lose connection to Control Hub or the browser.
Sticky notes for pipeline canvas
To delete a sticky note from the pipeline canvas, you click the Delete icon instead of the X in the note header.
Fixed Issues
  • When legacy Kubernetes is enabled for your organization and you have created legacy deployments only, you receive the following message when trying to create a pipeline:

    Attention: You need to set up a deployment with an engine before you can create a pipeline

  • If you have an existing Data Collector deployment with an installed enterprise stage library, and you edit the deployment to upgrade to a new version of the same enterprise stage library, you cannot remove the existing version of the stage library. As a result, Data Collector fails to start with the following error:

    REST_1001 - Unable to find following stage libraries in repository list: streamsets-datacollector-<enterprise library type>-lib:<new version>, streamsets-datacollector-<enterprise library type>-lib:<existing version>

  • If existing Control Hub users have an Invited or Expired status and then you enable and disable SAML authentication, disabling SAML fails with the following error:

    javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: could not execute statement

November 2021

The following StreamSets platform releases occurred in November 2021.

November 17, 2021

This release includes a new feature.

New Feature
SAML authentication
The StreamSets platform supports SAML 2.0 single sign-on (SSO) authentication with selected identity providers (IdPs).
Enabling SAML authentication requires registering StreamSets as a service provider in one of the supported IdPs and configuring SAML authentication for your Control Hub organization. Once enabled, all users in the organization must log in with SAML authentication.
At this time, StreamSets supports the following SAML identity providers:
  • Microsoft Active Directory Federation Services (AD FS)
  • Okta

November 5, 2021

This release includes a new feature and several fixed issues.

New Feature
Legacy Kubernetes integration
Control Hub provides a legacy Kubernetes integration that you can use to automatically provision Data Collectors on Kubernetes. Provisioning includes deploying, registering, starting, scaling, and stopping Data Collector Docker containers in a Kubernetes cluster. Legacy Kubernetes integration requires that the Provisioning Agent use the Control Agent Docker image version 4.0.0 or later.
Legacy Kubernetes integration is enabled only for some accounts.
Fixed Issues
  • When viewing real-time statistics and metrics for an active Transformer job, the names of stages display incorrectly.
  • Jobs can transition to an inactive_error status with the following error because multiple Job Runner applications attempt to process the same job:

    JOBRUNNER_69 At least one of the execution engines <engine ID> didn’t respond to the stop command

October 2021

The following StreamSets platform releases occurred in October 2021.

October 22, 2021

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Microsoft Azure integration
Control Hub provides an integration with your Microsoft Azure account. When you use Azure environments and Azure VM deployments, Control Hub automatically provisions Azure VM instances needed to run StreamSets engines in your Azure account, and then deploys engine instances to those VM instances.
JVM memory strategy for deployed engines
By default, engines are now configured to use 50 percent of the available memory on the host machine as the Java heap size. Previously, engines were configured to use an absolute value, 1024 MB by default.
You can edit the Java heap size percentage or set an absolute value by modifying the engine advanced configuration properties for a deployment. However, in most cases, the default percentage value is sufficient.
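As a rough illustration of the arithmetic, the following Python sketch computes the heap size that results from a percentage or an absolute setting. The function name and its parameters are hypothetical and do not correspond to actual engine configuration properties.

```python
def heap_size_mb(host_memory_mb, percent=50, absolute_mb=None):
    """Return the Java heap size in MB for a host.

    An absolute value, when given, takes precedence over the percentage.
    The default of 50 percent mirrors the new default described above.
    """
    if absolute_mb is not None:
        return absolute_mb
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    return host_memory_mb * percent // 100
```

For example, on a host with 8192 MB of available memory, the default percentage yields a 4096 MB heap, compared with 1024 MB under the previous absolute default.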
Quick Start
When you create self-managed deployments from the Quick Start menu, Control Hub assigns the deployments a quick-start tag and names the deployments as follows, based on the selected engine type:
  • Docker Data Collector <number> (Quick Start)
  • Docker Transformer <number> (Quick Start)
For example, if you create two deployments of Data Collector from the Quick Start menu, Control Hub names the deployments Docker Data Collector 1 (Quick Start) and Docker Data Collector 2 (Quick Start).
Previously, all deployments created from the Quick Start menu were named local.
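The naming pattern above can be expressed as a small Python helper. The function is a hypothetical sketch for illustration; how Control Hub chooses the number is not described here, so the caller passes it in.

```python
ENGINE_TYPES = ("Data Collector", "Transformer")


def quick_start_deployment_name(engine_type, number):
    """Build a deployment name in the Quick Start naming pattern."""
    if engine_type not in ENGINE_TYPES:
        raise ValueError(f"unsupported engine type: {engine_type}")
    return f"Docker {engine_type} {number} (Quick Start)"


names = [quick_start_deployment_name("Data Collector", n) for n in (1, 2)]
```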
Job error acknowledgement
By default, when a job encounters an inactive error status, users must acknowledge the error message before the job can be restarted. You can configure a job to skip job error acknowledgement. You might want to skip job error acknowledgement for scheduled jobs so that the jobs can be restarted automatically without user intervention. However, be aware that skipping job error acknowledgement might hide errors that the job has encountered.
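The restart rule can be modeled as a simple predicate. The following Python sketch is illustrative only: the `Job` class, its field names, and the `INACTIVE_ERROR` status string are assumptions made for this example, not Control Hub identifiers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical model of the job fields relevant to error acknowledgement."""
    status: str
    error_acknowledged: bool = False
    skip_error_acknowledgement: bool = False


def error_blocks_restart(job):
    """Return True when an unacknowledged error prevents a restart."""
    if job.status != "INACTIVE_ERROR":  # assumed status name
        return False
    return not (job.skip_error_acknowledgement or job.error_acknowledged)
```

A scheduled job configured with `skip_error_acknowledgement=True` is never blocked in this model, which is also why errors can go unnoticed.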
Upgrade to a paid subscription
If you have a Free account, you can upgrade to a paid subscription from the My Account menu.
Fixed Issues
  • The pipeline canvas incorrectly allows adding a fragment to another fragment although nested fragments are not supported.
  • After reloading a pipeline, parameters called from properties that display as checkboxes are not displayed.
  • Control Hub does not allow the at sign (@) in the property name for any engine advanced configuration properties.

September 2021

The following StreamSets platform releases occurred in September 2021.

September 28, 2021

This release includes a fixed issue.

Fixed Issue
  • Pipeline fragment creation fails.

September 15, 2021

This release includes several enhancements.

Enhancements
In-application chat
You can use the in-application chat to start a conversation with the StreamSets team. You can enable or disable the chat in your account settings accessible from the My Account window.
Sticky notes for pipeline canvas
You can add sticky notes to the pipeline canvas to include notes that might be useful to you or your team as you build a pipeline or pipeline fragment. For example, you might add a note as a reminder to revisit a portion of the pipeline design or to change a stage after running initial tests of the pipeline.
Updated usage limits
The usage limits for organizations with a Free account have been updated as follows:
  • 2 users - For an existing Free account with more than 2 users, all users can continue to log in as a member of the organization. However, you cannot invite additional users to the organization.
  • 10 published pipelines - For an existing Free account with more than 10 published pipelines, you can continue to use all published pipelines.
The usage limit of 2 active jobs remains unchanged.

August 2021

The following StreamSets platform releases occurred in August 2021.

August 11, 2021

This release includes several enhancements.

Enhancements
My Account
All properties previously available in the User Settings window have been moved to the My Account window. As a result, you can manage all your account settings from a single window. You can also modify your display name in the My Account window.
Confirmation dialog boxes
The OK button in several confirmation dialog boxes has been renamed to reflect the action that you are taking. For example, when deleting a connection, the OK button has been renamed to Delete.
Roles and permissions required to delete an engine
A user must have the following roles and permissions to delete an engine:
  • Task: Delete an engine
  • Roles: Engine Administrator, Deployment Manager
  • Permissions: Write on engine
Previously, a user required the following roles and permissions to delete an engine:
  • Task: Delete an engine
  • Roles: Engine Administrator, Organization Administrator
  • Permissions: Execute on engine
This change does not affect role and permission assignments for existing users. A user can still delete an engine with the previous role and permission requirements.
For the complete list of required roles and permissions to complete common engine tasks, see Engine Tasks.

July 2021

The following StreamSets platform releases occurred in July 2021.

July 30, 2021

This release includes several new features, enhancements, behavior changes, and fixed issues.

New Features and Enhancements
Deployment and engine permissions
With this release, engines always inherit the same permissions assigned to the deployment. Those inherited permissions cannot be modified at the engine level. For example, if you grant a user Execute permission on a deployment, that user also has Execute permission on all engines managed by that deployment.
Previously, you could modify permissions at the engine level. As a result, users could have different permissions on engines managed by the same deployment.
If you previously modified engine permissions, those changes are now overridden with the deployment permissions. Check that users' access levels to engines have not unexpectedly increased or decreased, and modify the deployment permissions as needed.
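One way to picture the new model is that an engine keeps no permissions of its own and always reads them from its deployment. The Python sketch below illustrates that idea under stated assumptions: the `Deployment` and `Engine` classes and the `permissions_for` method are hypothetical names, not part of any Control Hub API.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Hypothetical deployment holding permissions per user."""
    name: str
    permissions: dict = field(default_factory=dict)  # user -> set of permissions


@dataclass
class Engine:
    """Hypothetical engine with no permission store of its own."""
    engine_id: str
    deployment: Deployment

    def permissions_for(self, user):
        # Always resolved from the deployment; nothing is set at the engine level.
        return frozenset(self.deployment.permissions.get(user, ()))


deployment = Deployment("dc-prod", {"user@example.com": {"Read", "Execute"}})
engines = [Engine("engine-1", deployment), Engine("engine-2", deployment)]
```

Because every engine resolves permissions through the same deployment, a change to the deployment permissions applies to all of its engines at once.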
Engines
  • Shut down engines - You can shut down engines belonging to a self-managed deployment from the Engines view. You can also restart or shut down multiple engines at the same time.
  • Resource thresholds - The default values of the Max CPU Load and Max Memory thresholds for an engine are now 80%. Previously, the default values were 100%.
Connections
When creating or editing a connection, you can filter the list of available authoring Data Collectors by deployment, version, or label. The Data Collector with the most recent reported time is selected by default.
Pipelines
  • Pipeline and fragment creation - While creating a new pipeline or pipeline fragment, you can choose to open the pipeline or fragment in the canvas before completing the Share Pipeline or Share Pipeline Fragment step.
  • Sample pipelines - When you open a sample pipeline in the pipeline canvas, the Duplicate button has been renamed to Create a pipeline from sample.
Free and paid accounts
The StreamSets platform now offers paid accounts in addition to free accounts.
With a free account, your use is subject to a number of limits. For legal-related limits, including those related to warranty, support, and SLA, please see Section 17 of the Terms of Use. For usage-related limits, such as organization limits, see Organization Default System Limits. A free account is not limited to internal evaluation purposes, but may be terminated by either party at any time.
With a paid account, you can increase the organization limits. For more information about paid accounts, contact your StreamSets account team.
Users
  • Session timeout - After 30 minutes of inactivity, user sessions expire and users are logged out. An organization administrator can change the inactivity period for the organization.
  • Deleting users - When you attempt to delete an active user, Control Hub prompts you to deactivate the user during the deletion process.
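The session timeout rule above amounts to comparing the time since the last activity with the inactivity period. A minimal Python sketch, with a hypothetical function name and the 30-minute default taken from the text:

```python
from datetime import datetime, timedelta

DEFAULT_INACTIVITY_PERIOD = timedelta(minutes=30)


def session_expired(last_activity, now, inactivity_period=DEFAULT_INACTIVITY_PERIOD):
    """Return True once the inactivity period has fully elapsed."""
    return now - last_activity >= inactivity_period
```

An organization administrator changing the inactivity period corresponds to passing a different `inactivity_period` in this model.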
Behavior Changes
Data Collector version for new deployments
With this release, you can create a new deployment only for Data Collector version 4.0.2 or later. Earlier Data Collector versions are not supported in new deployments.
Existing deployments can continue to use the earlier Data Collector versions.
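A minimum-version rule like this one should compare versions numerically rather than as strings. The Python sketch below shows one way to do that. It assumes plain dotted numeric versions such as 4.0.2, and the function names are hypothetical.

```python
MIN_NEW_DEPLOYMENT_VERSION = (4, 0, 2)


def parse_version(version):
    """Convert a dotted numeric version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def allowed_for_new_deployment(version):
    """Return True if the Data Collector version meets the 4.0.2 minimum."""
    return parse_version(version) >= MIN_NEW_DEPLOYMENT_VERSION
```

Comparing tuples of integers avoids the string-comparison pitfall where "4.10.0" would sort before "4.2.0".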
Fixed Issues
  • If you stop multiple deployments at the same time, Control Hub might fail to completely stop the deployments with the following error:
    An AppException occurred deactivating deployment tokens after stack was removed:
    Issues: [CSP_046 - Rest Api '<Control Hub URL>/security/rest/v1/organization/<organization ID>/components/delete' failed with status:'500']
  • No errors display in the UI when you restart Docker engine instances for a self-managed deployment with an external resource archive that uses an incorrect file format.

July 14, 2021 (beta)

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Quick Start
To get started with StreamSets, click Quick Start in the top toolbar to quickly deploy a Data Collector or Transformer engine using Docker. After the engine is launched and running, click Quick Start > Create a pipeline to quickly create a pipeline.
Environments
  • Environment state - The environment state and status have been combined into a single state to simplify the state concept. For example, an environment with an Enabled state and an OK status now has a single Active state.
  • Activate and deactivate actions - The Enable and Disable actions for an environment have been renamed to Activate and Deactivate to align the actions with the new environment states.
Deployments
  • Deployment state - The deployment state and status have been combined into a single state to simplify the state concept. For example, a deployment with a Disabled state and an OK status now has a single Deactivated state.
  • Start and stop actions - The Enable action for a deployment has been renamed to Start to more clearly indicate that after you start a deployment, the deployment is ready to launch engine instances. The Disable action has been renamed to Stop to more clearly indicate that after you stop a deployment, you can no longer launch engine instances for the deployment.
  • Deployment details - The Deployment details page has been enhanced to make the available actions more visible and to more clearly display details about the existing engines belonging to the deployment.
  • Launched engine status for self-managed deployments - When copying the installation script to launch an engine instance for a self-managed deployment, you can choose to check the engine status in the Control Hub UI. Previously, you could view the engine status only in the command prompt.
  • Engine installation type for self-managed deployments - After saving a self-managed deployment, you can change the engine installation type. For example, you can change a Tarball installation type to the Docker image installation type.
Fixed Issues
  • The list of EC2 instance types available when creating an Amazon EC2 deployment does not take into account the availability zone used by the parent AWS environment. If you select an unsupported instance type, AWS CloudFormation cannot provision the EC2 instances and the following error message displays:

    Your requested instance type (<type>) is not supported in your requested Availability Zone (<zone>)

    Important: To fix this issue, StreamSets has updated the sample IAM policy used to configure AWS credentials for an AWS environment. If you encountered this issue with an existing AWS environment and EC2 deployment, then update the IAM policy to include the ec2:DescribeInstanceTypeOfferings permission. If you have not encountered this issue, there is no need to update the IAM policy.
  • You can delete a pipeline included in a job when you do not have permission on the job, causing the job to become invalid.
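For the IAM policy update mentioned in the first fixed issue above, the following Python sketch shows one way to check an IAM policy document for the ec2:DescribeInstanceTypeOfferings permission and add it when missing. It is not the StreamSets sample policy. The helper names are hypothetical, the check matches exact action names only (no wildcards), and it assumes the policy's Statement element is a list.

```python
import json

REQUIRED_ACTION = "ec2:DescribeInstanceTypeOfferings"


def policy_allows(policy, action):
    """Return True if any Allow statement lists the exact action name."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if action in actions:
            return True
    return False


def add_action(policy, action):
    """Return a copy of the policy with an Allow statement for the action."""
    if policy_allows(policy, action):
        return policy
    updated = json.loads(json.dumps(policy))  # deep copy via JSON round trip
    updated.setdefault("Statement", []).append(
        {"Effect": "Allow", "Action": [action], "Resource": "*"}
    )
    return updated


# Placeholder policy for illustration; a real policy contains more actions.
existing_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": "*"}
    ],
}
updated_policy = add_action(existing_policy, REQUIRED_ACTION)
```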

June 2021

The following StreamSets platform releases occurred in June 2021.

June 23, 2021 (beta)

This release includes several fixed issues.

Fixed Issues
  • When the parent AWS environment uses an AWS region other than us-west-2, an Amazon EC2 deployment fails to launch a StreamSets engine with the following error:

    Parameter validation failed: parameter value ami-<AMI_ID> for parameter name AMI does not exist

  • Using the API Credentials view to generate and manage credentials for use with the Control Hub REST API is not supported at this time.

June 18, 2021 (beta)

This release includes several new features and enhancements.

New Features and Enhancements
Deployments
  • Engine version - The version number of a nightly engine build includes the build number in addition to the -SNAPSHOT suffix.

  • GCE deployments - When configuring the GCE autoscaling group for a deployment, you can enter the network tags to apply to the provisioned VM instances.

Jobs
  • Failover retries - When you enable a Data Collector or Transformer job for pipeline failover, you can configure the global number of pipeline failover retries to attempt across all available engines. When the limit is reached, Control Hub stops the job.

  • Balance jobs icon - The Engines view includes a new Balance Jobs icon.

Subscriptions

You can configure a subscription action for a maximum global failover retries exhausted event. For example, you might create a subscription that sends an alert to a Slack channel when a job has exceeded the maximum number of pipeline failover retries across all available engines.
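The interaction between the global failover retry limit and the exhausted event can be sketched as a small counter. This Python model is illustrative only: the `FailoverTracker` class, its return values, and the exact point at which the limit counts as reached are assumptions, not documented Control Hub behavior.

```python
from dataclasses import dataclass


@dataclass
class FailoverTracker:
    """Hypothetical counter of failover retries across all engines for one job."""
    max_global_retries: int
    attempts: int = 0

    def next_action(self):
        """Decide what happens when the pipeline needs to fail over again.

        Returns "RETRY" while retries remain. Once the limit is reached,
        returns "STOP_JOB", the point where an exhausted event would fire.
        """
        if self.attempts >= self.max_global_retries:
            return "STOP_JOB"
        self.attempts += 1
        return "RETRY"


tracker = FailoverTracker(max_global_retries=2)
actions = [tracker.next_action() for _ in range(3)]
```

In this model, a subscription such as the Slack alert described above would be triggered when `next_action` returns `"STOP_JOB"`.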

June 11, 2021 (beta)

This release includes enhancements and fixed issues.

Enhancements
Environments
The Allow Nightly Engine Builds property for environments has been moved to an advanced option. In most cases, you don't need to modify this property.
Roles
The Auth Token Administrator and Engine Guest roles have been removed because they are not applicable for the StreamSets platform.
Fixed Issues
  • If you create a Microsoft account just before signing up as a new StreamSets user with that Microsoft account, the sign-in might fail.
  • In rare and randomly occurring scenarios, deleting a user from an organization might only partially succeed. Users in this state are not able to create a new login session, but may still show up in the Control Hub list of users for an organization.

  • A user with the Organization Administrator role requires the Metrics Reader role to view topology metrics. The Organization Administrator role should be sufficient.

June 1, 2021 (beta)

StreamSets is happy to announce the beta release of the StreamSets platform, a cloud-native platform that empowers data engineers to build data pipelines.

The StreamSets platform includes the following components that seamlessly work together to manage your pipelines: Control Hub, Data Collector, and Transformer. The Control Hub component provides a common UI for the following types of users:

  • Administrators use Control Hub to deploy and launch Data Collector and Transformer engines on-premises or in cloud environments. The engines are automatically tethered to Control Hub.

  • Data engineers use Control Hub to build, run and monitor pipelines across the deployed engines.

The beta release is available for development and testing, but is not meant for production use.

Known Issues

  • When you configure an Amazon EC2, Azure VM, or GCE deployment to provision EC2 or VM instances with the minimum 1 GB of RAM required by an engine, the provisioned instances might unexpectedly crash when you run a job on the engine.

    Workaround: Configure cloud service provider deployments to deploy EC2 or VM instances with more RAM.

  • When you use certain Control Hub instances and you configure an AWS environment to use the automatic or manual cross-account role credential type, clicking Save in the Configure AWS Credentials step of the environment wizard displays an error. This problem occurs on the following Control Hub instances:
    • eu38.hub.streamsets.com
    • me36.hub.streamsets.com
    • na39.hub.streamsets.com

    Workaround: Configure the AWS environment to use the manual access keys credential type.

  • When you edit an active Kubernetes deployment and then immediately click Restart Engines, the engines fail to restart because the deployment is in the ACTIVATION state.

    Workaround: Wait approximately 15 seconds for the StreamSets Kubernetes Agent to retrieve the update request from Control Hub, and then restart all engines in the deployment.