Release Notes

January 2023

The following StreamSets DataOps Platform release occurred in January 2023.

January 20, 2023

This release includes several enhancements and fixed issues.

Enhancements

View and edit connection details from a published pipeline
When viewing a published pipeline that includes a stage using a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
Previously, only draft pipelines displayed the Edit Connection icon.
Customize table columns
You can resize, hide, and reorder the table columns displayed in the Connections view.
StreamSets will provide support to customize columns in additional views in future releases.
Job run history for job instances started from a job template
Control Hub now includes job instances started from a job template in the total count of job runs for the organization. As a result, Control Hub purges the run history for job instances started from a job template in the same way that it purges the run history for job and draft runs.
Previously, Control Hub indefinitely retained the run history for job instances started from a job template.
Search
The Technology Preview search functionality includes the following enhancements:
  • You can search for attached job instances created from job templates.
  • You can search for job instances that include a pipeline with a specified pipeline status.
  • You can search for job templates that have been archived.
  • In basic search, you can use auto-completion for the following search properties:
    • Label property for pipelines and fragments
    • Tag and Engine Label properties for job instances and job templates

    As you begin typing a value to search for, Control Hub displays a drop-down menu that lists values that match the entered characters.

Fixed Issues

  • Preview does not work when the Run Preview Through Stage property is set to a fragment.
  • The Engines view does not correctly sort the list of engines by the Memory Used column.
  • When using the new pipeline canvas UI while monitoring a job, the title of the job incorrectly displays the pipeline name instead of the job name.
  • If you create and publish a pipeline fragment using pipeline stages and you quickly enter a fragment prefix in the last step of the wizard, the newly created fragment is not added to the pipeline.
  • When using the Technology Preview search functionality and you sort pipeline search results by the Last Modified By column, the pipelines are incorrectly sorted by name.

2022

The following StreamSets DataOps Platform releases occurred in 2022.

December 2022

The following StreamSets DataOps Platform release occurred in December 2022.

December 16, 2022

This release includes several enhancements and fixed issues.

Enhancements
Pipeline fragments
When you create a pipeline fragment using pipeline stages and you choose to publish the new fragment, Control Hub displays the original pipeline in the canvas, automatically replacing the individual stages with the newly published fragment.
Previously, Control Hub displayed the original pipeline in the canvas with the individual stages.
Draft runs
While monitoring an engine in the Engines view, you can stop all draft runs currently running on that engine.
Search
The Technology Preview search functionality includes several enhancements.
Subscriptions
You can configure a webhook action for a subscription to use one of the following additional authentication types to connect to the receiving system:
  • API Key
  • Bearer Token
  • OAuth 2.0
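The three authentication types map onto an outbound HTTP request in the usual ways. As an illustration only (this is not Control Hub's internal implementation; the `build_headers` helper and the `X-API-Key` header name are assumptions, since receiving systems vary):

```python
# Hypothetical sketch: how the three webhook authentication types
# typically translate into HTTP request headers. Not Control Hub code.

def build_headers(auth_type, credentials):
    """Return HTTP headers for a webhook call, by authentication type."""
    if auth_type == "API_KEY":
        # Many receiving systems expect the key in a custom header;
        # "X-API-Key" is a common but not universal choice.
        return {"X-API-Key": credentials["api_key"]}
    if auth_type == "BEARER_TOKEN":
        return {"Authorization": f"Bearer {credentials['token']}"}
    if auth_type == "OAUTH2":
        # With OAuth 2.0, an access token is first obtained from the
        # provider's token endpoint, then sent as a bearer credential.
        return {"Authorization": f"Bearer {credentials['access_token']}"}
    raise ValueError(f"unsupported auth type: {auth_type}")

print(build_headers("BEARER_TOKEN", {"token": "abc123"}))
```

The practical difference is that API Key and Bearer Token use a static credential, while OAuth 2.0 exchanges client credentials for a short-lived access token before each delivery.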
Fixed Issues
  • The Control Hub REST API incorrectly returns an HTTP 400 status code instead of a 404 status code when the specified environment or deployment does not exist.
  • When using the Technology Preview search functionality to find job templates or job instances last modified by a user, the search results incorrectly display the job created by that user.
  • When the Legacy Kubernetes integration is enabled, users cannot create a pipeline when they are assigned the Deployment Manager role but not the Provisioning Operator role or the Organization Administrator role.

November 2022

The following StreamSets DataOps Platform release occurred in November 2022.

November 18, 2022

This release includes several enhancements and fixed issues.

Enhancements
Deployments
In the Deployments view, you can filter the list of displayed deployments by engine type.
Pipeline design
When configuring the conditions for a Stream Selector processor in a Data Collector pipeline, you can reorder the output conditions.
Search
The Technology Preview search functionality includes the following enhancements:
  • You can search by user email address to find all pipelines or fragments committed or last modified by the user or to find all job templates or job instances created or last modified by the user.
  • When using basic mode to search for job templates or job instances, you can include or exclude the v prefix. For example, you can search for either v1 or 1 when defining the Pipeline Version property.

    Previously in basic mode, you had to include the prefix. For example, you had to search for v1.
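One way a search layer can treat the two forms as equivalent is to normalize away the optional prefix before matching. A minimal sketch, assuming this normalization approach (the function is illustrative, not the Control Hub implementation):

```python
# Hypothetical sketch: normalize the optional "v" prefix so that
# "v1" and "1" resolve to the same Pipeline Version search value.
def normalize_version(value: str) -> str:
    value = value.strip()
    # Strip a single leading "v"/"V" when the remainder is a version number.
    if value[:1].lower() == "v" and value[1:].replace(".", "").isdigit():
        return value[1:]
    return value

assert normalize_version("v1") == normalize_version("1") == "1"
```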

Fixed Issues
  • When you expand or collapse a fragment in the job monitoring UI, Control Hub does not update the metric diagrams to display statistics for individual stages in the expanded fragment or for the single fragment stage.
  • When the Enable WebSocket Tunneling for UI Communication property is enabled for an organization, all users should be able to configure the Browser to Engine Communication type; however, only organization administrators can do so.
  • When using the Technology Preview search functionality, searching for job instances with an INACTIVE job status excludes job instances that have never run.
  • When using the Technology Preview search functionality and you sort the search results by job instance status, job instances that have never run are shown first when sorting by ascending status, and last when sorting by descending status.
  • When you add a fragment that uses parameters to a pipeline and define no parameter name prefix, Control Hub silently fails to add the fragment to the pipeline.
  • When you configure an Azure VM deployment, Control Hub does not display existing SSH key pairs in the Key Pair Name property because the documentation to create an Azure AD custom role for the parent environment does not include the following required permission:

    Microsoft.Compute/sshPublicKeys/read

    Important: StreamSets strongly recommends updating the custom role for all existing Azure environments, as described in the Azure environment prerequisites.
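For orientation, an Azure custom role definition carries its permissions in an `Actions` array. The fragment below shows where the required permission belongs; only `Microsoft.Compute/sshPublicKeys/read` comes from this release note, while the role name, the other fields, and the scope are placeholders following Azure's role-definition schema:

```python
# Illustrative fragment of an Azure custom role definition with the
# required permission added. All values except the sshPublicKeys/read
# action are placeholders.
custom_role = {
    "Name": "StreamSets Environment Role",        # placeholder name
    "IsCustom": True,
    "Description": "Role for StreamSets Azure environments",
    "Actions": [
        # ... existing permissions for the environment ...
        "Microsoft.Compute/sshPublicKeys/read",   # required for Key Pair Name lookup
    ],
    "AssignableScopes": ["/subscriptions/<subscription-id>"],  # placeholder scope
}

assert "Microsoft.Compute/sshPublicKeys/read" in custom_role["Actions"]
```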

October 2022

The following StreamSets DataOps Platform releases occurred in October 2022.

October 28, 2022

This release includes a new feature.

New Feature
Provisioning of user accounts from Azure AD
After enabling SAML authentication using Azure AD, you can configure the provisioning of user accounts from Azure AD to StreamSets. When a user or group is created, updated, or deleted in Azure AD, the same changes are automatically made in StreamSets.
StreamSets supports System for Cross-domain Identity Management (SCIM) 2.0 for the provisioning of user accounts from an identity provider (IdP).
StreamSets will provide SCIM provisioning support for additional IdPs in future releases.
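For reference, a SCIM 2.0 provisioning request carries a User resource like the one sketched below (per RFC 7643). The field values are placeholders, and the exact attributes StreamSets consumes are not specified here:

```python
# Minimal SCIM 2.0 User payload of the kind an IdP such as Azure AD
# sends when provisioning a user. Values are placeholders.
scim_user = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "jdoe@example.com",
    "name": {"givenName": "Jane", "familyName": "Doe"},
    "emails": [{"value": "jdoe@example.com", "primary": True}],
    "active": True,
}
```

Deactivating or deleting the user in the IdP flips `active` or issues a DELETE, which is how the corresponding change propagates to StreamSets.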

October 26, 2022

This release includes several enhancements and fixed issues.

Enhancements
Supported browsers
StreamSets Control Hub now supports the latest version of Microsoft Edge as a browser.
Search
The Technology Preview search functionality includes the following enhancements:
  • For job instance and job template search, the Pipeline Commit Label property has been renamed to Pipeline Version to more clearly indicate that the property specifies the version of the pipeline included in a job instance or job template.
  • Basic search for job templates no longer displays the Status property because job templates do not have a status.
Proxy server configuration for engines

When you set up a deployment that configures a Data Collector or Transformer engine to use a proxy server for outbound network requests, you can now include special characters in the proxy user and password values, except for an exclamation point (!), a backslash (\), a leading number sign (#), or a leading or trailing space.
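The restrictions above can be expressed as a simple validation rule. A minimal sketch (the function name is an illustration, not a StreamSets API):

```python
# Sketch of the documented restrictions on proxy user and password
# values: any special character is allowed except "!", "\", a leading
# "#", or a leading/trailing space.
def is_valid_proxy_credential(value: str) -> bool:
    if "!" in value or "\\" in value:
        return False
    if value.startswith("#"):
        return False
    if value != value.strip(" "):
        return False
    return True

assert is_valid_proxy_credential("p@ss$word%")
assert not is_valid_proxy_credential("pass!word")   # exclamation point
assert not is_valid_proxy_credential("#password")   # leading number sign
assert not is_valid_proxy_credential(" password")   # leading space
```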

Fixed Issues
  • The Test Connection button in the connection wizard incorrectly requires a user to have the Deployment Manager role.
  • Creating a pipeline incorrectly requires a user to have the Deployment Manager role.
  • The Fields to Convert property in the Field Type Converter processor does not allow an expression that includes a comma.
  • If you configure an engine to use a proxy server and you include a space, single quotation mark, or double quotation mark in the defined proxy properties, then either the engine installation script fails or the proxy authentication fails.
  • Topology data SLAs do not trigger alerts because they do not correctly retrieve data.

October 12, 2022

This release includes a fixed issue.

Fixed Issue
  • When you delete multiple users at the same time, a user synchronization error message displays even though the users are successfully deleted.

October 6, 2022

This release includes a fixed issue.

Fixed Issue
  • When SAML authentication is enabled, only users with the Organization Administrator role can log in.

October 5, 2022

This release includes new features, enhancements, a behavior change, and fixed issues.

New Features and Enhancements
Search
You can perform both basic and advanced searches to find specific pipelines, fragments, job instances, or job templates. With basic search, you define search conditions by selecting the object properties, operators, and values that you want to search for.
With advanced search, you define search conditions by writing a search query using the StreamSets advanced query language (SAQL). Advanced search allows you to specify criteria that cannot be defined in basic search, such as defining complex conditions or changing the order of precedence for multiple conditions.
Search replaces filtering and is a Technology Preview functionality. To try the functionality, you must enable search in your browser settings. StreamSets will implement search for additional object types in future releases.
Copy field value in preview
You can quickly copy a field value from pipeline preview using the new Copy Field Value to Clipboard icon.
Pipeline preview displays the new icon next to the existing Copy Field Path to Clipboard icon. Both icons display for each field in the previewed records.
Previously, pipeline preview displayed only a Copy Field Path to Clipboard icon for each field.
Pipeline version history
When using the new pipeline canvas UI, the pipeline or fragment version history that opens from the canvas displays all information on the initial panel. You do not need to expand the version history to manage the versions, including viewing commit messages, comparing versions, creating tags for versions, and deleting versions.
In addition, the new version history includes a link to each version and displays the minimum engine version that the pipeline or fragment version can run on.
Import pipelines and fragments without connections
When you import pipelines or fragments that use connections and those connections do not exist in the target organization, you can choose to import the objects without connections. After the import, you must edit the pipelines or fragments to define the connections.
Previously, to import pipelines or fragments that use connections, the connections had to exist in the target organization.
Subscription parameters
You can use the ERROR_MESSAGE parameter for a subscription triggered by a maximum global failover retries exhausted event.
Engine resource thresholds
The default value of the Max Memory threshold for an engine is now 100%. Previously, the default value was 80%. Existing engines retain the currently configured threshold.
Behavior Change

With this release, the sample IAM policy for credentials provided by a Control Hub AWS environment includes the following additional permission:

autoscaling:DescribeWarmPool

This permission is required when you update the number of engine instances for an active Amazon EC2 deployment belonging to an AWS environment.

If the IAM policy does not include this permission and you update the number of engine instances, the deployment transitions to an Activation Error state. When you access the tracking URL to the AWS Management Console, the Events tab displays the following error:

API: autoscaling:DescribeWarmPool User: arn:aws:sts::${ACCOUNT_ID}:assumed-role/${CROSS_ACCOUNT_ROLE}/STREAMSETS_SCH is not authorized to perform: autoscaling:DescribeWarmPool because no identity-based policy allows the autoscaling:DescribeWarmPool action

StreamSets strongly recommends updating the IAM policy for credentials for all existing StreamSets AWS environments to avoid this deployment error.

Log in to the AWS Management Console and update the IAM policy for credentials created for each StreamSets AWS environment. Add the autoscaling:DescribeWarmPool permission to the following section of the IAM policy:
...
{
    "Sid": "0",
    "Effect": "Allow",
    "Action": [
        "ec2:DescribeImages",
        "autoscaling:DescribeScalingActivities",
        "ec2:DescribeVpcs",
        "autoscaling:DescribeAutoScalingGroups",
        "ec2:DescribeRegions",
        "autoscaling:DescribeLaunchConfigurations",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeSubnets",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeInstances",
        "autoscaling:DescribeScheduledActions",
        "autoscaling:DescribeWarmPool"
    ],
    "Resource": "*"
},
...

You do not need to restart active StreamSets AWS environments or Amazon EC2 deployments after updating the policy. However, if an existing Amazon EC2 deployment has transitioned to an Activation Error state due to this error, then you must stop that deployment and start it again.

For more information about the required IAM policy, see Configure AWS Credentials.

Fixed Issues
  • The Topologies Dashboard can display an incorrect number of engines.
  • Pipelines that contain fragments with names starting with a number do not correctly display in the new pipeline canvas UI and encounter unexpected errors in the classic pipeline UI.
  • Stages that were originally in a fragment could become uneditable.

August 2022

The following StreamSets DataOps Platform release occurred in August 2022.

August 26, 2022

This release includes several enhancements, a behavior change, and fixed issues.

Enhancements
Scroll zoom in pipeline canvas
You can now zoom in or out of the pipeline canvas using the mouse scroll wheel or trackpad. By default, scroll zoom is enabled. You can disable scroll zoom if needed.
Hide or show contextual help in wizards
When you create or edit an object, such as an environment, deployment, pipeline, or job, you can now hide or show the contextual help that displays on the right of the wizard.
Behavior Change
Filters for job templates, job instances, or draft runs
With this release, when you select the Keep Filter Persistent checkbox to retain the filter on the Job Templates, Job Instances, or Draft Runs view, Control Hub retains a unique filter for each view. Previously, Control Hub incorrectly used the same filter for all of these views.
As a result, when you first access the Job Templates, Job Instances, or Draft Runs view, no filter is applied, even if you previously persisted the filters.
Fixed Issues
  • In rare cases when editing a deployment, the External Resource Source property can be set to null which causes a null pointer exception.
  • The TRIGGERED_COUNT and TRIGGERED_ON subscription parameters do not contain the correct values.
  • The Windowing Aggregator processor does not display aggregation charts.
  • You can delete a job associated with a running scheduled task, resulting in a scheduled task that attempts to trigger actions on a job that doesn’t exist.
  • In rare cases, the configured time zone for a scheduled task is ignored, which causes the task to start and finish at undesired times.
  • Clicking Upload Offset & Start while viewing job instance details uploads the offset but doesn’t start the job.
  • Because the initialization script for an Azure VM deployment does not run before engine instances start, you cannot use the script to update DNS entries. As a result, engines fail to launch when the Azure VNet uses custom DNS servers.

July 2022

The following StreamSets DataOps Platform release occurred in July 2022.

July 20, 2022

This release includes several enhancements and fixed issues.

Enhancements
Runtime parameters for pipelines
Some pipeline and stage properties conditionally display child properties. For example, if you configure an origin to use the Delimited data format, the origin displays a set of Delimited configuration properties. If you configure that origin to use the JSON data format, it displays a different set of JSON configuration properties.
However, if you use a runtime parameter to define a parent property, all child properties now display so that you can configure valid values for all dependent properties.
For example, if you convert the Data Format drop-down menu to a text box and then call the dataformat parameter using the required syntax, the origin displays all of the Delimited, JSON, and Text configuration properties.
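The behavior above follows from a design constraint: when the parent property holds a runtime parameter, its resolved value is unknown at design time, so Control Hub cannot know which children apply. A conceptual sketch, with illustrative property names (not the actual Control Hub property model):

```python
# Hypothetical sketch: when a parent property is parameterized, the
# design-time UI must show every dependent property, because the
# parameter resolves only at runtime.
child_properties = {
    "DELIMITED": ["Delimiter Character", "Header Line"],
    "JSON": ["JSON Content", "Max Object Length"],
    "TEXT": ["Max Line Length"],
}

def visible_children(parent_value: str) -> list:
    if parent_value.startswith("${") and parent_value.endswith("}"):
        # Runtime parameter: resolved value unknown, show everything.
        return sorted(p for props in child_properties.values() for p in props)
    return child_properties.get(parent_value, [])

assert visible_children("JSON") == ["JSON Content", "Max Object Length"]
assert len(visible_children("${dataformat}")) == 5
```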
Job monitoring toolbar
When you monitor a job, the canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes.
You can switch between the new and classic UI while monitoring a job.
Starting multiple scheduled jobs at the same time
When you start multiple jobs at the exact same time using the scheduler, the number of pipelines running on an engine can exceed the Max Running Pipeline Count configured for the engine.
If exceeding the resource threshold is not acceptable, you can enable an organization property that synchronizes the start of multiple scheduled jobs. However, be aware that enabling the property can cause scheduled jobs to take longer to start.
Fixed Issues
  • When sharing an object, you cannot search for user and group names that include the search string, only names that start with the search string.
  • If you rename a job, scheduled tasks for that job do not update the job name.
  • When you delete users or groups and then create users or groups with the same ID, permissions might behave unexpectedly.

June 2022

The following StreamSets DataOps Platform releases occurred in June 2022.

June 17, 2022

This release includes an enhancement and several fixed issues.

Enhancement
New stages centered in the canvas
When you use the stage library panel to add a new stage to the pipeline canvas, the stage appears in the center of the current view of the canvas. Previously, it appeared to the right of the right-most stage in the pipeline.
Fixed Issues
  • When a deployment contains multiple engine instances and you try to share the deployment with another user or group for a second time, Control Hub fails to update the permissions due to a 500 internal server error.
  • A single faulty subscription prevents other suitable subscriptions from triggering.
  • You cannot access a draft run of a pipeline when you view the list of running pipelines on an engine from the Engines view.

June 3, 2022

This release includes an enhancement and several fixed issues.

Enhancement
View and edit connection details from stage properties
When configuring a pipeline or fragment stage that uses a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
Fixed Issues
  • The Available Pipeline Runners histogram does not display when you monitor a job for a Data Collector multithreaded pipeline.
  • The pipeline canvas cannot handle more than 100 instances of the same stage.
  • Provisioning Agents are incorrectly included in the system limit for engines.

May 2022

The following StreamSets DataOps Platform releases occurred in May 2022.

May 20, 2022

This release includes several enhancements and fixed issues.

Enhancements
Compare pipeline and fragment versions
When using the new pipeline canvas UI to compare two versions of a pipeline or pipeline fragment, you can click the Open in Canvas icon to open one of the versions in the pipeline canvas.
Engine type icon in Fragments, Pipelines, and Sample Pipelines views
The Fragments view, Pipelines view, and Sample Pipelines view include an engine type icon next to each pipeline or fragment name so that you can quickly determine the pipeline or fragment type.
Engine type icons represent Data Collector, Transformer, and Transformer for Snowflake.
Fixed Issues
  • Credential fields in a pipeline state notification webhook do not properly show the value of entered passwords when authentication is used.
  • On Windows 10, some file archive programs cannot extract an exported Control Hub ZIP file because of the colon (:) character in the JSON file names.

May 6, 2022

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Draft runs
Pipeline test runs have been renamed to draft runs to more clearly indicate that a draft run is the execution of a draft pipeline.
Control Hub includes a new Run > Draft Runs view that you use to manage all draft runs that you have access to. You can view active and inactive draft runs, view the draft run history for a pipeline, and stop or delete draft runs as needed.
Previously, you could only manage active draft runs from the pipeline canvas.
Stage selector in the pipeline canvas
The new pipeline canvas UI includes an enhanced stage selector with a horizontal layout, making it easier to filter the stages by type.
In addition, when you add a stage with an open link, the pipeline canvas displays a dotted link connected to the Add Stage icon, clearly indicating that you need to add another stage.
When you click the Add Stage icon, the stage selector opens, allowing you to search for and select the next stage to add.
Fixed Issues
  • When authorizing Control Hub API calls, Control Hub does not consider the roles assigned by a user’s group.
  • When you change the owner of a scheduled task, the previous owner is incorrectly listed as the user that executes the task.

April 2022

The following StreamSets DataOps Platform releases occurred in April 2022.

April 27, 2022

This release includes a fixed issue.

Fixed Issue
  • You cannot create new users using the Control Hub REST API.

April 22, 2022

This release includes several new features and enhancements.

New Features and Enhancements
Init scripts for cloud service provider deployments
You can define an initialization script for cloud service provider deployments, such as Amazon EC2, Azure VM, and GCE deployments. Control Hub runs the init script while provisioning a new instance in your cloud account.
Use the script to set up provisioned instances with additional software as required by your organization. For example, you might use an init script to install required certificates or packages on each provisioned instance.
Pipeline canvas toolbar
The pipeline canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes. By default, the pipeline canvas continues to display the classic toolbar.
You can switch between the new and classic UI while viewing a pipeline or fragment in the canvas.
Job status
Jobs that have not run are listed with an Inactive status. Previously, jobs that had not run were listed without a status.
Editable group ID
When you create a group, you define the group display name and can optionally edit the group ID. By default, Control Hub generates the group ID from the display name, using all lowercase characters and replacing any spaces with underscores. Users type the group ID when using credential functions in pipelines. As a result, you might want to edit the default group ID to make it easier to use with credential functions.
Previously when you created a group, you defined only the group display name. Control Hub used a randomly generated ID for the group ID which was not editable. When using credential functions, you had to type the randomly generated ID. Groups created before this release retain their existing group IDs.
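The default ID derivation described above is a simple transformation. A minimal sketch (the function name is illustrative):

```python
# Sketch of the documented default group ID generation: all lowercase
# characters, with spaces replaced by underscores.
def default_group_id(display_name: str) -> str:
    return display_name.lower().replace(" ", "_")

assert default_group_id("Data Engineering Team") == "data_engineering_team"
```

Whichever ID you keep, default or edited, is the value users must type when referencing the group in credential functions.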

April 13, 2022

This release includes a fixed issue.

Fixed Issue
  • You cannot directly log in to the SAML configuration page as an Organization Administrator if you belong to another organization with SAML authentication disabled.

April 8, 2022

This release includes several enhancements and fixed issues.

Enhancements
Simplified pipeline and fragment wizards

The pipeline wizard and the fragment wizard have been simplified and no longer include the redundant Review & Open step.

Upgrading a pipeline to use a later authoring engine version
When you select a later authoring engine version for a pipeline, Control Hub now informs you that you are upgrading the pipeline so that it can no longer run on the earlier engine version. If you select a later engine version for a draft pipeline, you are given the choice to publish the draft pipeline first, and then create a new draft pipeline that is upgraded to run on the later engine version. That way, you retain a pipeline version that can run on the earlier engine version.
Previously, when you selected a later engine version for a pipeline, Control Hub automatically upgraded the pipeline without the upgrade warning.
Fixed Issues
  • When you disable a deployment and then access a pipeline using an authoring engine from that deployment, you are logged out.
  • Export All Published Pipelines exports only some published pipelines.
  • Pressing Ctrl+Z doesn't undo text changes after you edit text in the Control Hub UI.
  • Exporting jobs can generate a corrupt ZIP file.

March 2022

The following StreamSets DataOps Platform release occurred in March 2022.

March 18, 2022

This release includes a new feature and several fixed issues.

New Feature
Transformer for Snowflake available in public preview
New and existing organizations can create and run Transformer for Snowflake pipelines.
Transformer for Snowflake is a component of StreamSets DataOps Platform that enables processing Snowflake data using Snowflake's Snowpark client libraries.
Transformer for Snowflake provides a user interface that allows designing and performing complex processing in Snowflake without having to write SQL queries or templates. It also provides easy access to Snowpark DataFrame-based processing so you can use Snowpark without having to set it up in your Snowflake account.
Important: While in public preview, do not use Transformer for Snowflake for production workloads. To provide suggestions and feedback, contact productfeedback@streamsets.com. For help with Transformer for Snowflake, join the StreamSets Community and use the Transformer for Snowflake tag when you post your question.
For more information about Transformer for Snowflake, see the Transformer for Snowflake documentation.
Fixed Issues
  • When you stop an Azure VM deployment, the Azure key vaults created to store secrets for the provisioned VM instances are not deleted from your Azure account.

    If you added Azure tags to Azure VM deployments that were stopped before this fix, you can use the tags to find the key vaults and then delete them from your Azure account.

  • Subscriptions do not trigger when permission enforcement is disabled for the organization.

February 2022

The following StreamSets DataOps Platform releases occurred in February 2022.

February 25, 2022

This release includes several enhancements and fixed issues.

Enhancements
Install stage libraries while creating connections
When using an authoring Data Collector 4.4.x or later, you can select from all possible connection types while creating a connection, even if the corresponding stage library is not installed on the authoring Data Collector. If you select an uninstalled connection type, you can choose to update the deployment with the missing stage library that includes the connection type.
SDC_ID subscription parameter displays as Engine ID in the user interface

When you create a subscription and define a simple condition for an execution engine not responding event, the SDC_ID parameter now displays as Engine ID in the UI. The Engine ID label indicates that the parameter can include the ID of any engine type.

Continue to use SDC_ID as the parameter name when you use the parameter to represent StreamSets engines.

Scheduled tasks
While creating a scheduled task, you can search for the job to schedule.
Fixed Issues
  • Exporting multiple pipelines can generate a corrupt ZIP file.
  • You cannot download a snapshot captured for a pipeline test run or job run.
  • When you edit a subscription condition in the UI, the changes are not saved.
  • The job History tab inconsistently displays Data Collector URLs.

February 23, 2022

This release includes an enhancement and a fixed issue.

Enhancement
Supported SAML identity providers
StreamSets supports PingFederate as a SAML identity provider.
Fixed Issue
  • Control Hub allows you to test a SAML draft configuration when SP-initiated logins are disabled, even though testing SAML from Control Hub is only supported when SP-initiated logins are enabled. Testing when SP-initiated logins are disabled results in a 404 Not Found error.

February 11, 2022

This release includes several enhancements and fixed issues.

Enhancements
Create fragments using pipeline stages
When you create a fragment from selected stages in a pipeline, you can choose to immediately publish the fragment as long as the fragment meets the validation requirements to be published. Or, you can choose to create a draft fragment and then continue designing the pipeline.
If you immediately publish the fragment, you can then replace the individual stages in the pipeline with the newly published fragment.
GCE deployments support selecting a subnet
GCP environments support VPC networks that have an auto or custom subnet creation mode. As a result, while creating a GCE deployment, you must select the subnet within the VPC network to provision the Compute Engine instances in.
Previously, GCP environments only supported auto mode VPC networks, where the VPC automatically creates one subnet in each region. You did not have to select a subnet while creating a GCE deployment because the VPC automatically selected a suitable subnet.
Contextual help sidebar
After selecting a view in the Navigation panel, click the Help icon above the listed objects to open the contextual help sidebar. To close the contextual help, click outside of the sidebar. The contextual help sidebar has been implemented for most views. It will be implemented for additional views in future releases.
Use the contextual help to access common getting started, usage, troubleshooting, and best practice help topics and StreamSets Learning Academy videos. You can also search for topics in the StreamSets documentation.
The help links that display on each tab depend on the selected view. For example, the Pipelines view provides getting started help topics for pipelines.

Contextual help in environment, deployment, and pipeline wizards
When you create or edit an environment, deployment, or pipeline, the wizard displays contextual help on the right. The help contents depend on the step being completed in the wizard. Use the scroll bar to view the complete help contents for each step.
For example, the following image displays the contextual help provided in the first step of the pipeline wizard:

Fixed Issues
  • Restarting multiple engines at the same time causes an error.
  • When you select a later version of an authoring Data Collector while a pipeline is in read-only mode, stage libraries are not updated in the pipeline.
  • The Global Failover Retries property does not display when creating or editing a Transformer job.
  • GCP environments support only VPC networks that have automatic subnet creation enabled.
  • GCP environments do not support using a shared VPC network from a host GCP project.

February 4, 2022

This release includes an enhancement and fixed issue.

Enhancement
Encrypting SAML assertions
When you disable SP-initiated logins for SAML authentication, you can optionally configure the IdP to encrypt the SAML assertion. If you choose not to configure the IdP to encrypt SAML assertions, then you must disable the Require Encryption on Assertion property in the draft SAML configuration for your organization.
Previously, when you disabled SP-initiated logins, you had to configure the IdP to encrypt the SAML assertion.
Fixed Issue
  • If you belong to multiple organizations that have SAML authentication enabled and you sign in to StreamSets using SAML SSO, you cannot choose which organization to log into.

January 2022

The following StreamSets DataOps Platform releases occurred in January 2022.

January 31, 2022

This release includes several enhancements and fixed issues.

Enhancements
Create connections in the pipeline canvas
While building a pipeline, you can create a new connection for a stage without leaving the pipeline canvas.
Connection details
The Connection details page has been enhanced to make the available actions more visible and to more clearly display details about all pipeline and fragment versions using the connection.
Import existing pipelines or fragments as new
When you import a pipeline or fragment that already exists in the target organization, you can choose to import the object as a new pipeline or fragment.
Contextual help in the pipeline canvas
While building a pipeline, click the Help tab in the pipeline properties panel to access common getting started, pipeline usage, troubleshooting, and best practice help topics and StreamSets Learning Academy videos. You can also search for topics in the StreamSets documentation.
The help links that display on each tab depend on the pipeline type and the stages added to the pipeline. For example, the following image displays the getting started help topics for a Data Collector pipeline:

Fixed Issues
  • When viewing Transformer engines in the Engines view, clicking the More icon and then clicking Transformer Components displays the Getting Started page.
  • Pipeline export fails when a pipeline contains multiple stages of the same type that use different library versions.
  • You cannot download an engine support bundle.

January 21, 2022

This release includes a behavior change.

Behavior Change
Transformer version for new deployments
Starting with Transformer version 4.2.0 released on January 21, 2022, you can create a new deployment only for Transformer 4.2.0 or later. Earlier Transformer versions are not supported in new deployments.
Existing deployments can continue to use the earlier Transformer versions.

January 14, 2022

This release includes several enhancements, behavior changes, and fixed issues.

Enhancements
Duplicate connections
You can duplicate a connection to create a copy of an existing connection. You can then change the configuration of the copy.
Share pipeline or fragment while publishing
While using the Check In wizard to publish a pipeline or publish a fragment, you can share the pipeline or fragment with other users and groups.
Behavior Changes
Data Collector version for new deployments
Starting with Data Collector version 4.3.0 released on January 13, 2022, you can create a new deployment only for Data Collector 4.3.0 or later. Earlier Data Collector versions are not supported in new deployments.
Existing deployments can continue to use the earlier Data Collector versions.
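The version gate described above can be sketched as a comparison on the numeric portion of the engine version. This is an illustration only, not Control Hub's implementation; the handling of suffixed nightly versions such as -SNAPSHOT builds is an assumption:

```python
def deployment_allowed(engine_version, minimum="4.3.0"):
    """Return True when a new deployment may use this engine version.

    Nightly builds may carry a suffix such as -SNAPSHOT; only the
    leading numeric portion is compared (an assumption for illustration).
    """
    def parse(version):
        return tuple(int(part) for part in version.split("-")[0].split("."))
    return parse(engine_version) >= parse(minimum)

print(deployment_allowed("4.3.0"))  # True: supported in new deployments
print(deployment_allowed("4.0.2"))  # False: earlier versions are not
```

Existing deployments are unaffected by this check; it applies only when a new deployment is created.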
Fixed Issues
  • A running job becomes corrupted if you import another version of the job that uses a newer pipeline version.
  • A subscription for a job status change event with a JOB_OWNER condition set to a specific email address fails to trigger because the JOB_OWNER parameter is always null.

2021

The following StreamSets DataOps Platform releases occurred in 2021.

December 2021

The following StreamSets DataOps Platform releases occurred in December 2021.

December 20, 2021

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Job templates
Job templates include the following enhancements:
  • Job templates display on the Job Templates view, separate from the list of job instances that display on the Job Instances view.
  • When you create job instances from a job template, the instances are attached to the parent job template by default. Attached job instances display in the parent job template’s run history and are updated when the parent job template is updated.

    You can optionally detach job instances from the parent job template when you want to use the job details and default parameter values defined in the template, but don't want subsequent changes to the template to be applied to the job instance.

  • When you create a job template for a pipeline that uses runtime parameters, you define whether each parameter functions as a dynamic parameter that can be overridden in child job instances or as a static parameter that cannot be overridden.
  • You can archive a job template when you do not want new job instances to be created from the template, but want existing job instances to continue to run.
  • Job instances created from a job template inherit all tags added to the template.
Stage library selection for deployments
When you define the engine configuration for a deployment, you can select individual stages to install or uninstall, in addition to selecting stage libraries.
When you select individual stages, you can:
  • Filter the list of available stages by type or search for stages by name.
  • Select the stage library version to install, not just the latest version.
  • View the complete list of stages included in each stage library.
In addition, the stage library selection window includes a summary view of the currently selected stage libraries by file name. For example, the Basic stage library is listed as streamsets-datacollector-basic-lib.
Engine installation script for self-managed deployments
When using a self-managed deployment, you can configure the engine installation script to run the engine as a foreground or background process.
Previously, a tarball installation script always ran the engine as a foreground process. A Docker image installation script always ran the engine as a background process.
Authoring engine selection for pipelines, fragments, and connections
When you create pipelines, fragments, or connections, the authoring engine selection window displays the current CPU and memory usage of each engine.
The resource usage can help you troubleshoot issues with inaccessible engines. For example, when an engine uses an excessive amount of available resources, the machine running the engine might lose connection to Control Hub or the browser.
Sticky notes for pipeline canvas
To delete a sticky note from the pipeline canvas, click the Delete icon instead of the X in the note header.
Fixed Issues
  • When legacy Kubernetes is enabled for your organization and you have created legacy deployments only, you receive the following message when trying to create a pipeline:

    Attention: You need to set up a deployment with an engine before you can create a pipeline

  • If you have an existing Data Collector deployment with an installed enterprise stage library, and you edit the deployment to upgrade to a new version of the same enterprise stage library, you cannot remove the existing version of the stage library. As a result, Data Collector fails to start with the following error:

    REST_1001 - Unable to find following stage libraries in repository list: streamsets-datacollector-<enterprise library type>-lib:<new version>, streamsets-datacollector-<enterprise library type>-lib:<existing version>

  • If existing Control Hub users have an Invited or Expired status and then you enable and disable SAML authentication, disabling SAML fails with the following error:

    javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: could not execute statement

November 2021

The following StreamSets DataOps Platform releases occurred in November 2021.

November 17, 2021

This release includes a new feature.

New Feature
SAML authentication
StreamSets DataOps Platform supports single sign-on (SSO) authentication with SAML 2.0 with selected identity providers (IdPs).
Enabling SAML authentication requires registering StreamSets as a service provider in one of the supported IdPs and configuring SAML authentication for your Control Hub organization. Once enabled, all users in the organization must log in with SAML authentication.
At this time, StreamSets supports the following SAML identity providers:
  • Microsoft Active Directory Federation Services (AD FS)
  • Okta

November 5, 2021

This release includes a new feature and several fixed issues.

New Feature
Legacy Kubernetes integration
Control Hub provides a legacy Kubernetes integration that you can use to automatically provision Data Collectors on Kubernetes. Provisioning includes deploying, registering, starting, scaling, and stopping Data Collector Docker containers in a Kubernetes cluster. Legacy Kubernetes integration requires that the Provisioning Agent use the Control Agent Docker image version 4.0.0 or later.
Legacy Kubernetes integration is enabled only for some paid accounts.
Fixed Issues
  • When viewing real-time statistics and metrics for an active Transformer job, the names of stages display incorrectly.
  • Jobs can transition to an inactive_error status with the following error because multiple Job Runner applications attempt to process the same job:

    JOBRUNNER_69 At least one of the execution engines <engine ID> didn’t respond to the stop command

October 2021

The following StreamSets DataOps Platform releases occurred in October 2021.

October 22, 2021

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Microsoft Azure integration
Control Hub provides an integration with your Microsoft Azure account. When you use Azure environments and Azure VM deployments, Control Hub automatically provisions Azure VM instances needed to run StreamSets engines in your Azure account, and then deploys engine instances to those VM instances.
JVM memory strategy for deployed engines
By default, engines are now configured to use 50 percent of the available memory on the host machine as the Java heap size. Previously, engines were configured to use an absolute value, 1024 MB by default.
You can edit the Java heap size percentage or set an absolute value by modifying the engine advanced configuration properties for a deployment. However, in most cases, the default percentage value is sufficient.
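The two strategies correspond to standard JVM sizing flags: percentage-based sizing maps onto -XX:MaxRAMPercentage, and absolute sizing maps onto -Xmx/-Xms. The sketch below is illustrative only and does not show the actual advanced configuration property names, which are not covered in this note:

```python
def java_heap_opts(heap_percent=None, heap_mb=None):
    """Build JVM heap options for an engine.

    Exactly one of heap_percent (relative strategy, the new default)
    or heap_mb (absolute strategy, the previous default) is given.
    """
    if (heap_percent is None) == (heap_mb is None):
        raise ValueError("specify exactly one of heap_percent or heap_mb")
    if heap_percent is not None:
        # Relative strategy: size the heap as a share of host memory.
        return [f"-XX:MaxRAMPercentage={heap_percent:.1f}",
                f"-XX:InitialRAMPercentage={heap_percent:.1f}"]
    # Absolute strategy: fixed heap size in MB.
    return [f"-Xmx{heap_mb}m", f"-Xms{heap_mb}m"]

print(java_heap_opts(heap_percent=50))  # the new 50 percent default
print(java_heap_opts(heap_mb=1024))     # the previous 1024 MB default
```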
Quick Start
When you create self-managed deployments from the Quick Start menu, Control Hub assigns the deployments a quick-start tag and names the deployments as follows, based on the selected engine type:
  • Docker Data Collector <number> (Quick Start)
  • Docker Transformer <number> (Quick Start)
For example, if you create two deployments of Data Collector from the Quick Start menu, Control Hub names the deployments Docker Data Collector 1 (Quick Start) and Docker Data Collector 2 (Quick Start).
Previously, all deployments created from the Quick Start menu were named local.
Job error acknowledgement
By default, when a job encounters an inactive error status, users must acknowledge the error message before the job can be restarted. You can configure a job to skip job error acknowledgement. You might want to skip acknowledgement for scheduled jobs so that the job can be restarted automatically without user intervention. However, be aware that skipping job error acknowledgement might hide errors that the job has encountered.
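The restart rule above can be sketched as a small predicate. This is an illustration of the described behavior, not Control Hub's implementation; the status name is borrowed from the inactive_error status mentioned elsewhere in these notes:

```python
def can_restart(job_status, error_acknowledged, skip_error_ack=False):
    """Return True when a job may be restarted under the rule above."""
    if job_status != "INACTIVE_ERROR":
        return True
    # Skipping acknowledgement lets schedulers restart the job unattended,
    # at the cost of possibly hiding the error that stopped it.
    return skip_error_ack or error_acknowledged

print(can_restart("INACTIVE_ERROR", error_acknowledged=False))  # False
print(can_restart("INACTIVE_ERROR", error_acknowledged=False,
                  skip_error_ack=True))                         # True
```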
Upgrade to a paid subscription
If you have a Free account, you can upgrade to a paid subscription from the My Account menu.
Fixed Issues
  • The pipeline canvas incorrectly allows adding a fragment to another fragment although nested fragments are not supported.
  • After reloading a pipeline, parameters called from properties that display as checkboxes are not displayed.
  • Control Hub does not allow the at sign (@) in the property name for any engine advanced configuration properties.

September 2021

The following StreamSets DataOps Platform releases occurred in September 2021.

September 28, 2021

This release includes a fixed issue.

Fixed Issue
  • Pipeline fragment creation fails.

September 15, 2021

This release includes several enhancements.

Enhancements
In-application chat
You can use the in-application chat to start a conversation with the StreamSets team. You can enable or disable the chat in your account settings accessible from the My Account window.
Sticky notes for pipeline canvas
You can add sticky notes to the pipeline canvas to include notes that might be useful to you or your team as you build a pipeline or pipeline fragment. For example, you might add a note as a reminder to revisit a portion of the pipeline design or to change a stage after running initial tests of the pipeline.
Updated usage limits
The usage limits for organizations with a Free account have been updated as follows:
  • 2 users - For an existing Free account with more than 2 users, all users can continue to log in as a member of the organization. However, you cannot invite additional users to the organization.
  • 10 published pipelines - For an existing Free account with more than 10 published pipelines, you can continue to use all published pipelines.
The usage limit of 2 active jobs remains unchanged.

August 2021

The following StreamSets DataOps Platform releases occurred in August 2021.

August 11, 2021

This release includes several enhancements.

Enhancements
My Account
All properties previously available in the User Settings window have been moved to the My Account window. As a result, you can manage all your account settings from a single window. You can also modify your display name in the My Account window.
Confirmation dialog boxes
The OK button in several confirmation dialog boxes has been renamed to reflect the action that you are taking. For example, when deleting a connection, the OK button has been renamed to Delete.
Roles and permissions required to delete an engine
A user must have the following roles and permissions to delete an engine:
  • Roles: Engine Administrator, Deployment Manager
  • Permissions: Write on engine
Previously, deleting an engine required the following roles and permissions:
  • Roles: Engine Administrator, Organization Administrator
  • Permissions: Execute on engine
This change does not affect role and permission assignments for existing users. A user can still delete an engine with the previous role and permission requirements.
For the complete list of required roles and permissions to complete common engine tasks, see Engine Tasks.

July 2021

The following StreamSets DataOps Platform releases occurred in July 2021.

July 30, 2021

This release includes several new features, enhancements, behavior changes, and fixed issues.

New Features and Enhancements
Deployment and engine permissions
With this release, engines always inherit the same permissions assigned to the deployment. Those inherited permissions cannot be modified at the engine level. For example, if you grant a user Execute permission on a deployment, that user also has Execute permission on all engines managed by that deployment.
Previously, you could modify permissions at the engine level. As a result, users could have different permissions on engines managed by the same deployment.
If you previously modified engine permissions, those changes are now overridden with the deployment permissions. Check that users' access levels to engines have not unexpectedly increased or decreased, and modify the deployment permissions as needed.
Engines
  • Shut down engines - You can shut down engines belonging to a self-managed deployment from the Engines view. You can also restart or shut down multiple engines at the same time.
  • Resource thresholds - The default values of the Max CPU Load and Max Memory thresholds for an engine are now 80%. Previously, the default values were 100%.
Connections
When creating or editing a connection, you can filter the list of available authoring Data Collectors by deployment, version, or label. The Data Collector with the most recent reported time is selected by default.
Pipelines
  • Pipeline and fragment creation - While creating a new pipeline or pipeline fragment, you can choose to open the pipeline or fragment in the canvas before completing the Share Pipeline or Share Pipeline Fragment step.
  • Sample pipelines - When you open a sample pipeline in the pipeline canvas, the Duplicate button has been renamed to Create a pipeline from sample.
Free and paid accounts
DataOps Platform now offers paid accounts in addition to free accounts.
With a free account, your use is subject to a number of limits. For legal-related limits - including those related to warranty, support, and SLA - see Section 17 of the Terms of Use. For usage-related limits, such as organization limits, see Organization Default System Limits. A free account is not limited to internal evaluation purposes, but may be terminated by either party at any time.
With a paid account, you can increase the organization limits. For more information about paid accounts, contact your StreamSets account team or send an email to subscriptions@streamsets.com.
Users
  • Session timeout - After 30 minutes of inactivity, user sessions expire and users are logged out. An organization administrator can change the inactivity period for the organization.
  • Deleting users - When you attempt to delete an active user, Control Hub prompts you to deactivate the user during the deletion process.
Behavior Changes
Data Collector version for new deployments
With this release, you can create a new deployment only for Data Collector version 4.0.2 or later. Earlier Data Collector versions are not supported in new deployments.
Existing deployments can continue to use the earlier Data Collector versions.
Fixed Issues
  • If you stop multiple deployments at the same time, Control Hub might fail to completely stop the deployments with the following error:
    An AppException occurred deactivating deployment tokens after stack was removed:
    Issues: [CSP_046 - Rest Api '<Control Hub URL>/security/rest/v1/organization/<organization ID>/components/delete' failed with status:'500']
  • No errors display in the UI when you restart Docker engine instances for a self-managed deployment with an external resource archive that uses an incorrect file format.

July 14, 2021 (beta)

This release includes several new features, enhancements, and fixed issues.

New Features and Enhancements
Quick Start
To get started with StreamSets, click Quick Start in the top toolbar to quickly deploy a Data Collector or Transformer engine using Docker. After the engine is launched and running, click Quick Start > Create a pipeline to quickly create a pipeline.
Environments
  • Environment state - The environment state and status have been combined into a single state to simplify the state concept. For example, an environment with an Enabled state and an OK status now has a single Active state.
  • Activate and deactivate actions - The Enable and Disable actions for an environment have been renamed to Activate and Deactivate to align the actions with the new environment states.
Deployments
  • Deployment state - The deployment state and status have been combined into a single state to simplify the state concept. For example, a deployment with a Disabled state and an OK status now has a single Deactivated state.
  • Start and stop actions - The Enable action for a deployment has been renamed to Start to more clearly indicate that after you start a deployment, the deployment is ready to launch engine instances. The Disable action has been renamed to Stop to more clearly indicate that after you stop a deployment, you can no longer launch engine instances for the deployment.
  • Deployment details - The Deployment details page has been enhanced to make the available actions more visible and to more clearly display details about the existing engines belonging to the deployment.
  • Launched engine status for self-managed deployments - When copying the installation script to launch an engine instance for a self-managed deployment, you can choose to check the engine status in the Control Hub UI. Previously, you could view the engine status only in the command prompt.
  • Engine installation type for self-managed deployments - After saving a self-managed deployment, you can change the engine installation type. For example, you can change a Tarball installation type to the Docker image installation type.
Fixed Issues
  • The list of EC2 instance types available when creating an Amazon EC2 deployment does not take into account the availability zone used by the parent AWS environment. If you select an unsupported instance type, AWS CloudFormation cannot provision the EC2 instances and the following error message displays:

    Your requested instance type (<type>) is not supported in your requested Availability Zone (<zone>)

    Important: To fix this issue, StreamSets has updated the sample IAM policy used to configure AWS credentials for an AWS environment. If you encountered this issue with an existing AWS environment and EC2 deployment, then update the IAM policy to include the ec2:DescribeInstanceTypeOfferings permission. If you have not encountered this issue, there is no need to update the IAM policy.
  • You can delete a pipeline included in a job when you do not have permission on the job, causing the job to become invalid.
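The IAM policy update described in the EC2 instance type fix above amounts to one additional statement granting the ec2:DescribeInstanceTypeOfferings action. A minimal form of that statement is shown below; the Sid is illustrative, and your organization's full sample policy contains additional permissions not shown here:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowInstanceTypeLookup",
      "Effect": "Allow",
      "Action": ["ec2:DescribeInstanceTypeOfferings"],
      "Resource": "*"
    }
  ]
}
```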

June 2021

The following StreamSets DataOps Platform releases occurred in June 2021.

June 23, 2021 (beta)

This release includes several fixed issues.

Fixed Issues
  • When the parent AWS environment uses an AWS region other than us-west-2, an Amazon EC2 deployment fails to launch a StreamSets engine with the following error:

    Parameter validation failed: parameter value ami-<AMI_ID> for parameter name AMI does not exist

  • Using the API Credentials view to generate and manage credentials for use with the Control Hub REST API is not supported at this time.

June 18, 2021 (beta)

This release includes several new features and enhancements.

New Features and Enhancements
Deployments
  • Engine version - The version number of a nightly engine build includes the build number in addition to the -SNAPSHOT suffix.

  • GCE deployments - When configuring the GCE autoscaling group for a deployment, you can enter the network tags to provision the VM instances with.

Jobs
  • Failover retries - When you enable a Data Collector or Transformer job for pipeline failover, you can configure the global number of pipeline failover retries to attempt across all available engines. When the limit is reached, Control Hub stops the job.

  • Balance jobs icon - The Engines view includes a new Balance Jobs icon.
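The global failover retry behavior can be sketched as a loop that cycles through available engines until the pipeline starts or the global cap is exhausted. This is an illustration of the described behavior, not the Job Runner's implementation:

```python
def run_with_failover(engines, try_start, global_max_retries):
    """Attempt a pipeline across all available engines, counting retries
    globally rather than per engine (a sketch of the rule above)."""
    if not engines:
        return "INACTIVE_ERROR"
    retries = 0
    while True:
        for engine in engines:
            if try_start(engine):
                return "ACTIVE"
            retries += 1
            if retries >= global_max_retries:
                # Once the global cap is reached, Control Hub stops the job;
                # a "maximum global failover retries exhausted" event fires,
                # which a subscription can act on.
                return "INACTIVE_ERROR"

print(run_with_failover(["e1", "e2"], lambda e: e == "e2", 10))  # ACTIVE
print(run_with_failover(["e1", "e2"], lambda e: False, 3))  # INACTIVE_ERROR
```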

Subscriptions

You can configure a subscription action for a maximum global failover retries exhausted event. For example, you might create a subscription that sends an alert to a Slack channel when a job has exceeded the maximum number of pipeline failover retries across all available engines.

June 11, 2021 (beta)

This release includes enhancements and fixed issues.

Enhancements
Environments
The Allow Nightly Engine Builds property for environments has been moved to an advanced option. In most cases, you don't need to modify this property.
Roles
The Auth Token Administrator and Engine Guest roles have been removed because they are not applicable for StreamSets DataOps Platform.
Fixed Issues
  • If you create a Microsoft account just before signing up as a new StreamSets user with that Microsoft account, signing in might fail.
  • In rare and randomly occurring scenarios, deleting a user from an organization might only partially succeed. Users in this state are not able to create a new login session, but may still show up in the Control Hub list of users for an organization.

  • A user with the Organization Administrator role requires the Metrics Reader role to view topology metrics. The Organization Administrator role should be sufficient.

June 1, 2021 (beta)

StreamSets is happy to announce the beta release of StreamSets DataOps Platform, a cloud-native platform that empowers data engineers to build data pipelines.

StreamSets DataOps Platform includes the following components that seamlessly work together to manage your pipelines - Control Hub, Data Collector, and Transformer. The Control Hub component provides a common UI for the following types of users:

  • Platform administrators use Control Hub to deploy and launch Data Collector and Transformer engines on-premises or in cloud environments. The engines are automatically tethered to Control Hub.

  • Data engineers use Control Hub to build, run, and monitor pipelines across the deployed engines.

The beta release is available for development and testing, but is not meant for production use.

Known Issues

Please note the following known issues:
  • When SCIM provisioning is enabled and Azure AD synchronizes users with expired Control Hub invitations, these users cannot log into Control Hub, even though the users have been correctly provisioned.

    Workaround: In Control Hub, disable SCIM provisioning, renew the invites for all users with expired invitations, and then enable SCIM provisioning again. Instruct the affected users to log in before the new invitations expire. Otherwise, you must repeat this workaround.

  • When SCIM provisioning is enabled and a group is deleted in Azure AD, Control Hub might randomly encounter a 500 internal server error when Azure AD synchronizes the group update with StreamSets. In most cases, Azure AD continues to retry the group update until it correctly deletes the group.
  • When configuring a GCE deployment, the Instance Service Account property displays a maximum of 20 service accounts. If your GCP project includes more than 20 service accounts, the service account created as a Control Hub environment prerequisite might not display in the list.

    Workaround: Use the StreamSets DataOps Platform SDK for Python to set the service account for the GCE deployment.

  • When configuring a GCE deployment, you are required to select a service account even if the GCP environment has a default service account defined. Similarly, when configuring an Azure VM deployment, you are required to select a managed identity and resource group, even if the Azure environment has defaults defined for those objects.

    Workaround: Reselect the values that were defined in the parent environment as defaults, or use the StreamSets DataOps Platform SDK for Python to create the deployments.