Release Notes
May 2023
The following StreamSets DataOps Platform releases occurred in May 2023.
May 12, 2023
This release includes several enhancements and a fixed issue.
Enhancements
- Fragment icons
- By default, when you add a published fragment to a pipeline, the fragment displays as a single stage with a puzzle piece icon. You can modify the icon to represent the fragment processing logic. For example, if a fragment merges two streams of data, you might configure the fragment to use the predefined Merge icon.
- Pipeline canvas toolbar
- The new pipeline canvas UI displays the following icons in different colors, making it easier to locate these icons in the toolbar:
- The Stop and Force Stop icons display in red.
- The Check In icon displays in red when the pipeline or fragment has not passed implicit validation and cannot be checked in. The Check In icon displays in green when the pipeline or fragment has passed implicit validation and is ready to be checked in.
- Fragment export
- You can export and import draft fragments.
- GCE deployments
- You can optionally configure a GCE deployment to provision Google Cloud VM instances without external IP addresses.
Fixed Issue
- If the last page of results for a job search previously returned items, but no longer does, no items are shown on the page. Users must navigate to the previous page to see items.
April 2023
The following StreamSets DataOps Platform releases occurred in April 2023.
April 23, 2023
This release includes a fixed issue.
Fixed Issue
- When a deployment uses an external resource archive and the externalResources folder contains an empty user-libs folder, engines belonging to the deployment fail to start with the following error message:
Exception in thread "main" java.lang.IllegalArgumentException: Stage libraries directory '/<installation directory>/externalResources/user-libs' does not exist
April 21, 2023
This release includes several enhancements and fixed issues.
Enhancements
- First login to StreamSets
- When you log in to StreamSets DataOps Platform for the first time and you do not have access to an engine deployed by another user, a blank Data Collector pipeline opens in the canvas. You can immediately begin building the pipeline using all available stages.
- Pipeline canvas
- When you use the new pipeline canvas UI and select a link connecting two stages, Control Hub displays a pop-up menu that includes the following icons:
- Insert Stage - Inserts another stage between the connected stages.
- Delete - Deletes the selected link.
- Job templates
- On the Job Templates view, the Start Jobs menu option has been renamed to Create Instances to more clearly indicate that you are creating and starting job instances from a job template.
Fixed Issues
- Users and engines randomly encounter 401 authorization errors from Control Hub.
- Editing an existing job template fails with the following error:
'len' parameter must be a valid integer equal to -1 or greater than zero. len = '0' : RESTAPI_05
- Searching for a pipeline by ID fails with the following error:
"RSQL_00:Bean 'RPipeline' property 'id' does not have a mirrored property in 'PPipeline'"
- The pipeline canvas incorrectly enables the Draft Run menu even though the selected authoring engine is not accessible.
- When you modify or delete a parameter from a fragment being used in a pipeline, publish the fragment, and then update the pipeline to use the latest fragment version, the parameter changes are not reflected in the pipeline.
- If you use advanced mode to edit the dnsPolicy attribute in a Control Hub Kubernetes YAML file, your modifications do not take effect. The deployment always uses the Default DNS policy.
- If you specify a service account name for a Control Hub Kubernetes deployment, start the deployment, and then edit the active deployment to delete the service account name, the change does not take effect and the previous service account remains associated with the Kubernetes deployment.
March 2023
The following StreamSets DataOps Platform release occurred in March 2023.
March 29, 2023
This release includes several enhancements and fixed issues.
Enhancements
- Search
- Search includes the following enhancements:
- Control Hub includes additional preset searches for fragments, pipelines, job templates, job instances, and draft runs that are starred by default.
- When searching for pipelines or fragments, you can use the Latest Version property to restrict searches to the latest version of each pipeline or fragment.
- Pipeline and fragment preview
- Previewing pipelines and fragments includes the following enhancements:
- When you preview a pipeline or fragment, Control Hub uses the default preview configuration and no longer displays the Preview Configuration dialog box. While running preview, you can click the Preview Configuration icon to change the configuration and then run the preview again.
- When you preview a Transformer or Transformer for Snowflake pipeline or fragment, the data displays in table view by default. When you preview a Data Collector pipeline or fragment, the data continues to display in list view by default.
- Table view displays only output data by default. You can optionally choose to display both input and output data.
- Table view no longer displays colors for different types of data and changed data. List view continues to display colors.
- While running preview, you can click the Expand icon to quickly expand the preview panel to view more preview data.
- Pipeline validation
- If pipeline validation fails due to a timeout error, you can increase the validation timeout value.
- Job templates
- When you create job instances from a job template, you can specify whether the job instances inherit the permissions assigned to the parent job template.
- Pipeline export
- You can export and import draft pipelines.
- Customize table columns
- You can resize, hide, and reorder the table columns displayed in all views.
- Azure VM deployments
- The tracking URL for an active Azure VM deployment opens the overview page of the Azure deployment created for your StreamSets deployment. The deployment overview page enables you to more quickly access information about the Azure resources provisioned for the StreamSets deployment.
Fixed Issues
- Basic searches that use the not includes operator or advanced searches that use the =out= operator incorrectly return results that do include the specified values.
- If you view the run history of a job and the pipeline version for one of the runs has been deleted, then the run history displays an error.
- When a pipeline fails over to another Data Collector, the input and output records that display in the Summary tab and in the run summary for a selected job run accessed from the History tab do not include the records for all Data Collectors.
- Control Hub allows you to set the CPU Threshold Percentage for a Control Hub Kubernetes deployment to a value between 0 and 100, but 0 is not a valid value in the Kubernetes cluster.
February 2023
The following StreamSets DataOps Platform releases occurred in February 2023.
February 10, 2023
This release includes several new features, enhancements, behavior changes, and a fixed issue.
New Features and Enhancements
- Kubernetes integration
- Control Hub provides an integration with Kubernetes. When you use Control Hub Kubernetes environments and deployments, you launch a StreamSets Kubernetes agent that runs in your Kubernetes cluster. The agent communicates with Control Hub to provision the Kubernetes resources needed to run StreamSets engines and to deploy engine instances to those resources.
- Search
- Search is no longer a Technology Preview functionality and is now enabled by default. Search replaces filtering and is implemented for the following object types:
- Fragments
- Pipelines
- Job templates
- Job instances
- Draft runs
- Pipeline canvas
- By default, Control Hub now displays the new pipeline canvas UI.
- Customize table columns
- You can resize, hide, and reorder the table columns displayed in the Job Templates, Job Instances, and Draft Runs views.
Behavior Changes
- Saved searches
- With this release, Control Hub stores your saved searches in the backend. Previously, Control Hub stored saved searches in your current browser. Saved searches did not apply if you logged in using another browser, and saved searches were removed if you cleared the browser cache.
- Job instances with no job runs
- With this release, Control Hub automatically deletes inactive job instances older than 365 days that have never been run.
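The cleanup rule for job instances can be pictured as a simple predicate. The following Python sketch is illustrative only: the `JobInstance` fields and the `is_purge_candidate` function are hypothetical names, and measuring age from the creation time is an assumption, since the release note does not state how Control Hub evaluates the 365-day limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Retention period taken from the release note.
RETENTION = timedelta(days=365)


@dataclass
class JobInstance:
    # Hypothetical fields used only for this illustration.
    name: str
    created_at: datetime
    status: str      # for example "INACTIVE" or "ACTIVE"
    run_count: int   # number of job runs recorded for the instance


def is_purge_candidate(job: JobInstance, now: datetime) -> bool:
    """Return True for an inactive, never-run job instance older than 365 days."""
    never_run = job.run_count == 0
    inactive = job.status == "INACTIVE"
    too_old = now - job.created_at > RETENTION  # assumption: age counts from creation
    return inactive and never_run and too_old


now = datetime(2023, 2, 10, tzinfo=timezone.utc)
old_unused = JobInstance("nightly-load", datetime(2021, 12, 1, tzinfo=timezone.utc), "INACTIVE", 0)
old_used = JobInstance("hourly-sync", datetime(2021, 12, 1, tzinfo=timezone.utc), "INACTIVE", 4)
print(is_purge_candidate(old_unused, now), is_purge_candidate(old_used, now))
```

A job instance that has run at least once is never a candidate in this sketch, regardless of its age.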
Fixed Issue
- When a pipeline fails over to another Data Collector, the input and output records that display in the job history do not include the records for the original Data Collector.
February 7, 2023
This release includes a fixed issue.
Fixed Issue
- When SCIM provisioning is enabled and Azure AD synchronizes users with expired Control Hub invitations, these users cannot log into Control Hub, even though the users have been correctly provisioned.
January 2023
The following StreamSets DataOps Platform release occurred in January 2023.
January 20, 2023
This release includes several enhancements and fixed issues.
Enhancements
- View and edit connection details from a published pipeline
- When viewing a published pipeline that includes a stage using a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
- Customize table columns
- You can resize, hide, and reorder the table columns displayed in the Connections view.
- Job run history for job instances started from a job template
- Control Hub now includes job instances started from a job template in the total count of job runs for the organization. As a result, Control Hub purges the run history for job instances started from a job template in the same way that it purges the run history for job and draft runs.
- Search
- The Technology Preview search functionality includes the following
enhancements:
- You can search for attached job instances created from job templates.
- You can search for job instances that include a pipeline with a specified pipeline status.
- You can search for job templates that have been archived.
- In basic search, you can use auto-completion for the following search properties:
- Label property for pipelines and fragments
- Tag and Engine Label properties for job instances and job templates
As you begin typing a value to search for, Control Hub displays a drop-down menu that lists values that match the entered characters.
Fixed Issues
- Preview does not work when the Run Preview Through Stage property is set to a fragment.
- The Engines view does not correctly sort the list of engines by the Memory Used column.
- When using the new pipeline canvas UI while monitoring a job, the title of the job incorrectly displays the pipeline name instead of the job name.
- If you create and publish a pipeline fragment using pipeline stages and you quickly enter a fragment prefix in the last step of the wizard, the newly created fragment is not added to the pipeline.
- When using the Technology Preview search functionality and you sort pipeline search results by the Last Modified By column, the pipelines are incorrectly sorted by name.
2022
The following StreamSets DataOps Platform releases occurred in 2022.
December 2022
The following StreamSets DataOps Platform release occurred in December 2022.
December 16, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Pipeline fragments
- When you create a pipeline fragment using pipeline stages and you choose to publish the new fragment, Control Hub displays the original pipeline in the canvas, automatically replacing the individual stages with the newly published fragment.
- Draft runs
- While monitoring an engine in the Engines view, you can stop all draft runs currently running on that engine.
- Search
- The Technology Preview search functionality includes the following
enhancements:
- You can perform both basic and advanced searches to find specific draft runs.
- You can search for job instances by the job status color.
- Subscriptions
- You can configure a webhook action for a subscription to use one of the
following additional authentication types to connect to the receiving system:
- API Key
- Bearer Token
- OAuth 2.0
Fixed Issues
- The Control Hub REST API incorrectly returns an HTTP 400 status code instead of a 404 status code when the specified environment or deployment does not exist.
- When using the Technology Preview search functionality to find job templates or job instances last modified by a user, the search results incorrectly display the job created by that user.
- When the Legacy Kubernetes integration is enabled, users cannot create a pipeline when they are assigned the Deployment Manager role but not the Provisioning Operator role or the Organization Administrator role.
November 2022
The following StreamSets DataOps Platform release occurred in November 2022.
November 18, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Deployments
- In the Deployments view, you can filter the list of displayed deployments by engine type.
- Pipeline design
- When configuring the conditions for a Stream Selector processor in a Data Collector pipeline, you can reorder the output conditions.
- Search
- The Technology Preview search
functionality includes the following enhancements:
- You can search by user email address to find all pipelines or fragments committed or last modified by the user or to find all job templates or job instances created or last modified by the user.
- When using basic mode to search for job templates or job instances, you can include or exclude the v prefix. For example, you can search for either v1 or 1 when defining the Pipeline Version property. Previously in basic mode, you had to include the prefix. For example, you had to search for v1.
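The optional version prefix described above amounts to a small normalization step. The following Python sketch is illustrative only: the function name `normalize_pipeline_version` is hypothetical, and it does not reflect how Control Hub implements basic search.

```python
def normalize_pipeline_version(value: str) -> str:
    """Return the version with a leading 'v', whether or not the caller typed one."""
    text = value.strip()
    if text.startswith("v"):
        text = text[1:]  # drop the prefix so both input forms converge
    return "v" + text


# Both spellings resolve to the same search value.
print(normalize_pipeline_version("1"))
print(normalize_pipeline_version("v1"))
```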
Fixed Issues
- When you expand or collapse a fragment in the job monitoring UI, Control Hub does not update the metric diagrams to display statistics for individual stages in the expanded fragment or for the single fragment stage.
- When the Enable WebSocket Tunneling for UI Communication property is enabled for an organization, all users should be able to configure the Browser to Engine Communication type; however, only organization administrators can do so.
- When using the Technology Preview search functionality, searching for job instances with an INACTIVE job status excludes job instances that have never run.
- When using the Technology Preview search functionality and you sort the search results by job instance status, job instances that have never run are shown first when sorting by ascending status, and last when sorting by descending status.
- When you add a fragment that uses parameters to a pipeline and define no parameter name prefix, Control Hub silently fails to add the fragment to the pipeline.
- When you configure an Azure VM deployment, Control Hub does not display existing SSH key pairs in the Key Pair Name property because the documentation to create an Azure AD custom role for the parent environment does not include the following required permission:
Microsoft.Compute/sshPublicKeys/read
Important: StreamSets strongly recommends updating the custom role for all existing Azure environments, as described in the Azure environment prerequisites.
October 2022
The following StreamSets DataOps Platform releases occurred in October 2022.
October 28, 2022
This release includes a new feature.
New Feature
- Provisioning of user accounts from Azure AD
- After enabling SAML authentication using Azure AD, you can configure the provisioning of user accounts from Azure AD to StreamSets. When a user or group is created, updated, or deleted in Azure AD, the same changes are automatically made in StreamSets.
October 26, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Supported browsers
- StreamSets Control Hub now supports the latest version of Microsoft Edge as a browser.
- Search
- The Technology Preview search functionality includes the following
enhancements:
- For job instance and job template search, the Pipeline Commit Label property has been renamed to Pipeline Version to more clearly indicate that the property specifies the version of the pipeline included in a job instance or job template.
- Basic search for job templates no longer displays the Status property because job templates do not have a status.
- Proxy server configuration for engines
- When you set up a deployment that configures a Data Collector or Transformer engine to use a proxy server for outbound network requests, you can now include special characters in the proxy user and password values, except for an exclamation point (!), a backward slash (\), a leading number sign (#), or a leading or trailing space.
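If you generate proxy credentials programmatically, you might check them against these restrictions before configuring the deployment. The following Python sketch is one reading of the rule and is not a StreamSets utility: the function name `is_supported_proxy_value` is hypothetical, and it assumes that the exclamation point and backward slash are disallowed anywhere in the value.

```python
def is_supported_proxy_value(value: str) -> bool:
    """Check a proxy user or password value against the documented restrictions."""
    if "!" in value or "\\" in value:
        return False  # exclamation point or backward slash (assumed: anywhere in the value)
    if value.startswith("#"):
        return False  # leading number sign
    if value != value.strip(" "):
        return False  # leading or trailing space
    return True


for candidate in ["p@ss$word", "pa#ss", "#pass", "pass!", " pass"]:
    print(repr(candidate), is_supported_proxy_value(candidate))
```

A number sign is rejected only at the start of the value, so a value such as pa#ss passes the check.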
Fixed Issues
- The Test Connection button in the connection wizard incorrectly requires a user to have the Deployment Manager role.
- Creating a pipeline incorrectly requires a user to have the Deployment Manager role.
- The Fields to Convert property in the Field Type Converter processor does not allow an expression that includes a comma.
- If you configure an engine to use a proxy server and you include a space, single quotation mark, or double quotation mark in the defined proxy properties, then either the engine installation script fails or the proxy authentication fails.
- Topology data SLAs do not trigger alerts because they do not correctly retrieve data.
October 12, 2022
This release includes a fixed issue.
Fixed Issue
- When you delete multiple users at the same time, a user synchronization error message displays even though the users are successfully deleted.
October 6, 2022
This release includes a fixed issue.
Fixed Issue
- When SAML authentication is enabled, only users with the Organization Administrator role can log in.
October 5, 2022
This release includes new features, enhancements, a behavior change, and fixed issues.
New Features and Enhancements
- Search
- You can perform both basic and advanced searches to find specific pipelines, fragments, job instances, or job templates. With basic search, you define search conditions by selecting the object properties, operators, and values that you want to search for.
- Copy field value in preview
- You can quickly copy a field value from pipeline preview using the Copy Field Value to Clipboard icon.
- Pipeline version history
- When using the new pipeline canvas UI, the pipeline or fragment version history that opens from the canvas displays all information on the initial panel. You do not need to expand the version history to manage the versions, including viewing commit messages, comparing versions, creating tags for versions, and deleting versions.
- Import pipelines and fragments without connections
- When you import pipelines or fragments that use connections and those connections do not exist in the target organization, you can choose to import the objects without connections. After the import, you must edit the pipelines or fragments to define the connections.
- Subscription parameters
- You can use the ERROR_MESSAGE parameter for a subscription triggered by a maximum global failover retries exhausted event.
- Engine resource thresholds
- The default value of the Max Memory threshold for an engine is now 100%. Previously, the default value was 80%. Existing engines retain the currently configured threshold.
Behavior Change
With this release, the sample IAM policy for credentials provided by a Control Hub AWS environment includes the following additional permission:
autoscaling:DescribeWarmPool
This permission is required when you update the number of engine instances for an active Amazon EC2 deployment belonging to an AWS environment.
If the IAM policy does not include this permission and you update the number of engine instances, the deployment transitions to an Activation Error state. When you access the tracking URL to the AWS Management Console, the Events tab displays the following error:
API: autoscaling:DescribeWarmPool User: arn:aws:sts::${ACCOUNT_ID}:assumed-role/${CROSS_ACCOUNT_ROLE}/STREAMSETS_SCH is not authorized to perform: autoscaling:DescribeWarmPool because no identity-based policy allows the autoscaling:DescribeWarmPool action
StreamSets strongly recommends updating the IAM policy for credentials for all existing StreamSets AWS environments to avoid this deployment error.
...
{
  "Sid": "0",
  "Effect": "Allow",
  "Action": [
    "ec2:DescribeImages",
    "autoscaling:DescribeScalingActivities",
    "ec2:DescribeVpcs",
    "autoscaling:DescribeAutoScalingGroups",
    "ec2:DescribeRegions",
    "autoscaling:DescribeLaunchConfigurations",
    "ec2:DescribeInstanceTypes",
    "ec2:DescribeInstanceTypeOfferings",
    "ec2:DescribeSubnets",
    "ec2:DescribeKeyPairs",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeInstances",
    "autoscaling:DescribeScheduledActions",
    "autoscaling:DescribeWarmPool"
  ],
  "Resource": "*"
},
...
You do not need to restart active StreamSets AWS environments or Amazon EC2 deployments after updating the policy. However, if an existing Amazon EC2 deployment has transitioned to an Activation Error state due to this error, then you must stop that deployment and start it again.
For more information about the required IAM policy, see Configure AWS Credentials.
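To see whether an existing policy document already grants the new permission, you might run a quick local check on the policy JSON. The following Python sketch is illustrative only: the `allows_action` helper and the sample policy are hypothetical, the top-level Version and Statement wrapper follows the standard IAM policy layout rather than anything shown in these notes, and the check matches exact action names only, so it does not detect wildcard grants such as autoscaling:*.

```python
import json

REQUIRED_ACTION = "autoscaling:DescribeWarmPool"


def allows_action(policy: dict, action: str = REQUIRED_ACTION) -> bool:
    """Return True if any Allow statement lists the action by its exact name."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear as an object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may appear as a string
            actions = [actions]
        if action in actions:
            return True
    return False


# Hypothetical, shortened policy document used only for this example.
sample_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "0",
      "Effect": "Allow",
      "Action": ["ec2:DescribeImages", "autoscaling:DescribeWarmPool"],
      "Resource": "*"
    }
  ]
}
""")

print(allows_action(sample_policy))
```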
Fixed Issues
- The Topologies Dashboard can display an incorrect number of engines.
- Pipelines that contain fragments with names starting with a number do not correctly display in the new pipeline canvas UI and encounter unexpected errors in the classic pipeline UI.
- Stages that were originally in a fragment could become uneditable.
August 2022
The following StreamSets DataOps Platform release occurred in August 2022.
August 26, 2022
This release includes several enhancements, a behavior change, and fixed issues.
Enhancements
- Scroll zoom in pipeline canvas
- You can now zoom in or out of the pipeline canvas using the mouse scroll wheel or using the trackpad. By default, scroll zoom is enabled. You can disable scroll zoom if needed.
- Hide or show contextual help in wizards
- When you create or edit an object, such as an environment, deployment, pipeline, or job, you can now hide or show the contextual help that displays on the right of the wizard.
Behavior Change
- Filters for job templates, job instances, or draft runs
- With this release, when you select the Keep Filter Persistent checkbox to retain the filter on the Job Templates, Job Instances, or Draft Runs view, Control Hub retains a unique filter for each view. Previously, Control Hub incorrectly used the same filter for all of these views.
Fixed Issues
- In rare cases when editing a deployment, the External Resource Source property can be set to null which causes a null pointer exception.
- The TRIGGERED_COUNT and TRIGGERED_ON subscription parameters do not contain the correct values.
- The Windowing Aggregator processor does not display aggregation charts.
- You can delete a job associated with a running scheduled task, resulting in a scheduled task that attempts to trigger actions on a job that doesn’t exist.
- In rare cases, the configured time zone for a scheduled task is ignored, which causes the task to start and finish at undesired times.
- Clicking Upload Offset & Start while viewing job instance details uploads the offset but doesn’t start the job.
- Because the initialization script for an Azure VM deployment does not run before engine instances start, you cannot use the script to update DNS entries. As a result, engines fail to launch when the Azure VNet uses custom DNS servers.
July 2022
The following StreamSets DataOps Platform release occurred in July 2022.
July 20, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Runtime parameters for pipelines
- Some pipeline and stage properties conditionally display child properties. For example, if you configure an origin to use the Delimited data format, the origin displays a set of Delimited configuration properties. If you configure that origin to use the JSON data format, it displays a different set of JSON configuration properties.
- Job monitoring toolbar
- When you monitor a job, the canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes.
- Starting multiple scheduled jobs at the same time
- When you start multiple jobs at the exact same time using the scheduler, the number of pipelines running on an engine can exceed the Max Running Pipeline Count configured for the engine.
Fixed Issues
- When sharing an object, you cannot search for user and group names that include the search string, only names that start with the search string.
- If you rename a job, scheduled tasks for that job do not update the job name.
- When you delete users or groups and then create users or groups with the same ID, permissions might behave unexpectedly.
June 2022
The following StreamSets DataOps Platform releases occurred in June 2022.
June 17, 2022
This release includes an enhancement and several fixed issues.
Enhancement
- New stages centered in the canvas
- When you use the stage library panel to add a new stage to the pipeline canvas, the stage appears in the center of the current view of the canvas. Previously, it appeared to the right of the right-most stage in the pipeline.
Fixed Issues
- When a deployment contains multiple engine instances and you try to share the deployment with another user or group for a second time, Control Hub fails to update the permissions due to a 500 internal server error.
- A single faulty subscription prevents other suitable subscriptions from triggering.
- You cannot access a draft run of a pipeline when you view the list of running pipelines on an engine from the Engines view.
June 3, 2022
This release includes an enhancement and several fixed issues.
Enhancement
- View and edit connection details from stage properties
- When configuring a pipeline or fragment stage that uses a connection, you can click the Edit Connection icon in the stage properties to view and edit the connection details.
Fixed Issues
- The Available Pipeline Runners histogram does not display when you monitor a job for a Data Collector multithreaded pipeline.
- The pipeline canvas cannot handle more than 100 instances of the same stage.
- Provisioning Agents are incorrectly included in the system limit for engines.
May 2022
The following StreamSets DataOps Platform releases occurred in May 2022.
May 20, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Compare pipeline and fragment versions
- When using the new pipeline canvas UI to compare two versions of a pipeline or pipeline fragment, you can click the Open in Canvas icon to open one of the versions in the pipeline canvas.
- Engine type icon in Fragments, Pipelines, and Sample Pipelines views
- The Fragments view, Pipelines view, and Sample Pipelines view include an engine type icon next to each pipeline or fragment name so that you can quickly determine the pipeline or fragment type.
Fixed Issues
- Credential fields in a pipeline state notification webhook do not properly show the value of entered passwords when authentication is used.
- On Windows 10, some file archive programs cannot extract an exported Control Hub ZIP file because of the colon (:) character in the JSON file names.
May 6, 2022
This release includes several new features, enhancements, and fixed issues.
New Features and Enhancements
- Draft runs
- Pipeline test runs have been renamed to draft runs to more clearly indicate that a draft run is the execution of a draft pipeline.
- Stage selector in the pipeline canvas
- The new pipeline canvas UI includes an enhanced stage selector with a horizontal layout, making it easier to filter the stages by type.
Fixed Issues
- When authorizing Control Hub API calls, Control Hub does not consider the roles assigned by a user’s group.
- When you change the owner of a scheduled task, the previous owner is incorrectly listed as the user that executes the task.
April 2022
The following StreamSets DataOps Platform releases occurred in April 2022.
April 27, 2022
This release includes a fixed issue.
Fixed Issue
- You cannot create new users using the Control Hub REST API.
April 22, 2022
This release includes several new features and enhancements.
New Features and Enhancements
- Init scripts for cloud service provider deployments
- You can define an initialization script for cloud service provider deployments, such as Amazon EC2, Azure VM, and GCE deployments. Control Hub runs the init script while provisioning a new instance in your cloud account.
- Pipeline canvas toolbar
- The pipeline canvas includes a new toolbar that more clearly indicates the action that each toolbar icon completes. By default, the pipeline canvas continues to display the classic toolbar.
- Job status
- Jobs that have not run are listed with an Inactive status. Previously, jobs that had not run were listed without a status.
- Editable group ID
- When you create a group, you define the group display name and can optionally edit the group ID. By default, Control Hub generates the group ID from the display name, using all lowercase characters and replacing any spaces with underscores. Users type the group ID when using credential functions in pipelines. As a result, you might want to edit the default group ID to make it easier to use with credential functions.
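The default group ID rule can be sketched in a few lines. This Python example is illustrative only: the function name `default_group_id` is hypothetical, and it covers just the two transformations stated above, lowercasing and replacing spaces with underscores. How Control Hub treats other characters is not described here.

```python
def default_group_id(display_name: str) -> str:
    """Derive a group ID by lowercasing the name and replacing spaces with underscores."""
    return display_name.lower().replace(" ", "_")


print(default_group_id("Data Engineering Team"))
```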
April 13, 2022
This release includes a fixed issue.
Fixed Issue
- You cannot directly log in to the SAML configuration page as an Organization Administrator if you belong to another organization with SAML authentication disabled.
April 8, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Simplified pipeline and fragment wizards
- The pipeline wizard and the fragment wizard have been simplified and no longer include the redundant Review & Open step.
- Upgrading a pipeline to use a later authoring engine version
- When you select a later authoring engine version for a pipeline, Control Hub now informs you that you are upgrading the pipeline so that it can no longer run on the earlier engine version. If you select a later engine version for a draft pipeline, you are given the choice to publish the draft pipeline first, and then create a new draft pipeline that is upgraded to run on the later engine version. That way, you retain a pipeline version that can run on the earlier engine version.
Fixed Issues
- When you disable a deployment and then access a pipeline using an authoring engine from that deployment, you are logged out.
- Export All Published Pipelines exports only some published pipelines.
- Pressing Ctrl+Z doesn't undo text changes after you edit text in the Control Hub UI.
- Exporting jobs can generate a corrupt ZIP file.
March 2022
The following StreamSets DataOps Platform release occurred in March 2022.
March 18, 2022
This release includes a new feature and several fixed issues.
New Feature
- Transformer for Snowflake available in public preview
- New and existing organizations can create and run Transformer for Snowflake pipelines.
Fixed Issues
- When you stop an Azure VM deployment, the Azure key vaults created to store secrets for the provisioned VM instances are not deleted from your Azure account.
If you added Azure tags to Azure VM deployments that were stopped before this fix, you can use the tags to find the key vaults and then delete them from your Azure account.
- Subscriptions do not trigger when permission enforcement is disabled for the organization.
February 2022
The following StreamSets DataOps Platform releases occurred in February 2022.
February 25, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Install stage libraries while creating connections
- When using an authoring Data Collector 4.4.x or later, you can select from all possible connection types while creating a connection, even if the corresponding stage library is not installed on the authoring Data Collector. If you select an uninstalled connection type, you can choose to update the deployment with the missing stage library that includes the connection type.
- SDC_ID subscription parameter displays as Engine ID in the user interface
- When you create a subscription and define a simple condition for an execution engine not responding event, the SDC_ID parameter now displays as Engine ID in the UI, instead of SDC_ID. The Engine ID label indicates that the parameter can include the ID of any engine type.
Continue to use SDC_ID as the parameter name when you use the parameter to represent StreamSets engines.
- Scheduled tasks
- While creating a scheduled task, you can search for the job to schedule.
Fixed Issues
- Exporting multiple pipelines can generate a corrupt ZIP file.
- You cannot download a snapshot captured for a pipeline test run or job run.
- When you edit a subscription condition in the UI, the changes are not saved.
- The job History tab inconsistently displays Data Collector URLs.
February 23, 2022
This release includes an enhancement and a fixed issue.
Enhancement
- Supported SAML identity providers
- StreamSets supports PingFederate as a SAML identity provider.
Fixed Issue
- Control Hub allows you to test a SAML draft configuration when SP-initiated logins are disabled, even though testing SAML from Control Hub is only supported when SP-initiated logins are enabled. Testing when SP-initiated logins are disabled results in a 404 Not Found error.
February 11, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Create fragments using pipeline stages
- When you create a fragment from selected stages in a pipeline, you can choose to immediately publish the fragment as long as the fragment meets the validation requirements to be published. Or, you can choose to create a draft fragment and then continue designing the pipeline.
- GCE deployments support selecting a subnet
- GCP environments support VPC networks that have an auto or custom subnet creation mode. As a result, while creating a GCE deployment, you must select the subnet within the VPC network to provision the Compute Engine instances in.
- Contextual help sidebar
- After selecting a view in the Navigation panel, click the Help icon above the listed objects to open the contextual help sidebar. To close the contextual help, click outside of the sidebar. The contextual help sidebar has been implemented for most views. It will be implemented for additional views in future releases.
- Contextual help in environment, deployment, and pipeline wizards
- When you create or edit an environment, deployment, or pipeline, the wizard displays contextual help on the right. The help contents depend on the step being completed in the wizard. Use the scroll bar to view the complete help contents for each step.
Fixed Issues
- Restarting multiple engines at the same time causes an error.
- When you select a later version of an authoring Data Collector while a pipeline is in read-only mode, stage libraries are not updated in the pipeline.
- The Global Failover Retries property does not display when creating or editing a Transformer job.
- GCP environments support only VPC networks that have automatic subnet creation enabled.
- GCP environments do not support using a shared VPC network from a host GCP project.
February 4, 2022
This release includes an enhancement and a fixed issue.
Enhancement
- Encrypting SAML assertions
- When you disable SP-initiated logins for SAML authentication, you can optionally configure the IdP to encrypt the SAML assertion. If you choose not to configure the IdP to encrypt SAML assertions, then you must disable the Require Encryption on Assertion property in the draft SAML configuration for your organization.
Fixed Issue
- If you belong to multiple organizations that have SAML authentication
enabled and you sign in to StreamSets using SSO SAML, you cannot choose which organization to log into.
January 2022
The following StreamSets DataOps Platform releases occurred in January 2022.
January 31, 2022
This release includes several enhancements and fixed issues.
Enhancements
- Create connections in the pipeline canvas
- While building a pipeline, you can create a new connection for a stage without leaving the pipeline canvas.
- Connection details
- The Connection details page has been enhanced to make the available actions more visible and to more clearly display details about all pipeline and fragment versions using the connection.
- Import existing pipelines or fragments as new
- When you import a pipeline or fragment that already exists in the target organization, you can choose to import the object as a new pipeline or fragment.
- Contextual help in the pipeline canvas
- While building a pipeline, click the Help tab in the pipeline properties panel to access common getting started, pipeline usage, troubleshooting, and best practice help topics and StreamSets Learning Academy videos. You can also search for topics in the StreamSets documentation.
Fixed Issues
- When viewing Transformer engines in the Engines view, clicking the More icon and then clicking Transformer Components displays the Getting Started page.
- Pipeline export fails when a pipeline contains multiple stages of the same type that use different library versions.
- You cannot download an engine support bundle.
January 21, 2022
This release includes a behavior change.
Behavior Change
- Transformer version for new deployments
- Starting with Transformer version 4.2.0 released on January 21, 2022, you can create a new deployment only for Transformer 4.2.0 or later. Earlier Transformer versions are not supported in new deployments.
January 14, 2022
This release includes several enhancements, behavior changes, and fixed issues.
Enhancements
- Duplicate connections
- You can duplicate a connection to create a copy of an existing connection. You can then change the configuration of the copy.
- Share pipeline or fragment while publishing
- While using the Check In wizard to publish a pipeline or publish a fragment, you can share the pipeline or fragment with other users and groups.
Behavior Changes
- Data Collector version for new deployments
- Starting with Data Collector version 4.3.0 released on January 13, 2022, you can create a new deployment only for Data Collector 4.3.0 or later. Earlier Data Collector versions are not supported in new deployments.
Fixed Issues
- A running job becomes corrupted if you import another version of the job that uses a newer pipeline version.
- A subscription for a job status change event with a JOB_OWNER condition set to a specific email address fails to trigger because the JOB_OWNER parameter is always null.
2021
The following StreamSets DataOps Platform releases occurred in 2021.
December 2021
The following StreamSets DataOps Platform releases occurred in December 2021.
December 20, 2021
This release includes several new features, enhancements, and fixed issues.
New Features and Enhancements
- Job templates
- Job templates include the following enhancements:
- Job templates display on the Job Templates view, separate from the list of job instances that display on the Job Instances view.
- When you create job instances from a job template, the instances
are attached to the parent job template by default. Attached job
instances display in the parent job template’s run history and
are updated when the parent job template is updated.
You can optionally detach job instances from the parent job template when you want to use the job details and default parameter values defined in the template, but don't want subsequent changes to the template to be applied to the job instance.
- When you create a job template for a pipeline that uses runtime parameters, you define whether each parameter functions as a dynamic parameter that can be overridden in child job instances or as a static parameter that cannot be overridden.
- You can archive a job template when you do not want new job instances to be created from the template, but want existing job instances to continue to run.
- Job instances created from a job template inherit all tags added to the template.
- Stage library selection for deployments
- When you define the engine configuration for a deployment, you can select individual stages to install or uninstall, in addition to selecting stage libraries.
- Engine installation script for self-managed deployments
- When using a self-managed deployment, you can configure the engine installation script to run the engine as a foreground or background process.
- Authoring engine selection for pipelines, fragments, and connections
- When you create pipelines, fragments, or connections, the authoring engine selection window displays the current CPU and memory usage of each engine.
- Sticky notes for pipeline canvas
- To delete a sticky note from the pipeline canvas, you click the Delete icon instead of the X in the note header.
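The dynamic and static parameter behavior described for job templates above can be pictured with a small conceptual model. This is only a sketch: the `JobTemplate` class, its fields, and `create_instance` are hypothetical names used for illustration and are not part of Control Hub or the StreamSets SDK.

```python
from dataclasses import dataclass, field


@dataclass
class JobTemplate:
    """Hypothetical stand-in for a job template (not a Control Hub class)."""
    defaults: dict                                    # default runtime parameter values
    static_params: set = field(default_factory=set)   # names that cannot be overridden
    tags: set = field(default_factory=set)            # tags inherited by job instances

    def create_instance(self, overrides=None):
        """Return a job instance description built from the template."""
        overrides = overrides or {}
        # Static parameters keep the template value in every job instance.
        blocked = sorted(set(overrides) & self.static_params)
        if blocked:
            raise ValueError(f"static parameters cannot be overridden: {blocked}")
        # Only parameters defined by the template can be overridden.
        unknown = sorted(set(overrides) - set(self.defaults))
        if unknown:
            raise ValueError(f"unknown parameters: {unknown}")
        # Dynamic parameters take the override when one is supplied.
        return {
            "parameters": {**self.defaults, **overrides},
            "tags": set(self.tags),
        }


template = JobTemplate(
    defaults={"TABLE": "orders", "BATCH_SIZE": 1000},
    static_params={"BATCH_SIZE"},
    tags={"nightly"},
)
instance = template.create_instance({"TABLE": "customers"})
print(instance["parameters"])
```

In this model, an attempt to override `BATCH_SIZE` raises an error because the template marks it as static, while `TABLE` remains dynamic and can differ per job instance.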
Fixed Issues
- When legacy Kubernetes is enabled for your organization and you have created
legacy deployments only, you receive the following message when trying to
create a pipeline:
Attention: You need to set up a deployment with an engine before you can create a pipeline
- If you have an existing Data Collector deployment with an installed enterprise stage library, and you edit the
deployment to upgrade to a new version of the same enterprise stage library,
you cannot remove the existing version of the stage library. As a result,
Data Collector fails to start with the following error:
REST_1001 - Unable to find following stage libraries in repository list: streamsets-datacollector-<enterprise library type>-lib:<new version>, streamsets-datacollector-<enterprise library type>-lib:<existing version>
- If existing Control Hub users have an Invited or Expired status and then you enable and disable
SAML authentication, disabling SAML fails with the following
error:
javax.persistence.PersistenceException: org.hibernate.exception.ConstraintViolationException: could not execute statement
December 8, 2021
This release includes an enhancement.
Enhancement
- Supported SAML identity providers
- StreamSets supports Microsoft Azure Active Directory (Azure AD) as a SAML identity provider.
November 2021
The following StreamSets DataOps Platform releases occurred in November 2021.
November 17, 2021
This release includes a new feature.
New Feature
- SAML authentication
- StreamSets DataOps Platform supports single sign-on (SSO) authentication with SAML 2.0 with selected identity providers (IdPs).
November 5, 2021
This release includes a new feature and several fixed issues.
New Feature
- Legacy Kubernetes integration
- Control Hub provides a legacy Kubernetes integration that you can use to automatically provision Data Collectors on Kubernetes. Provisioning includes deploying, registering, starting, scaling, and stopping Data Collector Docker containers in a Kubernetes cluster. Legacy Kubernetes integration requires that the Provisioning Agent use the Control Agent Docker image version 4.0.0 or later.
Fixed Issues
- When viewing real-time statistics and metrics for an active Transformer job, the names of stages display incorrectly.
- Jobs can transition to an inactive_error status with the following error because multiple Job Runner applications attempt to process the same job:
JOBRUNNER_69 At least one of the execution engines <engine ID> didn’t respond to the stop command
October 2021
The following StreamSets DataOps Platform releases occurred in October 2021.
October 22, 2021
This release includes several new features, enhancements, and fixed issues.
New Features and Enhancements
- Microsoft Azure integration
- Control Hub provides an integration with your Microsoft Azure account. When you use Azure environments and Azure VM deployments, Control Hub automatically provisions Azure VM instances needed to run StreamSets engines in your Azure account, and then deploys engine instances to those VM instances.
- JVM memory strategy for deployed engines
- By default, engines are now configured to use 50 percent of the available memory on the host machine as the Java heap size. Previously, engines were configured to use an absolute value, 1024 MB by default.
- Quick Start
- When you create self-managed deployments from the Quick Start menu, Control Hub assigns the deployments a quick-start tag and names the deployments as follows, based on the selected engine type:
Docker Data Collector <number> (Quick Start)
Docker Transformer <number> (Quick Start)
- Job error acknowledgement
- By default when a job encounters an inactive error status, users must acknowledge the error message before the job can be restarted. You can configure a job to skip job error acknowledgements. You might want to skip job error acknowledgement for scheduled jobs so the job can automatically be restarted without requiring user intervention. However, be aware that skipping job error acknowledgement might hide errors that the job has encountered.
- Upgrade to a paid subscription
- If you have a Free account, you can upgrade to a paid subscription from the My Account menu.
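To make the JVM memory strategy change above concrete, the following sketch compares the previous absolute default (1024 MB) with the new percentage-based default (50 percent of host memory). The function name and the sample host sizes are illustrative assumptions; the sketch does not show how engines actually compute or apply the heap size.

```python
LEGACY_HEAP_MB = 1024        # previous default: an absolute value
DEFAULT_HEAP_PERCENT = 50    # new default: a percentage of host memory


def percentage_heap_mb(host_memory_mb, percent=DEFAULT_HEAP_PERCENT):
    """Heap size in MB when the heap is a percentage of host memory."""
    return host_memory_mb * percent // 100


# Hypothetical host sizes, chosen only to show how the two strategies diverge.
for host_mb in (2048, 8192, 32768):
    print(f"host={host_mb} MB  legacy={LEGACY_HEAP_MB} MB  "
          f"percentage-based={percentage_heap_mb(host_mb)} MB")
```

On a small host the two strategies can coincide, but on larger hosts the percentage-based default scales with the machine, which is worth keeping in mind when other processes share the host.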
Fixed Issues
- The pipeline canvas incorrectly allows adding a fragment to another fragment although nested fragments are not supported.
- After reloading a pipeline, parameters called from properties that display as checkboxes are not displayed.
- Control Hub does not allow the at sign (@) in the property name for any engine advanced configuration properties.
September 2021
The following StreamSets DataOps Platform releases occurred in September 2021.
September 28, 2021
This release includes a fixed issue.
Fixed Issue
- Pipeline fragment creation is failing.
September 15, 2021
This release includes several enhancements.
Enhancements
- In-application chat
- You can use the in-application chat to start a conversation with the StreamSets team. You can enable or disable the chat in your account settings accessible from the My Account window.
- Sticky notes for pipeline canvas
- You can add sticky notes to the pipeline canvas to include notes that might be useful to you or your team as you build a pipeline or pipeline fragment. For example, you might add a note as a reminder to revisit a portion of the pipeline design or to change a stage after running initial tests of the pipeline.
- Updated usage limits
- The usage limits for organizations with a Free account have been
updated as follows:
- 2 users - For an existing Free account with more than 2 users, all users can continue to log in as a member of the organization. However, you cannot invite additional users to the organization.
- 10 published pipelines - For an existing Free account with more than 10 published pipelines, you can continue to use all published pipelines.
August 2021
The following StreamSets DataOps Platform releases occurred in August 2021.
August 11, 2021
This release includes several enhancements.
Enhancements
- My Account
- All properties previously available in the User Settings window have been moved to the My Account window. As a result, you can manage all your account settings from a single window. You can also modify your display name in the My Account window.
- Confirmation dialog boxes
- The OK button in several confirmation dialog boxes has been renamed to reflect the action that you are taking. For example, when deleting a connection, the OK button has been renamed to Delete.
- Roles and permissions required to delete an engine
- A user must have the following roles and permissions to delete an engine:
Task: Delete an engine
Roles: Engine Administrator, Deployment Manager
Permissions: Write on engine
July 2021
The following StreamSets DataOps Platform releases occurred in July 2021.
July 30, 2021
This release includes several new features, enhancements, behavior changes, and fixed issues.
New Features and Enhancements
- Deployment and engine permissions
- With this release, engines always inherit the same permissions assigned to the deployment. Those inherited permissions cannot be modified at the engine level. For example, if you grant a user Execute permission on a deployment, that user also has Execute permission on all engines managed by that deployment.
- Engines
- Shut down engines - You can shut down engines belonging to a self-managed deployment from the Engines view. You can also restart or shut down multiple engines at the same time.
- Resource thresholds - The default values of the Max CPU Load and Max Memory thresholds for an engine are now 80%. Previously, the default values were 100%.
- Connections
- When creating or editing a connection, you can filter the list of available authoring Data Collectors by deployment, version, or label. The Data Collector with the most recent reported time is selected by default.
- Pipelines
- Pipeline and fragment creation - While creating a new pipeline or pipeline fragment, you can choose to open the pipeline or fragment in the canvas before completing the Share Pipeline or Share Pipeline Fragment step.
- Sample pipelines - When you open a sample pipeline in the pipeline canvas, the Duplicate button has been renamed to Create a pipeline from sample.
- Free and paid accounts
- DataOps Platform now offers paid accounts in addition to free accounts.
- Users
- Session timeout - After 30 minutes of inactivity, user sessions expire and users are logged out. An organization administrator can change the inactivity period for the organization.
- Deleting users - When you attempt to delete an active user, Control Hub prompts you to deactivate the user during the deletion process.
Behavior Changes
- Data Collector version for new deployments
- With this release, you can create a new deployment only for Data Collector version 4.0.2 or later. Earlier Data Collector versions are not supported in new deployments.
Fixed Issues
- If you stop multiple deployments at the same time, Control Hub might fail to
completely stop the deployments with the following
error:
An AppException occurred deactivating deployment tokens after stack was removed: Issues: [CSP_046 - Rest Api '<Control Hub URL>/security/rest/v1/organization/<organization ID>/components/delete' failed with status:'500']
- No errors display in the UI when you restart Docker engine instances for a self-managed deployment with an external resource archive that uses an incorrect file format.
July 14, 2021 (beta)
This release includes several new features, enhancements, and fixed issues.
New Features and Enhancements
- Quick Start
- To get started with StreamSets, click Quick Start in the top toolbar to quickly deploy a Data Collector or Transformer engine using Docker. After the engine is launched and running, click Quick Start > Create a pipeline to quickly create a pipeline.
- Environments
- Environment state - The environment state and status have been combined into a single state to simplify the state concept. For example, an environment with an Enabled state and an OK status now has a single Active state.
- Activate and deactivate actions - The Enable and Disable actions for an environment have been renamed to Activate and Deactivate to align the actions with the new environment states.
- Deployments
- Deployment state - The deployment state and status have been combined into a single state to simplify the state concept. For example, a deployment with a Disabled state and an OK status now has a single Deactivated state.
- Start and stop actions - The Enable action for a deployment has been renamed to Start to more clearly indicate that after you start a deployment, the deployment is ready to launch engine instances. The Disable action has been renamed to Stop to more clearly indicate that after you stop a deployment, you can no longer launch engine instances for the deployment.
- Deployment details - The Deployment details page has been enhanced to make the available actions more visible and to more clearly display details about the existing engines belonging to the deployment.
- Launched engine status for self-managed deployments - When copying the installation script to launch an engine instance for a self-managed deployment, you can choose to check the engine status in the Control Hub UI. Previously, you could view the engine status only in the command prompt.
- Engine installation type for self-managed deployments - After saving a self-managed deployment, you can change the engine installation type. For example, you can change a Tarball installation type to the Docker image installation type.
Fixed Issues
- The list of EC2 instance types available when creating an Amazon EC2
deployment does not take into account the availability zone used by the
parent AWS environment. If you select an unsupported instance type, AWS
CloudFormation cannot provision the EC2 instances and the following error
message displays:
Your requested instance type (<type>) is not supported in your requested Availability Zone (<zone>)
Important: To fix this issue, StreamSets has updated the sample IAM policy used to configure AWS credentials for an AWS environment. If you encountered this issue with an existing AWS environment and EC2 deployment, then update the IAM policy to include the ec2:DescribeInstanceTypeOfferings permission. If you have not encountered this issue, there is no need to update the IAM policy.
- You can delete a pipeline included in a job when you do not have permission on the job, causing the job to become invalid.
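For the IAM policy update mentioned in the first fixed issue above, the following sketch checks whether a policy document already allows ec2:DescribeInstanceTypeOfferings and appends a statement if it does not. The helper names and the sample policy are assumptions made for illustration. The sketch assumes that `Statement` is a list, matches action names exactly (wildcards such as `ec2:Describe*` are not evaluated), and uses `"Resource": "*"` for the added statement. Compare the result with the sample IAM policy in the StreamSets documentation before applying it.

```python
import copy
import json

REQUIRED_ACTION = "ec2:DescribeInstanceTypeOfferings"


def policy_allows(policy, action):
    """Return True if an Allow statement lists the action by its exact name."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):   # IAM permits a single string or a list
            actions = [actions]
        if action in actions:
            return True
    return False


def ensure_action(policy, action=REQUIRED_ACTION):
    """Return the policy unchanged, or a copy with an added Allow statement."""
    if policy_allows(policy, action):
        return policy
    updated = copy.deepcopy(policy)
    updated.setdefault("Statement", []).append(
        {"Effect": "Allow", "Action": [action], "Resource": "*"}
    )
    return updated


# Hypothetical existing policy, used only to exercise the helpers.
existing = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:DescribeInstances"], "Resource": "*"}
    ],
}
print(json.dumps(ensure_action(existing), indent=2))
```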
June 2021
The following StreamSets DataOps Platform releases occurred in June 2021.
June 23, 2021 (beta)
This release includes several fixed issues.
Fixed Issues
- When the parent AWS environment uses an AWS region other than us-west-2, an Amazon EC2 deployment fails to launch a StreamSets engine with the following error:
Parameter validation failed: parameter value ami-<AMI_ID> for parameter name AMI does not exist
- Using the API Credentials view to generate and manage credentials for use with the Control Hub REST API is not supported at this time.
June 18, 2021 (beta)
This release includes several new features and enhancements.
New Features and Enhancements
- Deployments
- Engine version - The version number of a nightly engine build includes the build number in addition to the -SNAPSHOT suffix.
- GCE deployments - When configuring the GCE autoscaling group for a deployment, you can enter the network tags to provision the VM instances with.
- Jobs
- Failover retries - When you enable a Data Collector or Transformer job for pipeline failover, you can configure the global number of pipeline failover retries to attempt across all available engines. When the limit is reached, Control Hub stops the job.
- Balance jobs icon - The Engines view includes a new Balance Jobs icon.
- Subscriptions
- You can configure a subscription action for a maximum global failover retries exhausted event. For example, you might create a subscription that sends an alert to a Slack channel when a job has exceeded the maximum number of pipeline failover retries across all available engines.
June 11, 2021 (beta)
This release includes enhancements and fixed issues.
Enhancements
- Environments
- The Allow Nightly Engine Builds property for environments has been moved to an advanced option. In most cases, you don't need to modify this property.
- Roles
- The Auth Token Administrator and Engine Guest roles have been removed because they are not applicable for StreamSets DataOps Platform.
Fixed Issues
- If you create a Microsoft account just before signing up as a new StreamSets user with that Microsoft account, the sign in might fail.
- In rare and randomly occurring scenarios, deleting a user from an organization might only partially succeed. Users in this state are not able to create a new login session, but may still show up in the Control Hub list of users for an organization.
- A user with the Organization Administrator role requires the Metrics Reader role to view topology metrics. The Organization Administrator role should be sufficient.
June 1, 2021 (beta)
StreamSets is happy to announce the beta release of StreamSets DataOps Platform, a cloud-native platform that empowers data engineers to build data pipelines.
StreamSets DataOps Platform includes the following components that seamlessly work together to manage your pipelines: Control Hub, Data Collector, and Transformer. The Control Hub component provides a common UI for the following types of users:
- Platform administrators use Control Hub to deploy and launch Data Collector and Transformer engines on-premises or in cloud environments. The engines are automatically tethered to Control Hub.
- Data engineers use Control Hub to build, run, and monitor pipelines across the deployed engines.
The beta release is available for development and testing, but is not meant for production use.
Known Issues
- You cannot import a draft pipeline that includes a fragment.
Workaround: If the draft pipeline is valid, check in the pipeline to create a published version, and then export and import the published pipeline.
If the draft pipeline is not valid, open the exported JSON file and locate the following line in the file:
"libraryDefinitions": null
Replace that line with the following lines:
"libraryDefinitions": {
  "services": [],
  "stages": [],
  "stageIcons": {}
}
Then import the edited JSON file.
- When you update an external resource archive to remove a file and then restart all engine instances in the deployment, Control Hub copies the updated archive file contents to the engine instances without first removing the deleted file. As a result, the file deleted from the archive still exists in the engine instances.
- When configuring a GCE deployment, the Instance Service Account property
displays a maximum of 20 service accounts. If your GCP project includes more
than 20 service accounts, the service account created as a Control Hub environment prerequisite might not display in the list.
Workaround: Use the StreamSets DataOps Platform SDK for Python to set the service account for the GCE deployment.
- When configuring a GCE deployment, you are required to select a service account
even if the GCP environment has a default service account defined. Similarly,
when configuring an Azure VM deployment, you are required to select a managed
identity and resource group, even if the Azure environment has defaults defined
for those objects.
Workaround: Reselect the values that were defined in the parent environment as defaults, or use the StreamSets DataOps Platform SDK for Python to create the deployments.
- The StreamSets Kubernetes agent version 1.0.0 does not support configuring engines to use a proxy server by defining the proxy properties in the Control Hub Kubernetes deployment.
Workaround: Complete the following steps to configure engines to use a proxy server:
- Create a YAML file that defines a ConfigMap for the engine proxy properties, where <namespace-name> is the Kubernetes namespace where the engines are deployed.
For example:
apiVersion: v1
data:
  http.nonProxyHosts: <pipe-separated no proxy hosts>
  no_proxy: <comma-separated no proxy hosts>
  http.proxyHost: <proxy host>
  http.proxyPassword: <password>
  http.proxyPort: "<port>"
  http.proxyUser: <proxy user>
  http_proxy: http://<proxy user>:<password>@<proxy host>:<port>
  https.proxyHost: <proxy host>
  https.proxyPassword: <password>
  https.proxyPort: "<port>"
  https.proxyUser: <proxy user>
  https_proxy: http://<proxy user>:<password>@<proxy host>:<port>
kind: ConfigMap
metadata:
  name: <config-map-name>
  namespace: <namespace-name>
- Run the following command to apply the YAML and create the ConfigMap in the Kubernetes cluster:
kubectl -n <namespace-name> apply -f <config-map-name>.yaml
- In Control Hub, edit the Kubernetes deployment. Use advanced mode to edit the YAML file, adding the following envFrom field to the deployment container, after the existing - env field:
envFrom:
- configMapRef:
    name: <config-map-name>
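Returning to the first known issue in this list (importing a draft pipeline that includes a fragment), the manual JSON edit in that workaround could also be scripted. The following is a minimal sketch, not a StreamSets tool: the function names and file paths are hypothetical, and because the layout of the exported file is not described here, the script searches the whole document for any `"libraryDefinitions": null` entry instead of assuming where it appears.

```python
import json


def empty_library_definitions():
    """Replacement value given in the workaround (a fresh copy per call)."""
    return {"services": [], "stages": [], "stageIcons": {}}


def fill_library_definitions(node):
    """Replace every "libraryDefinitions": null in place; return the count."""
    replaced = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "libraryDefinitions" and value is None:
                node[key] = empty_library_definitions()
                replaced += 1
            else:
                replaced += fill_library_definitions(value)
    elif isinstance(node, list):
        for item in node:
            replaced += fill_library_definitions(item)
    return replaced


def patch_export(src_path, dst_path):
    """Read an exported pipeline JSON file and write a patched copy."""
    with open(src_path, encoding="utf-8") as src:
        document = json.load(src)
    count = fill_library_definitions(document)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(document, dst, indent=2)
    return count
```

For example, `patch_export("draft_pipeline.json", "draft_pipeline_patched.json")` (both file names are placeholders) writes a patched copy and returns the number of replacements, so a result of 0 indicates that the exported file did not contain the null entry. Writing to a separate file keeps the original export available for comparison.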