IBM StreamSets as Client-Managed Software
With IBM StreamSets as client-managed software, customers install and manage the software themselves.
You must install IBM Software Hub on a supported cloud deployment environment and then install IBM StreamSets on IBM Software Hub.
After installation, you can administer IBM StreamSets on an ongoing basis.
Installing
- Verifying the prerequisites
Verify that the prerequisites have been completed before beginning the installation.
- Installing IBM StreamSets on IBM Software Hub
Use the command line to install IBM StreamSets on IBM Software Hub.
- Completing post-installation tasks
Complete the post-installation tasks before users can begin building streaming data pipelines.
Prerequisites
- IBM Software Hub version 5.1.0 is installed.
For more information about installing IBM Software Hub, see the IBM Software Hub documentation.
- The Red Hat® OpenShift® Container Platform cluster meets
the minimum requirements for installing IBM StreamSets.
For more information about system requirements, see the IBM Software Hub documentation.
- The Red Hat OpenShift Container Platform cluster includes a default storage
class.
For more information about designating a default storage class, see the Red Hat OpenShift documentation.
- The workstation from which you run the installation is set up as a client workstation and includes the following command-line interfaces:
- OpenShift CLI, oc.
- Kubernetes command-line tool, kubectl. The tool must be configured to access your cluster.
- Istio command-line tool, istioctl. For more information about installing the tool, see the Istioctl documentation.
- You have an environment variables script to use with installation commands.
The IBM StreamSets installation commands use the following environment variables so that you can run the commands exactly as written:
- ${OC_LOGIN} is an alias for the oc login command
- ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
- ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, you must source the environment variables before you run the installation commands. For example, run:
source ./cpd_vars.sh
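The workstation prerequisites above can be checked before starting the installation. The following is a minimal sketch, not part of the IBM tooling: the function names `missing_commands`, `missing_variables`, and `preflight` are hypothetical, and the sketch only confirms that the three CLIs are on the PATH and that the three variables are non-empty. It does not verify cluster access or the default storage class.

```shell
# Pre-flight sketch for the client workstation. All function names here are
# illustrative assumptions; they are not provided by IBM Software Hub.

# Print the name of each command that cannot be found on the PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Print the name of each environment variable that is unset or empty.
missing_variables() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Report anything missing and return non-zero if a prerequisite is absent.
preflight() {
  missing="$(missing_commands oc kubectl istioctl
             missing_variables OC_LOGIN PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS)"
  if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing" >&2
    return 1
  fi
  echo "Prerequisite checks passed."
}

# Usage: source ./cpd_vars.sh, then run: preflight
```

Running `preflight` in the same shell session in which you sourced the script matters, because the variables are only defined in that session.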
Installing IBM StreamSets on IBM Software Hub
After completing the prerequisite tasks, a Red Hat® OpenShift® Container Platform cluster administrator uses the command line to install IBM StreamSets on IBM Software Hub.
-
Download and install the ibm-pak plug-in.
Download the plug-in from the IBM Catalog Management Plug-in for IBM Cloud Paks repository on GitHub. For installation instructions, see the Readme file.
-
Log in to Red Hat OpenShift Container Platform as a cluster
administrator:
${OC_LOGIN}
-
Accept the IBM StreamSets license:
export LICENSE_ACCEPTANCE=true
-
Install the IBM StreamSets operator. Set the production argument to true if you purchased the production license, or to false if you purchased the non-production license:
oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsetsOperator --namespace ${PROJECT_CPD_INST_OPERATORS} --action install --args "--production <false|true>"
-
Verify that the IBM StreamSets operator is running:
kubectl get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep streamsets
-
Set the namespace where you want to install the IBM StreamSets operands:
kubectl config set-context --current --namespace=${PROJECT_CPD_INST_OPERANDS}
-
Install the IBM StreamSets operands, setting the production argument based on your license:
oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsets --namespace ${PROJECT_CPD_INST_OPERANDS} --action applyCustomResources --args "--production <false|true>"
-
Verify that the IBM StreamSets components are running:
kubectl get pods -l icpdsupport/addOnId=streamsets
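The two verification steps above rely on reading the pod list by eye. As a rough sketch, the check can be scripted by filtering the default `kubectl get pods` table, which has the columns NAME, READY, STATUS, RESTARTS, and AGE. The helper name `pods_not_ready` is an assumption made for illustration, and the rule it applies (a pod is healthy if it is Completed, or Running with all containers ready) is a simplification rather than an official readiness definition for IBM StreamSets.

```shell
# Read `kubectl get pods` table output (with its header row) on stdin and
# print the name of every pod that is neither Completed nor fully Running.
# Returns 0 when no such pod is found, 1 otherwise.
# The function name is hypothetical; it is not part of kubectl or ibm-pak.
pods_not_ready() {
  awk 'NR > 1 {
         split($2, ready, "/")    # READY column, for example 1/1
         ok = ($3 == "Completed") || ($3 == "Running" && ready[1] == ready[2])
         if (!ok) { print $1; bad = 1 }
       }
       END { exit (bad ? 1 : 0) }'
}

# Usage:
#   kubectl get pods -l icpdsupport/addOnId=streamsets | pods_not_ready \
#     && echo "All IBM StreamSets pods are running."
```

Pods can take some time to start after the operands are installed, so a non-zero result shortly after installation may only mean that the check should be repeated.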
Post-installation Tasks
After a cluster administrator installs IBM StreamSets, an IBM Software Hub instance administrator, an IBM StreamSets system administrator, and an IBM StreamSets organization administrator work together to complete the post-installation tasks.
Administrators must complete the post-installation tasks before users can begin building data pipelines.
Verifying User Email Addresses
IBM StreamSets requires that each user account have an email address.
An IBM Software Hub instance administrator must verify that each user account that requires access to IBM StreamSets has an email address.
- Log in to IBM Software Hub.
- From the navigation menu, select .
-
For each user account that requires access to IBM StreamSets:
- Click the user name.
- Click Edit.
- Verify that the user account has an associated email address, and add an address if needed.
- Click Save.
Granting Users Access to the Service
An IBM Software Hub instance administrator must grant users access to the IBM StreamSets service.
Grant access to the following types of IBM StreamSets users:
- System administrator
- The system administrator manages all organizations across IBM StreamSets. Grant this user account the Admin role.
- Organization administrators and users
- Organization administrators and users work within a single IBM StreamSets organization. Grant all organization administrators and users the User role.
- Log in to IBM Software Hub.
- From the navigation menu, select .
- Locate the streamsets instance.
- From the action menu, select Manage access.
-
Add users and user groups.
- Click Add users.
- Select the user serving as the IBM StreamSets system administrator, and choose the Admin role.
- Select additional users and user groups, and choose the User role.
- Click Add.
- Inform the system administrator that they can access IBM StreamSets on the IBM Software Hub platform and complete the next post-installation task.
Creating an Organization
The IBM StreamSets system administrator must create an organization before users can log in to IBM StreamSets.
An organization is a secure space provided to a set of IBM StreamSets users. All Data Collector engines, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets as a member of an organization and can access data that belongs to that organization only.
As the system administrator, you can create a single organization for all users. Or you can create multiple organizations for different groups of users.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
- Enter an organization name.
- Select main for the instance.
-
Enter the email address of the primary organization administrator.
The primary organization administrator must be an existing IBM Software Hub user account that has this email address and that has access to the IBM StreamSets service with the User role.
- Accept the defaults for the remaining properties.
- Click Create Organization.
- Inform the organization administrator that they can access IBM StreamSets on the IBM Software Hub platform and complete the remaining post-installation tasks.
Inviting Users to the Organization
The organization administrator must invite users to the organization.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
IBM StreamSets Control Hub opens in a new browser tab.
- In the toolbar, click the My Account icon and then click Invite Users.
-
In the Add Users window, type or paste one or more email
addresses.
The email addresses must be associated with existing IBM Software Hub user accounts that have access to the IBM StreamSets service with the User role.
-
Optionally, select one or more existing groups to add the users to.
Control Hub provides a default all group that includes every user in the organization. To create additional groups, see Creating Groups.
-
Optionally, modify the roles assigned to the users.
Default role assignments for new users permit most tasks to encourage development and testing. Change the role assignments as needed to secure the integrity of your organization and data.
For a description of each role, see Role Descriptions.
- Click Invite.
-
Inform the users that they can access IBM StreamSets on the IBM Software Hub platform.
Note: Users must join the organization within seven days. Otherwise, the invitation expires and you must renew it.
Deploying a Data Collector Engine
Before users can begin building pipelines, the organization administrator must use IBM StreamSets Control Hub to deploy a Data Collector engine and then grant users access to the engine.
Data Collector is an engine that processes data. As an organization administrator, you deploy Data Collector engines to the location where data resides, which can be on-premises or on a protected cloud computing platform.
To get started with IBM StreamSets, create a self-managed deployment to deploy a Data Collector engine. When you create the deployment, share the deployment with all users invited to your organization, granting them full access to the deployment. When users build a pipeline, they select this deployed engine. For more information, see Self-Managed Deployments.
After getting started, you might consider using Kubernetes environments and deployments. With the Kubernetes integration, Control Hub automatically provisions the resources needed to run a Data Collector engine in your Kubernetes cluster, and then deploys engine instances to those resources.
Administering Organizations
As the system administrator for IBM StreamSets as client-managed software, you can perform all administrative tasks across all organizations.
An organization is a secure space provided to a set of users. All environments, deployments, pipelines, jobs, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets Control Hub as a member of an organization and can access data that belongs to that organization only.
When you create an organization, you create an organization administrator that can perform administrative tasks for that organization.
You can create a single organization for all users. Or you can create multiple organizations for different sets of users. For example, you might create one organization for the Northern Office and another organization for the Southern Office. Users in the Northern Office organization cannot access any data that belongs to the Southern Office organization. For more information, see Comparing Organizations and Groups.
Comparing Organizations and Groups
You can use both organizations and groups to create sets of users. However, there are important differences between the two:
- Organizations
- Only the system administrator can create organizations.
- Groups
- An organization administrator can create groups within the organization.
- To create a multitenant environment with organizations, the system administrator creates multiple organizations and then organization administrators add the appropriate users to each organization.
- To create a multitenant environment with multiple groups in a single organization, an organization administrator creates groups of users, and then shares objects within the groups to grant each group access to the appropriate objects.
For more information about using groups and permissions to create a multitenant environment, see Users and Groups.
Changing a Primary Organization Administrator
An organization can include multiple organization administrators, but only one primary organization administrator.
The IBM StreamSets system administrator configures the primary organization administrator when creating the organization.
The current primary organization administrator can change the primary administrator for the organization. However, as the system administrator, you can also change the primary administrator for any organization.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
- Click the internal ID of the organization that you want to edit.
- In the list of users, find another user assigned the administrator role and then click Make Primary Admin.
Activating or Deactivating an Organization
An organization must be active so that users can log in as members of that organization.
As the system administrator, you might temporarily deactivate an organization to disable access to IBM StreamSets.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
- Click the internal ID of the organization that you want to edit.
-
Click one of the following buttons:
- Activate to activate the organization.
- Deactivate to deactivate the organization.
Deleting an Organization
As the system administrator, you can delete an organization. Deleting an organization permanently removes the organization, including all objects created for the organization.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
- Click the internal ID of the organization that you want to delete.
- Click Delete.
Configuring Global Organization Properties
As the system administrator, you can configure organization properties at a global level to affect all organizations or at an organization level to affect a specific organization.
Some properties can be overridden by the organization administrator for each organization.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
-
In the Actions column for any listed organization, click Visit Instance.
IBM StreamSets Control Hub opens and displays all existing organizations.
-
Configure organization properties in one of the following ways:
- Global level - Above the list of organizations, click the More icon and then click Global Configuration.
Global configurations are applied to all organizations, unless an organization administrator has already configured the property at the organization level.
For example, an organization administrator sets the Default authoring engine timeout property to 8,000 milliseconds for the Northern Office organization. The system administrator then sets the same property at a global level to 6,000 milliseconds. All organizations use the modified value of 6,000 milliseconds, except for the Northern Office organization which retains the value of 8,000 configured at the organization level.
- Organization level - Next to a specific organization, click the More icon and then click Configuration.
Organization level configurations are applied to the selected organization only.
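The precedence rule described above (an organization-level value, when one has been configured, takes priority over the global value) can be illustrated with a small sketch. This models only the rule as stated in this section; it is not how Control Hub stores or resolves properties, and the function name `effective_value` is hypothetical.

```shell
# Resolve the value an organization ends up with: the organization-level
# value if one was configured, otherwise the global value.
# Illustrative only; the function name is an assumption.
effective_value() {
  org_value="$1"      # empty string means "not configured for this organization"
  global_value="$2"
  if [ -n "$org_value" ]; then
    printf '%s\n' "$org_value"
  else
    printf '%s\n' "$global_value"
  fi
}

effective_value 8000 6000   # Northern Office: prints 8000
effective_value ""   6000   # any organization without its own setting: prints 6000
```

The two calls mirror the Default authoring engine timeout example: the Northern Office keeps 8,000 milliseconds, and every other organization picks up the global 6,000 milliseconds.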
Increasing Default System Limits
Control Hub sets default system limits on the number of objects that can exist in each organization. The limits protect the system from run-away scripts or unintended automation usage.
These limits are sufficient for most organizations. However, as the system administrator, you can increase the limits globally for all organizations or for a specific organization.
For more information about the default values, see Organization Default System Limits.
- Log in to IBM Software Hub.
-
Open the IBM StreamSets service from the page.
The IBM StreamSets system administrator tool opens in a new browser tab.
- Click .
-
In the Actions column for any listed organization, click Visit Instance.
IBM StreamSets Control Hub opens and displays all existing organizations.
-
Access the organization configuration properties in one of the following
ways:
- Global level - Above the list of organizations, click the More icon and then click Global Configuration.
- Organization level - Next to a specific organization, click the More icon and then click Configuration.
-
Increase the system limit values.
The organization configuration properties are grouped into several tabs. The following table lists the system limits that you can configure on each tab:
Organization Configuration Tab: System Limits
Jobrunner
- Jobs
- Active jobs running concurrently
Provisioning
- Deployments
- Environments
- Legacy Kubernetes deployments
Security
- Engines
- Groups
- Legacy Kubernetes Provisioning Agents
Misc
- API credentials per user
- Pipelines (including both draft and published pipelines)
- Pipeline versions or commits
- Scheduled tasks
- Subscriptions
- Topologies
- Topology versions or commits
Uninstalling
An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to uninstall IBM StreamSets from an instance of IBM Software Hub.
Deleting the Service Instance
An instance administrator can delete the service instance associated with IBM StreamSets.
Delete the service instance to ensure that the instance releases the resources that it reserved.
- Log in to IBM Software Hub.
- From the navigation menu, select .
- Locate the streamsets instance.
- From the action menu, select Delete.
Uninstalling the Service
A Red Hat OpenShift Container Platform cluster administrator can uninstall the IBM StreamSets service.
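The uninstall commands use the following environment variables so that you can run the commands exactly as written:
- ${OC_LOGIN} is an alias for the oc login command
- ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
- ${PROJECT_CPD_INST_OPERANDS} refers to the operands project
If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.
To use the environment variables from the script, source the script before you run the uninstall commands. For example, run:
source ./cpd_vars.sh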
-
Log in to Red Hat OpenShift Container Platform as a cluster
administrator:
${OC_LOGIN}
-
Delete the custom resource for IBM StreamSets. Set the production argument to true if you purchased the production license, or to false if you purchased the non-production license:
oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsets --namespace ${PROJECT_CPD_INST_OPERANDS} --action deleteCustomResources --args "--production <false|true>"
-
Delete the IBM StreamSets operator, setting the production argument based on your license:
oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsetsOperator --namespace ${PROJECT_CPD_INST_OPERATORS} --action uninstall --args "--production <false|true>"
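The commands above take a `--production <false|true>` placeholder that must be replaced with a literal value before running. One way to avoid pasting the placeholder by mistake is to build the argument string through a small guard. This is a sketch under that assumption: `production_args` is a hypothetical helper, not part of the ibm-pak plug-in.

```shell
# Build the value passed to --args, accepting only the literal strings
# "true" or "false". The helper name is an illustrative assumption.
production_args() {
  case "$1" in
    true|false)
      printf '%s %s\n' '--production' "$1"
      ;;
    *)
      echo "production must be true or false, got: '$1'" >&2
      return 1
      ;;
  esac
}

# Example use with the custom resource deletion command from this section:
#   args="$(production_args false)" || exit 1
#   oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 \
#     --inventory streamsets --namespace "${PROJECT_CPD_INST_OPERANDS}" \
#     --action deleteCustomResources --args "$args"
```

The same guard applies to the install commands, which use the same argument.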