IBM StreamSets as Client-Managed Software

With IBM StreamSets as client-managed software, you install and administer the software in your own environment.

You must install IBM Software Hub on a supported cloud deployment environment and then install IBM StreamSets on IBM Software Hub.

After installation, you can administer IBM StreamSets on an ongoing basis.

Installing

Complete the following high-level tasks to install IBM StreamSets on IBM Software Hub:
  • Prerequisites
  • Installing IBM StreamSets on IBM Software Hub
  • Post-installation Tasks

Prerequisites

Before you install IBM StreamSets, verify that the following prerequisites have been completed:
  • IBM Software Hub version 5.1.0 is installed.

    For more information about installing IBM Software Hub, see the IBM Software Hub documentation.

  • The Red Hat® OpenShift® Container Platform cluster meets the minimum requirements for installing IBM StreamSets.

    For more information about system requirements, see the IBM Software Hub documentation.

  • The Red Hat OpenShift Container Platform cluster includes a default storage class.

    For more information about designating a default storage class, see the Red Hat OpenShift documentation.

  • The workstation from which you run the installation is set up as a client workstation and includes the following command-line interfaces:
    • OpenShift CLI, oc.
    • Kubernetes command-line tool, kubectl. The tool must be configured to access your cluster.
    • Istio command-line tool, istioctl. For more information about installing the tool, see the Istioctl documentation.
  • You have an environment variables script to use with installation commands.
    The IBM StreamSets installation commands use the following environment variables so that you can run the commands exactly as written:
    • ${OC_LOGIN} is an alias for the oc login command
    • ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
    • ${PROJECT_CPD_INST_OPERANDS} refers to the operands project

    If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.

    To use the environment variables from the script, you must source the environment variables before you run the installation commands. For example, run:
    source ./cpd_vars.sh
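
The prerequisites above can be checked from the client workstation before you start. The following is a minimal sketch, not part of the IBM tooling: the helper names missing_tools, missing_vars, and has_default_storage_class are hypothetical, and the storage class check assumes that oc get storageclass marks the default class with "(default)" next to its name.

```shell
#!/bin/sh
# Pre-flight sketch: report missing CLIs and unset environment variables.
# All function names here are hypothetical helpers, not IBM-provided commands.

# Print each command from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Print each variable name from the argument list that is unset or empty.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Succeed if the `oc get storageclass` output on stdin marks a default class.
has_default_storage_class() {
  grep -q '(default)'
}
```

After sourcing the environment variables script, you might call the helpers as follows. Each of the first two commands prints nothing when the prerequisite is met:
    missing_tools oc kubectl istioctl
    missing_vars OC_LOGIN PROJECT_CPD_INST_OPERATORS PROJECT_CPD_INST_OPERANDS
    oc get storageclass | has_default_storage_class || echo "No default storage class found"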

Installing IBM StreamSets on IBM Software Hub

After completing the prerequisite tasks, a Red Hat® OpenShift® Container Platform cluster administrator uses the command line to install IBM StreamSets on IBM Software Hub.

  1. Download and install the ibm-pak plug-in.

    Download the plug-in from the IBM Catalog Management Plug-in for IBM Cloud Paks repository on GitHub. For installation instructions, see the Readme file.

  2. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
  3. Accept the IBM StreamSets license:
    export LICENSE_ACCEPTANCE=true
  4. Install the IBM StreamSets operator. Set the production argument to true if you purchased the production license, or to false if you purchased the non-production license:
    oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsetsOperator --namespace ${PROJECT_CPD_INST_OPERATORS} --action install --args "--production <false|true>"
  5. Verify that the IBM StreamSets operator is running:
    kubectl get pods -n ${PROJECT_CPD_INST_OPERATORS} | grep streamsets
  6. Set the namespace where you want to install the IBM StreamSets operands:
    kubectl config set-context --current --namespace=${PROJECT_CPD_INST_OPERANDS}
  7. Install the IBM StreamSets operands, setting the production argument based on your license:
    oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsets --namespace ${PROJECT_CPD_INST_OPERANDS} --action applyCustomResources --args "--production <false|true>"
  8. Verify that the IBM StreamSets components are running:
    kubectl get pods -l icpdsupport/addOnId=streamsets
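
The verification commands in steps 5 and 8 list pods but leave it to you to read the status columns. As a rough sketch, the hypothetical helper below filters the output of kubectl get pods --no-headers and prints only the pods that are not yet healthy. It assumes the default column order (NAME, READY, STATUS) and treats pods in the Completed status as healthy, which may not match every situation in your cluster.

```shell
#!/bin/sh
# Sketch: list pods that are not yet healthy, given the output of
# `kubectl get pods --no-headers` on stdin (columns: NAME READY STATUS ...).
# pods_not_ready is a hypothetical helper name.
pods_not_ready() {
  awk '
    {
      split($2, ready, "/")          # READY column looks like "1/1"
      if ($3 == "Completed") next    # assume finished job pods are fine
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
    }'
}
```

For example, after installing the operands you might run the following command. No output means that every listed pod is running with all of its containers ready:
    kubectl get pods -l icpdsupport/addOnId=streamsets --no-headers | pods_not_ready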

Post-installation Tasks

After a cluster administrator installs IBM StreamSets, an IBM Software Hub instance administrator, an IBM StreamSets system administrator, and an IBM StreamSets organization administrator work together to complete the post-installation tasks.

Administrators must complete the post-installation tasks before users can begin building data pipelines.

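First, an IBM Software Hub instance administrator completes the following tasks:
  • Verifying User Email Addresses
  • Granting Users Access to the Service

Then, an IBM StreamSets system administrator completes the following task:
  • Creating an Organization

Finally, an IBM StreamSets organization administrator completes the following tasks:
  • Inviting Users to the Organization
  • Deploying a Data Collector Engine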

Verifying User Email Addresses

IBM StreamSets requires that each user account have an email address.

An IBM Software Hub instance administrator must verify that each user account that requires access to IBM StreamSets has an email address.

  1. Log in to IBM Software Hub.
  2. From the navigation menu, select Administration > Access control.
  3. For each user account that requires access to IBM StreamSets:
    1. Click the user name.
    2. Click Edit.
    3. Verify that the user account has an associated email address, and add an address if needed.
    4. Click Save.

Granting Users Access to the Service

An IBM Software Hub instance administrator must grant users access to the IBM StreamSets service.

Grant access to the following types of IBM StreamSets users:

System administrator
The system administrator manages all organizations across IBM StreamSets. Grant this user account the Admin role.
Organization administrators and users
Organization administrators and users work within a single IBM StreamSets organization. Grant all organization administrators and users the User role.
  1. Log in to IBM Software Hub.
  2. From the navigation menu, select Services > Instances.
  3. Locate the streamsets instance.
  4. From the action menu, select Manage access.
  5. Add users and user groups.
    1. Click Add users.
    2. Select the user serving as the IBM StreamSets system administrator, and choose the Admin role.
    3. Select additional users and user groups, and choose the User role.
    4. Click Add.
  6. Inform the system administrator that they can access IBM StreamSets on the IBM Software Hub platform and complete the next post-installation task.

Creating an Organization

The IBM StreamSets system administrator must create an organization before users can log in to IBM StreamSets.

An organization is a secure space provided to a set of IBM StreamSets users. All Data Collector engines, pipelines, jobs, topologies, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets as a member of an organization and can access data that belongs to that organization only.

As the system administrator, you can create a single organization for all users or multiple organizations for different groups of users.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Add New Organization.
  4. Enter an organization name.
  5. Select main for the instance.
  6. Enter the email address of the primary organization administrator.

    The primary organization administrator must be an existing IBM Software Hub user account that has this email address and that has access to the IBM StreamSets service with the User role.

  7. Accept the defaults for the remaining properties.
  8. Click Create Organization.
  9. Inform the organization administrator that they can access IBM StreamSets on the IBM Software Hub platform and complete the remaining post-installation tasks.

Inviting Users to the Organization

The organization administrator must invite users to the organization.

Important: The IBM StreamSets system administrator completes tasks across all organizations, and should not be invited to join an organization.
  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    IBM StreamSets Control Hub opens in a new browser tab.

  3. In the toolbar, click the My Account icon and then click Invite Users.
  4. In the Add Users window, type or paste one or more email addresses.

    The email addresses must be associated with existing IBM Software Hub user accounts that have access to the IBM StreamSets service with the User role.

  5. Optionally, select one or more existing groups to add the users to.

    Control Hub provides a default all group that includes every user in the organization. To create additional groups, see Creating Groups.

  6. Optionally, modify the roles assigned to the users.

    Default role assignments for new users permit most tasks to encourage development and testing. Change the role assignments as needed to secure the integrity of your organization and data.

    For a description of each role, see Role Descriptions.

  7. Click Invite.
  8. Inform the users that they can access IBM StreamSets on the IBM Software Hub platform.
    Note: Users must join the organization within seven days. Otherwise, the invitation expires and you must renew it.

Deploying a Data Collector Engine

Before users can begin building pipelines, the organization administrator must use IBM StreamSets Control Hub to deploy a Data Collector engine and then grant users access to the engine.

Data Collector is an engine that processes data. As an organization administrator, you deploy Data Collector engines to the location where data resides, which can be on-premises or on a protected cloud computing platform.

To get started with IBM StreamSets, create a self-managed deployment to deploy a Data Collector engine. When you create the deployment, share the deployment with all users invited to your organization, granting them full access to the deployment. When users build a pipeline, they select this deployed engine. For more information, see Self-Managed Deployments.

After getting started, you might consider using Kubernetes environments and deployments. With the Kubernetes integration, Control Hub automatically provisions the resources needed to run a Data Collector engine in your Kubernetes cluster, and then deploys engine instances to those resources.

Administering Organizations

As the system administrator for IBM StreamSets as client-managed software, you can complete full administrative tasks across all organizations.

An organization is a secure space provided to a set of users. All environments, deployments, pipelines, jobs, and other objects added by any user in the organization belong to that organization. A user logs in to IBM StreamSets Control Hub as a member of an organization and can access data that belongs to that organization only.

When you create an organization, you create an organization administrator that can perform administrative tasks for that organization.

You can create a single organization for all users or multiple organizations for different sets of users. For example, you might create one organization for the Northern Office and another organization for the Southern Office. Users in the Northern Office organization cannot access any data that belongs to the Southern Office organization. For more information, see Comparing Organizations and Groups.

Important: Control Hub includes a default system organization with the name Administrator that includes the system administrator user account. Do not use the system organization to build pipelines. Instead, create organizations for your enterprise separate from the system organization.

Comparing Organizations and Groups

You can use both organizations and groups to create sets of users. However, there are important differences between the two:

Organizations
Only the system administrator can create organizations.
Organizations are required. Each user logs in as a member of an organization.
Organizations are completely independent from each other. Data cannot be shared between organizations. After logging in to Control Hub, users can see the data only for the organization that they logged into. Users cannot view data across organizations in a single login session. If a user needs to access data belonging to two different organizations, the user must have an account in each organization.
Users can share objects with other users that belong to the same organization, but they cannot share objects across organizations.
Groups
An organization administrator can create groups within the organization.
Groups are optional groupings of users within a single organization. Use groups to more efficiently assign roles and permissions to sets of users without having to edit individual users. When you add a user, you can optionally specify the groups that the user belongs to.
Groups can be independent from each other, based on how you assign permissions to the groups and users within the groups. Data can also be shared between groups. After logging in to Control Hub, users who belong to multiple groups can see all data that all of the groups have been granted access to. Users can view data across groups in a single login session.
Users can share objects with users in different groups within a single organization.
You can use both organizations and groups to create a multitenant environment:
  • To create a multitenant environment with organizations, the system administrator creates multiple organizations and then organization administrators add the appropriate users to each organization.
  • To create a multitenant environment with multiple groups in a single organization, an organization administrator creates groups of users, and then shares objects within the groups to grant each group access to the appropriate objects.

For more information about using groups and permissions to create a multitenant environment, see Users and Groups.

Changing a Primary Organization Administrator

An organization can include multiple organization administrators, but only one primary organization administrator.

The IBM StreamSets system administrator configures the primary organization administrator when creating the organization.

The current primary organization administrator can change the primary administrator for the organization. However, as the system administrator, you can also change the primary administrator for any organization.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Organizations.
  4. Click the internal ID of the organization that you want to edit.
  5. In the list of users, find another user assigned the administrator role and then click Make Primary Admin.

Activating or Deactivating an Organization

An organization must be active so that users can log in as members of that organization.

As the system administrator, you might temporarily deactivate an organization to disable access to IBM StreamSets.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Organizations.
  4. Click the internal ID of the organization that you want to edit.
  5. Click one of the following buttons:
    • Activate to activate the organization.
    • Deactivate to deactivate the organization.

Deleting an Organization

As the system administrator, you can delete an organization. Deleting an organization permanently removes the organization, including all objects created for the organization.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Organizations.
  4. Click the internal ID of the organization that you want to delete.
  5. Click Delete.

Configuring Global Organization Properties

As the system administrator, you can configure organization properties at a global level to affect all organizations or at an organization level to affect a specific organization.

Some properties can be overridden by the organization administrator for each organization.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Organizations.
  4. In the Actions column for any listed organization, click Visit Instance.

    IBM StreamSets Control Hub opens and displays all existing organizations.

  5. Configure organization properties in one of the following ways:
    • Global level - Above the list of organizations, click the More icon and then click Global Configuration.

      Global configurations are applied to all organizations, unless an organization administrator has already configured the property at the organization level.

      For example, an organization administrator sets the Default authoring engine timeout property to 8,000 milliseconds for the Northern Office organization. The system administrator then sets the same property at a global level to 6,000 milliseconds. All organizations use the modified value of 6,000 milliseconds, except for the Northern Office organization, which retains the value of 8,000 milliseconds configured at the organization level.

    • Organization level - Next to a specific organization, click the More icon and then click Configuration.

      Organization level configurations are applied to the selected organization only.
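
The precedence between the two levels can be illustrated with a few lines of shell. This is only a sketch of the rule described above, not Control Hub code: effective_value is a hypothetical helper, and the Southern Office organization stands in for any organization that has not configured the property at the organization level.

```shell
#!/bin/sh
# Illustration of the precedence rule only; this is not Control Hub code.
# effective_value is a hypothetical helper: it prints the organization-level
# value when one is set, and otherwise falls back to the global value.
effective_value() {
  org_value=$1
  global_value=$2
  if [ -n "$org_value" ]; then
    printf '%s\n' "$org_value"
  else
    printf '%s\n' "$global_value"
  fi
}

# Northern Office set the timeout itself, so the global change does not apply.
northern=$(effective_value 8000 6000)
# An organization with no organization-level setting picks up the global value.
southern=$(effective_value "" 6000)
printf 'Northern Office: %s ms\nSouthern Office: %s ms\n' "$northern" "$southern"
```

The script prints 8000 ms for the Northern Office organization and 6000 ms for the Southern Office organization, which matches the example in the global level description.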

Increasing Default System Limits

Control Hub sets default system limits on the number of objects that can exist in each organization. The limits protect the system from runaway scripts or unintended automation usage.

These limits are sufficient for most organizations. However, as the system administrator, you can increase the limits globally for all organizations or for a specific organization.

For more information about the default values, see Organization Default System Limits.

  1. Log in to IBM Software Hub.
  2. Open the IBM StreamSets service from the Services > Instances page.

    The IBM StreamSets system administrator tool opens in a new browser tab.

  3. Click System Administration > Organizations.
  4. In the Actions column for any listed organization, click Visit Instance.

    IBM StreamSets Control Hub opens and displays all existing organizations.

  5. Access the organization configuration properties in one of the following ways:
    • Global level - Above the list of organizations, click the More icon and then click Global Configuration.
    • Organization level - Next to a specific organization, click the More icon and then click Configuration.
  6. Increase the system limit values.
    The organization configuration properties are grouped into several tabs. You can configure the following system limits on each tab:
    • Jobrunner tab:
      • Jobs
      • Active jobs running concurrently
    • Provisioning tab:
      • Deployments
      • Environments
      • Legacy Kubernetes deployments
    • Security tab:
      • Engines
      • Groups
      • Legacy Kubernetes Provisioning Agents
    • Misc tab:
      • API credentials per user
      • Pipelines (including both draft and published pipelines)
      • Pipeline versions or commits
      • Scheduled tasks
      • Subscriptions
      • Topologies
      • Topology versions or commits

Uninstalling

An IBM Software Hub instance administrator and a Red Hat OpenShift Container Platform cluster administrator can work together to uninstall IBM StreamSets from an instance of IBM Software Hub.

Complete the following tasks to uninstall IBM StreamSets:
  • Deleting the Service Instance
  • Uninstalling the Service

Deleting the Service Instance

An instance administrator can delete the service instance associated with IBM StreamSets.

Delete the service instance to ensure that the instance releases the resources that it reserved.

  1. Log in to IBM Software Hub.
  2. From the navigation menu, select Services > Instances.
  3. Locate the streamsets instance.
  4. From the action menu, select Delete.

Uninstalling the Service

A Red Hat OpenShift Container Platform cluster administrator can uninstall the IBM StreamSets service.

The IBM StreamSets uninstallation commands use the following environment variables so that you can run the commands exactly as written:
  • ${OC_LOGIN} is an alias for the oc login command
  • ${PROJECT_CPD_INST_OPERATORS} refers to the operators project
  • ${PROJECT_CPD_INST_OPERANDS} refers to the operands project

If you don't have a script that defines the environment variables, see the IBM Software Hub documentation.

To use the environment variables from the script, you must source the environment variables before you run the uninstallation commands. For example, run:
  source ./cpd_vars.sh

  1. Log in to Red Hat OpenShift Container Platform as a cluster administrator:
    ${OC_LOGIN}
  2. Delete the custom resource for IBM StreamSets. Set the production argument to true if you purchased the production license, or to false if you purchased the non-production license:
    oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsets --namespace ${PROJECT_CPD_INST_OPERANDS} --action deleteCustomResources --args "--production <false|true>"
  3. Delete the IBM StreamSets operator, setting the production argument based on your license:
    oc ibm-pak launch ibm-streamsets --version 1.0.0+20241205.192237.15 --inventory streamsetsOperator --namespace ${PROJECT_CPD_INST_OPERATORS} --action uninstall --args "--production <false|true>"
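
If you prefer to run steps 2 and 3 together, the two commands can be wrapped in a small function. The following is a sketch under stated assumptions: uninstall_streamsets, run, and DRY_RUN are hypothetical names, and the version string is copied from the commands above. DRY_RUN defaults to 1 so that the commands are only printed, which lets you review them before anything is deleted.

```shell
#!/bin/sh
# Sketch of a wrapper around the two uninstallation commands shown above.
# uninstall_streamsets, run, and DRY_RUN are hypothetical names. DRY_RUN
# defaults to 1, so commands are only printed unless you set DRY_RUN=0.
VERSION='1.0.0+20241205.192237.15'

# Print the command in dry-run mode (without shell quoting); otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

uninstall_streamsets() {
  production=$1
  # Accept only the two values that the production argument allows.
  case "$production" in
    true|false) ;;
    *) echo "usage: uninstall_streamsets <true|false>" >&2; return 1 ;;
  esac
  # Same order as the numbered steps: custom resource first, then operator.
  run oc ibm-pak launch ibm-streamsets --version "$VERSION" \
    --inventory streamsets --namespace "${PROJECT_CPD_INST_OPERANDS}" \
    --action deleteCustomResources --args "--production $production" &&
  run oc ibm-pak launch ibm-streamsets --version "$VERSION" \
    --inventory streamsetsOperator --namespace "${PROJECT_CPD_INST_OPERATORS}" \
    --action uninstall --args "--production $production"
}
```

After you source the environment variables script and log in with ${OC_LOGIN}, calling uninstall_streamsets true prints the two commands for a production license. Setting DRY_RUN=0 before the call runs them instead, and stops after the first command if it fails.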