Cassandra

The Cassandra destination writes data to a Cassandra cluster. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

When you configure the Cassandra destination, you define connection information and map incoming fields to columns in the Cassandra table. You can also use a connection to configure the destination. You specify whether the destination writes each batch to Cassandra as a logged batch or an unlogged batch, or you can disable batch writes and have the destination write records individually instead.

You configure whether the destination uses no authentication or username and password authentication to access the Cassandra cluster. If you install the DataStax Enterprise (DSE) Java driver, you can configure the destination to use DSE username and password authentication or Kerberos authentication. You can also enable the destination to use SSL/TLS to connect to the cluster.

Batch Type

The Cassandra destination can write batches to a Cassandra cluster using one of the following batch types:

Logged
Logged batches written to Cassandra use the Cassandra distributed batch log and are atomic. This means that the destination can only write entire batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination fails the entire batch. When a batch fails, all records are sent to the stage for error handling.
Unlogged
Unlogged batches written to Cassandra do not use the Cassandra distributed batch log and are nonatomic. This means that the destination can write partial batches of records to Cassandra. If an error occurs with one or more records in a batch, the destination sends only those failed records to the stage for error handling. The destination writes the remaining records in the batch to Cassandra.

By default, the destination uses the logged batch type.

For more information about the Cassandra distributed batch log, see the Cassandra Query Language (CQL) documentation.
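For illustration only, the following sketch shows how logged and unlogged batches differ at the level of the DataStax Java driver that underlies the destination. The keyspace, table, columns, and values are hypothetical, and the sketch is not the destination's actual implementation.

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class BatchTypeSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("localhost").withPort(9042).build();
                 Session session = cluster.connect()) {

                // Logged batch: uses the distributed batch log, so the whole
                // batch either succeeds or fails as a unit (atomic).
                BatchStatement logged = new BatchStatement(BatchStatement.Type.LOGGED);
                logged.add(new SimpleStatement(
                    "INSERT INTO demo_ks.events (id, name) VALUES (?, ?)", 1, "first"));
                logged.add(new SimpleStatement(
                    "INSERT INTO demo_ks.events (id, name) VALUES (?, ?)", 2, "second"));
                session.execute(logged);

                // Unlogged batch: skips the batch log, so statements can
                // succeed or fail individually (nonatomic).
                BatchStatement unlogged = new BatchStatement(BatchStatement.Type.UNLOGGED);
                unlogged.add(new SimpleStatement(
                    "INSERT INTO demo_ks.events (id, name) VALUES (?, ?)", 3, "third"));
                session.execute(unlogged);
            }
        }
    }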

Authentication

Configure the Cassandra destination to use one of the following authentication providers to access the Cassandra cluster:
  • None - Performs no authentication.
  • Username/Password - Uses Cassandra username and password authentication.
  • Username/Password (DSE) - Uses DataStax Enterprise username and password authentication. Requires that you install the DSE Java driver.
  • Kerberos (DSE) - Uses Kerberos authentication. Requires that you install the DSE Java driver.

Before selecting one of the DSE authentication providers, install the DSE Java driver version 1.2.4 or later. For a compatibility matrix, see the Cassandra documentation. You install the driver into the Cassandra stage library, streamsets-datacollector-cassandra_3-lib, which includes the destination. For information about installing additional drivers, see Install External Libraries in the Data Collector documentation.
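As a point of reference only, the following sketch shows how username and password credentials reach the underlying Java driver. The host name and credentials are placeholders, and the DSE providers are noted only in comments because they require the separately installed DSE Java driver.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class AuthSketch {
        public static void main(String[] args) {
            // Username/Password: plain Cassandra authentication (PasswordAuthenticator
            // on the cluster side). Host and credentials below are placeholders.
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("cassandra.example.com")
                    .withPort(9042)
                    .withCredentials("sdc_user", "sdc_password")
                    .build();
                 Session session = cluster.connect()) {
                session.execute("SELECT release_version FROM system.local");
            }
            // Username/Password (DSE) and Kerberos (DSE) use authentication providers
            // from the DSE Java driver, which must be installed separately.
        }
    }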

Kerberos (DSE) Authentication

If you install the DSE Java driver, you can use Kerberos authentication to connect to a Cassandra cluster. When you use Kerberos authentication, Data Collector uses the Kerberos principal and keytab to connect to the cluster. By default, Data Collector uses the user account that started it to connect.

The Kerberos principal and keytab are defined in the Data Collector configuration file, $SDC_CONF/sdc.properties. To use Kerberos authentication, configure all Kerberos properties in the Data Collector configuration file, install the DSE Java driver, and then enable Kerberos (DSE) authentication in the Cassandra destination.
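For example, the Kerberos entries in $SDC_CONF/sdc.properties look similar to the following sketch. The principal and keytab path are placeholders; use the property names and values described in the Data Collector documentation for your installation.

    kerberos.client.enabled=true
    kerberos.client.principal=sdc/_HOST@EXAMPLE.COM
    kerberos.client.keytab=/path/to/sdc.keytab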

Cassandra Data Types

Due to Cassandra requirements, the data types of the incoming fields must match the data types of the corresponding Cassandra columns. When appropriate, use a Field Type Converter processor earlier in the pipeline to convert data types.

For details about the conversion of Java data types to Cassandra data types, see the Cassandra documentation.

The Cassandra destination supports the following Cassandra data types:
  • ASCII
  • Bigint
  • Boolean
  • Counter
  • Decimal
  • Double
  • Float
  • Int
  • List
  • Map
  • Text
  • Timestamp
  • Timeuuid
  • Uuid
  • Varchar
  • Varint
The following data types are not supported at this time:
  • Blob
  • Inet
  • Set
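To illustrate the type matching that the underlying Java driver expects, the following hedged sketch binds Java values to columns of several supported Cassandra types. The keyspace, table, and columns are hypothetical.

    import java.util.Date;
    import java.util.UUID;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class TypeMatchSketch {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
                 Session session = cluster.connect()) {
                // Hypothetical table:
                //   CREATE TABLE demo_ks.readings (id uuid PRIMARY KEY, name text,
                //                                  count int, taken_at timestamp);
                PreparedStatement ps = session.prepare(
                    "INSERT INTO demo_ks.readings (id, name, count, taken_at) VALUES (?, ?, ?, ?)");
                // Each bound Java value must match the column's Cassandra type:
                // UUID -> uuid, String -> text, Integer -> int, Date -> timestamp.
                // A mismatched value (for example, a String for the int column) fails,
                // which is why a Field Type Converter may be needed upstream.
                session.execute(ps.bind(UUID.randomUUID(), "sensor-1", 42, new Date()));
            }
        }
    }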

Configuring a Cassandra Destination

Configure a Cassandra destination to write data to a Cassandra cluster.
  1. In the Properties panel, on the General tab, configure the following properties:
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Cassandra tab, configure the following properties:
    Connection - Connection that defines the information required to connect to an external system.

    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.

    Cassandra Contact Points - Host names of nodes in the Cassandra cluster. Using simple or bulk edit mode, click the Add icon to enter several host names to ensure a connection.
    Cassandra Port - Port number for the Cassandra nodes.
    Authentication Provider - Authentication provider used to access the cluster:
    • None - Performs no authentication.
    • Username/Password - Uses Cassandra username and password authentication.
    • Username/Password (DSE) - Uses DataStax Enterprise username and password authentication. Requires that you install the DSE Java driver.
    • Kerberos (DSE) - Uses Kerberos authentication. Requires that you install the DSE Java driver.

    Before selecting one of the DSE authentication providers, install the DSE Java driver version 1.2.4 or later. For a compatibility matrix, see the Cassandra documentation. For information about installing the driver, see Install External Libraries in the Data Collector documentation.

    Protocol Version - Native protocol version that defines the format of the binary messages exchanged between the driver and Cassandra. Select the protocol version that you are using.

    For information about determining your protocol version, see the Cassandra documentation.

    Fully-Qualified Table Name - Name of the Cassandra table to use. Enter a fully-qualified name in the following format: <keyspace>.<table name>.
    Field to Column Mapping - Map fields from the record to Cassandra columns. Using simple or bulk edit mode, click the Add icon to create additional field mappings.
    Note: The record field data type must match the data type of the Cassandra column.
    Compression - Optional compression type for transport-level requests and responses.
    Enable Batches - Enables Cassandra batch operations. When not selected, the destination uses individual statements to write records to Cassandra.
    Batch Type - Type of batch to write to Cassandra:
    • Logged
    • Unlogged

    Available when batch operations are enabled.

    Write Timeout - Maximum time in milliseconds allowed to complete a write request.

    Available when batch operations are not enabled.

    Max Batch Size - Maximum number of statements to include in each batch written to Cassandra. Ensure that this number does not exceed the batch size configured in the Cassandra cluster.

    Available when batch operations are enabled.

    Connection Timeout - Maximum time in milliseconds to wait for a connection.

    Default is 5000.

    Socket Read Timeout - Maximum time in milliseconds for the Cassandra cluster to read the data that the destination writes.
    Consistency Level - Write consistency level to use when Cassandra is set up on a cluster.

    For more information about write consistency levels, see the Cassandra documentation.

    Log Slow Queries - Enables logging of slow queries.
    Note: To use this property, the logger for com.datastax.driver.core.QueryLogger.SLOW must be set to DEBUG. To do this, add the following line to the Data Collector log configuration file in Control Hub:
    com.datastax.driver.core.QueryLogger.SLOW=DEBUG
    For more information about changing the log configuration, see the Control Hub documentation.
    Slow Query Logging Threshold - Minimum time in milliseconds for a query to be considered slow.

    Default is 5000.

    Available when Log Slow Queries is enabled.

  3. When using username/password authentication, on the Credentials tab, specify a user name and password.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
  4. To use SSL/TLS, on the TLS tab, configure the following properties. For reference, a sketch after this procedure shows the JSSE objects that the keystore and truststore properties correspond to:
    Use TLS - Enables the use of TLS.
    Use Remote Keystore - Enables loading the contents of the keystore from a remote credential store or from values entered in the stage properties. For more information, see Remote Keystore and Truststore.
    Private Key - Private key used in the remote keystore. Enter a credential function that returns the key or enter the contents of the key.
    Certificate Chain - Each PEM certificate used in the remote keystore. Enter a credential function that returns the certificate or enter the contents of the certificate.

    Using simple or bulk edit mode, click the Add icon to add additional certificates.

    Keystore File - Path to the local keystore file. Enter an absolute path to the file or enter the following expression to define the file stored in the Data Collector resources directory:

    ${runtime:resourcesDirPath()}/keystore.jks

    By default, no keystore is used.

    Keystore Type - Type of keystore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Keystore Password - Password to the keystore file. A password is optional, but recommended.

    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
    Keystore Key Algorithm - Algorithm to manage the keystore.

    Default is SunX509.

    Use Remote Truststore - Enables loading the contents of the truststore from a remote credential store or from values entered in the stage properties. For more information, see Remote Keystore and Truststore.
    Trusted Certificates - Each PEM certificate used in the remote truststore. Enter a credential function that returns the certificate or enter the contents of the certificate.

    Using simple or bulk edit mode, click the Add icon to add additional certificates.

    Truststore File - Path to the local truststore file. Enter an absolute path to the file or enter the following expression to define the file stored in the Data Collector resources directory:

    ${runtime:resourcesDirPath()}/truststore.jks

    By default, no truststore is used.

    Truststore Type - Type of truststore to use. Use one of the following types:
    • Java Keystore File (JKS)
    • PKCS #12 (p12 file)

    Default is Java Keystore File (JKS).

    Truststore Password - Password to the truststore file. A password is optional, but recommended.

    Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
    Truststore Trust Algorithm - Algorithm to manage the truststore.

    Default is SunX509.

    Use Default Protocols - Uses the default TLSv1.2 transport layer security (TLS) protocol. To use a different protocol, clear this option.
    Transport Protocols - TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols.
    Note: Older protocols are not as secure as TLSv1.2.
    Use Default Cipher Suites - Uses a default cipher suite for the SSL/TLS handshake. To use a different cipher suite, clear this option.
    Cipher Suites - Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites.

    Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.
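To clarify what the keystore and truststore properties describe, the following standalone JSSE sketch loads a JKS truststore with the SunX509 algorithm and builds a TLSv1.2 context. The file path and password are placeholders, and this is not how the destination itself is implemented.

    import java.io.FileInputStream;
    import java.security.KeyStore;

    import javax.net.ssl.SSLContext;
    import javax.net.ssl.TrustManagerFactory;

    public class TruststoreSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder path and password; in the destination these come from the
            // Truststore File and Truststore Password properties.
            String truststorePath = "/path/to/truststore.jks";
            char[] truststorePassword = "changeit".toCharArray();

            // Truststore Type: JKS (a PKCS #12 file would use "PKCS12" instead).
            KeyStore truststore = KeyStore.getInstance("JKS");
            try (FileInputStream in = new FileInputStream(truststorePath)) {
                truststore.load(in, truststorePassword);
            }

            // Truststore Trust Algorithm: SunX509 is the documented default.
            TrustManagerFactory tmf = TrustManagerFactory.getInstance("SunX509");
            tmf.init(truststore);

            // Use Default Protocols: TLSv1.2 is the default transport protocol.
            SSLContext sslContext = SSLContext.getInstance("TLSv1.2");
            sslContext.init(null, tmf.getTrustManagers(), null);

            System.out.println("Enabled protocol: " + sslContext.getProtocol());
        }
    }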