Type Converter

The Type Converter processor converts the data types of specified fields to compatible types. For example, you might use the Type Converter to convert a String field containing single-precision floating point numbers to a Float field.

Use the Type Converter processor to convert data to simple types, such as String, Boolean, or Timestamp. The Type Converter does not convert data to complex types such as lists or maps.

When you configure a Type Converter, you specify the field to convert and the data type to convert to. When converting to the Decimal data type, you also configure the precision and scale for the field.

Important: Due to underlying Spark behavior, when attempting an invalid conversion, the Type Converter processor might generate errors or replace field values with null values. Be sure to configure the processor to convert data to valid types.

Field Type Conversion

The Type Converter processor uses Spark to perform field type conversions. Spark determines if a type conversion is valid based on the data in the field. As a result, a type conversion might be valid for some records and invalid for others.

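For example, converting 8.2 from a String to Decimal(2,1) is valid, but converting 55.32 from String to Decimal(2,1) is invalid, because the value needs two digits before the decimal point and Decimal(2,1) allows only one. Converting 55.32 to Decimal without rounding requires a precision of at least 4 with a scale of 2.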
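
The precision and scale check in this example can be sketched in plain Python with the standard decimal module. This is only an illustration of the arithmetic, not the processor's or Spark's implementation. The function name to_decimal is hypothetical, and the sketch assumes that values are rounded half up to the target scale before the precision check and that a value that does not fit becomes None, standing in for a null.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def to_decimal(text: str, precision: int, scale: int) -> Optional[Decimal]:
    """Return text as a Decimal(precision, scale), or None if it does not fit.

    Hypothetical helper: None stands in for the null that an invalid
    conversion might produce.
    """
    try:
        value = Decimal(text.strip())
        if not value.is_finite():
            return None  # NaN and Infinity are not valid decimal values here
        # Round to the target scale (assumed rounding mode: half up).
        quantum = Decimal(1).scaleb(-scale)  # scale 2 gives Decimal("0.01")
        rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None  # the text is not a number, or it has too many digits
    # Precision is the total number of digits, before and after the point.
    if len(rounded.as_tuple().digits) > precision:
        return None
    return rounded


print(to_decimal("8.2", 2, 1))    # fits Decimal(2,1)
print(to_decimal("55.32", 2, 1))  # does not fit: 55.3 needs three digits
print(to_decimal("55.32", 4, 2))  # fits Decimal(4,2)
```

Running the same check over a sample of field values before choosing a precision and scale is one way to see which records would fail the conversion.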

When attempting an invalid conversion, the Type Converter processor might generate errors or replace field values with null values. Be sure to configure the processor to convert data to valid types.
Tip: When building your pipeline, you can preview data to determine how type conversions are performed.

Use the following guidelines for type conversion:

  • Field values can be converted to compatible data types, such as a date value from String to Date, or an integer value from Long to Decimal.
  • When converting from a decimal value to Integer, the digits after the decimal point are truncated.
  • When converting from a decimal value to Decimal with a lower scale, the value is rounded to the lower scale.
  • When converting from an integer value to Decimal, zeros can be used as placeholders for the scale. For example, 7 with a scale of 2 becomes 7.00.
  • When converting from a numeric value to Boolean, 0 converts to false and all other numbers convert to true.
  • To convert to datetime types, the data must be in the correct format:
    • Conversion to Date requires the following input format: yyyy-MM-dd.
    • Conversion to Timestamp requires the following input format: yyyy-MM-dd HH:mm:ss.

    The converted data uses the input format as the output format.

    Tip: To convert to a custom format, you can use the Spark SQL Expression processor with a function such as to_date or to_timestamp.
  • Fields within lists cannot be converted.
  • Fields within maps are not converted as expected.
  • Null values in converted fields indicate that Spark considered the conversion invalid.
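
Several of the guidelines above can be illustrated with a short plain-Python sketch. It uses only the standard library and mimics the described behavior rather than calling Spark or the processor. All function names are hypothetical. The half-up rounding mode and the 24-hour clock in the timestamp pattern are assumptions, and None stands in for a null value.

```python
from datetime import date, datetime
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP
from typing import Optional


def decimal_to_integer(value: Decimal) -> int:
    # Digits after the decimal point are dropped, not rounded.
    return int(value.to_integral_value(rounding=ROUND_DOWN))


def reduce_scale(value: Decimal, scale: int) -> Decimal:
    # The value is rounded to the lower scale (assumed mode: half up).
    return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


def integer_to_decimal(value: int, scale: int) -> Decimal:
    # Zeros fill the scale digits: 7 with scale 2 becomes 7.00.
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))


def number_to_boolean(value) -> bool:
    # 0 converts to false, every other number converts to true.
    return value != 0


def parse_date(text: str) -> Optional[date]:
    # Accepts the yyyy-MM-dd layout, otherwise returns None.
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


def parse_timestamp(text: str) -> Optional[datetime]:
    # Accepts year-month-day followed by a 24-hour time, otherwise None.
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


print(decimal_to_integer(Decimal("8.97")))
print(reduce_scale(Decimal("8.25"), 1))
print(integer_to_decimal(7, 2))
print(number_to_boolean(0), number_to_boolean(-3))
print(parse_date("2024-02-29"), parse_date("02/29/2024"))
print(parse_timestamp("2024-02-29 13:45:00"))
```

Previewing data in the pipeline remains the reliable way to confirm how a given conversion behaves, since the sketch only approximates the rules listed above.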

Configuring a Type Converter Processor

Configure a Type Converter processor to change the data type of a field to a compatible type.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property | Description
    Name | Stage name.
    Description | Optional description.
    Cache Data | Caches data processed for a batch so the data can be reused for multiple downstream stages. Use to improve performance when the stage passes data to multiple stages. Caching can limit pushdown optimization when the pipeline runs in ludicrous mode.

  2. On the Conversions tab, configure the following properties:
    Conversion Property | Description
    Field Name | Name of the field to convert.
    Target Type | Data type to convert to.
    Precision | Precision for a Decimal field. For the Decimal data type only.
    Scale | Scale for a Decimal field. For the Decimal data type only.

  3. To configure another field type conversion, click the Add icon.
    You can use simple or bulk edit mode to configure the conversions.
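
When planning several conversions, it can help to write them down as data and check them before entering them on the Conversions tab. The sketch below is a hypothetical planning aid in plain Python, not the processor's configuration format: the Conversion class, its attribute names, and the validate function are invented for illustration. It encodes the rule that Precision and Scale apply to the Decimal type only, and it assumes that a scale cannot exceed its precision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversion:
    """One row of a planned conversion list (hypothetical structure)."""
    field_name: str
    target_type: str
    precision: Optional[int] = None  # Decimal only
    scale: Optional[int] = None      # Decimal only


def validate(conversion: Conversion) -> List[str]:
    """Return a list of problems found in a planned conversion."""
    problems = []
    if conversion.target_type.lower() == "decimal":
        if conversion.precision is None or conversion.scale is None:
            problems.append("Decimal needs both precision and scale")
        elif conversion.scale > conversion.precision:
            # Assumption: scale may not be larger than precision.
            problems.append("scale cannot exceed precision")
    elif conversion.precision is not None or conversion.scale is not None:
        problems.append("precision and scale apply to Decimal only")
    return problems


planned = [
    Conversion("price", "Decimal", precision=4, scale=2),
    Conversion("quantity", "Decimal"),
    Conversion("name", "String", precision=10),
]
for item in planned:
    print(item.field_name, validate(item))
```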