Type Converter
The Type Converter processor converts the data types of specified fields to compatible types. For example, you might use the Type Converter to convert a String field containing single-precision floating point numbers to a Float field.
Use the Type Converter processor to convert data to simple types, such as String, Boolean, or Timestamp. The Type Converter does not convert data to complex types such as lists or maps.
When you configure a Type Converter, you specify the field to convert and the data type to convert to. When converting to the Decimal data type, you also configure the precision and scale for the field.
Field Type Conversion
The Type Converter processor uses Spark to perform field type conversions. Spark determines if a type conversion is valid based on the data in the field. As a result, a type conversion might be valid for some records and invalid for others.
For example, converting 8.2 from String to Decimal(2,1) is valid, but converting 55.32 from String to Decimal(2,1) is invalid. Converting 55.32 to Decimal requires a minimum precision of 4 and a scale of 2.
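The precision and scale arithmetic in this example can be sketched in plain Python with the standard decimal module. This is only an illustration, not the processor's or Spark's implementation. The function names min_precision_scale and fits_decimal are hypothetical, and the rule in fits_decimal (round half up to the target scale, then count digits against the precision) is an assumption about how the validity check behaves.

```python
from decimal import Decimal, ROUND_HALF_UP


def min_precision_scale(value):
    """Return the smallest (precision, scale) that holds the value exactly."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    # Scale is the number of digits after the decimal point.
    scale = max(-exponent, 0)
    # Precision is the total number of digits, and at least the scale.
    precision = max(len(digits) + max(exponent, 0), scale)
    return precision, scale


def fits_decimal(value, precision, scale):
    """Assumed rule: round to the target scale, then compare total digits."""
    step = Decimal(1).scaleb(-scale)  # for example, scale 1 gives 0.1
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    needed_precision, _ = min_precision_scale(rounded)
    return needed_precision <= precision


# The same target type can be valid for one record and invalid for another.
for record in ["8.2", "55.32"]:
    print(record, min_precision_scale(record), fits_decimal(record, 2, 1))
```

Under these assumptions, 8.2 fits Decimal(2,1), while 55.32 needs precision 4 and scale 2 to be stored exactly and does not fit Decimal(2,1).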
Use the following guidelines for type conversion:
- Field values can be converted to compatible data types, such as a date value from String to Date, or an integer value from Long to Decimal.
- When converting from a decimal value to Integer, the digits after the decimal point are truncated.
- When converting from a decimal value to Decimal with a lower scale, the value is rounded to the lower scale.
- When converting from an integer value to Decimal, zeros can be used as placeholders for the scale.
- When converting from a numeric value to Boolean, 0 converts to false and all other numbers convert to true.
- To convert to datetime types, the data must be in the correct format:
  - Conversion to Date requires the following input format: yyyy-MM-dd.
  - Conversion to Timestamp requires the following input format: yyyy-MM-dd hh:mm:ss.
  The converted data uses the input format as the output format.
  Tip: To convert to a custom format, you can use the Spark SQL Expression processor with a function such as to_date or to_timestamp.
- Fields within lists cannot be converted.
- Fields within maps are not converted as expected.
- When field values are replaced by nulls, Spark considers the conversion invalid.
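The guidelines above can be illustrated with a short plain-Python sketch. It mirrors the described behavior using the standard library and does not call Spark. All function names are hypothetical. The half-up rounding mode is an assumption, as is reading the hour field of the timestamp pattern as a 24-hour value (Python's %H directive).

```python
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


def decimal_to_integer(value):
    """Truncate the digits after the decimal point."""
    return int(Decimal(value))


def lower_scale(value, scale):
    """Round to a lower scale (half-up rounding is an assumption)."""
    return Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


def integer_to_decimal(value, scale):
    """Pad an integer value with zeros to fill the scale."""
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))


def numeric_to_boolean(value):
    """0 converts to False; every other number converts to True."""
    return Decimal(value) != 0


def parse_date(text):
    """Accept only the yyyy-MM-dd layout; raises ValueError otherwise."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_timestamp(text):
    """Accept only the date, a space, then hours:minutes:seconds."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


print(decimal_to_integer("8.9"))        # 8
print(lower_scale("55.325", 2))         # 55.33
print(integer_to_decimal(7, 2))         # 7.00
print(numeric_to_boolean(0), numeric_to_boolean(-3))  # False True
print(parse_date("2024-03-15"))         # 2024-03-15
```

A value in any other layout, such as 15/03/2024, raises ValueError in this sketch, which loosely corresponds to an invalid conversion.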
Configuring a Type Converter Processor
1. In the Properties panel, on the General tab, configure the following properties:

   | General Property | Description |
   | --- | --- |
   | Name | Stage name. |
   | Description | Optional description. |
   | Cache Data | Caches data processed for a batch so the data can be reused for multiple downstream stages. Use to improve performance when the stage passes data to multiple stages. Caching can limit pushdown optimization when the pipeline runs in ludicrous mode. |

2. On the Conversions tab, configure the following properties:

   | Conversion Property | Description |
   | --- | --- |
   | Field Name | Name of the field to convert. |
   | Target Type | Data type to convert to. |
   | Precision | Precision for a Decimal field. For the Decimal data type only. |
   | Scale | Scale for a Decimal field. For the Decimal data type only. |

3. To configure another field type conversion, click the Add icon.

   You can use simple or bulk edit mode to configure the conversions.
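As a rough illustration of how the conversion properties relate to each other, the following Python sketch models one conversion entry and checks the rule that Precision and Scale apply to the Decimal target type only. The Conversion class and its problems method are hypothetical. They are not part of the product and do not reflect the format used by bulk edit mode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversion:
    """Hypothetical stand-in for one entry on the Conversions tab."""
    field_name: str
    target_type: str
    precision: Optional[int] = None
    scale: Optional[int] = None

    def problems(self) -> List[str]:
        """Return a list of configuration problems; empty means consistent."""
        issues = []
        is_decimal = self.target_type.lower() == "decimal"
        has_any = self.precision is not None or self.scale is not None
        has_both = self.precision is not None and self.scale is not None
        if is_decimal and not has_both:
            issues.append("Decimal requires both precision and scale")
        if not is_decimal and has_any:
            issues.append("precision and scale apply to Decimal only")
        return issues


conversions = [
    Conversion("price", "Decimal", precision=4, scale=2),
    Conversion("created", "Timestamp"),
    Conversion("total", "Decimal"),
]
for conversion in conversions:
    print(conversion.field_name, conversion.problems())
```

In this sketch, the first two entries report no problems, and the third reports that a Decimal conversion is missing its precision and scale.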