LLM Translate

The LLM Translate processor translates text from specified columns to the chosen language. The processor uses the Snowflake Large Language Model (LLM) Translate function to translate the data.

When you configure the processor, you specify the source columns to evaluate and the language of the text in those columns. You can configure the processor to auto-detect the language if you do not know it or if a column contains more than one language.

Then, you specify the language to translate to and, optionally, an output column for the translated text.

For more information about the Translate function and other Snowflake LLM functions, see the Snowflake documentation.
Note: At this time, Snowflake charges differently for LLM processing. See the Snowflake consumption rates for details.

Example

Say you have product reviews in a reviews column, but they are in multiple languages. You want the LLM Translate processor to translate the reviews to English and overwrite the original reviews column with the results. To do this, you can configure the LLM Translate processor as follows:
  • Source Column: reviews
  • Source Language: AUTO_DETECT
  • Target Language: English
  • Output Column: not configured
With the following incoming data:
productId  userId   reviews
B-634      aabba6   Esto es lo mejor! No sé cómo viví sin él durante tanto tiempo.
FS-845     99louis  C'est un peu cher, mais ça marche très bien.
S-212      boo55    non assomiglia per niente alla foto.
After processing the data, the processor passes the following data downstream:
productId  userId   reviews
B-634      aabba6   This is the best! I don't know how I lived without him for so long.
FS-845     99louis  It's a bit expensive, but it works very well.
S-212      boo55    does not look like the photo at all.

Notice how the translations replace the original reviews. To keep the original reviews in addition to the translations, you can simply specify a new output column name in the Output Column property.
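
To make the example more concrete, the following Python sketch builds a query similar to what this configuration might push down to Snowflake. It is an illustration only, not the processor's actual implementation. It assumes the processor calls SNOWFLAKE.CORTEX.TRANSLATE(text, source_language, target_language), that AUTO_DETECT corresponds to passing an empty source language, and that English corresponds to the code 'en'. The function name build_translate_query and the table name product_reviews are hypothetical.

```python
def build_translate_query(table, columns, source_column, target_code,
                          source_code="", output_column=None):
    """Build a SELECT that translates one column (illustrative sketch only)."""
    # With no output column configured, the translation overwrites the source.
    out = output_column or source_column
    # Assumption: an empty source language asks Snowflake to detect the language.
    call = (f"SNOWFLAKE.CORTEX.TRANSLATE({source_column}, "
            f"'{source_code}', '{target_code}') AS {out}")
    # Replace an existing column in place, otherwise append a new column.
    items = [call if col == out else col for col in columns]
    if out not in columns:
        items.append(call)
    return f"SELECT {', '.join(items)} FROM {table}"


# Hypothetical table holding the incoming data from the example.
query = build_translate_query(
    "product_reviews", ["productId", "userId", "reviews"], "reviews", "en")
print(query)
# SELECT productId, userId, SNOWFLAKE.CORTEX.TRANSLATE(reviews, '', 'en') AS reviews FROM product_reviews
```

Passing output_column="reviews_en" to the same function keeps the original reviews column and appends the translation as a new column.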

Source and Output Columns

Note the following details about defining source and output columns:
Source columns
The processor evaluates data in the columns defined in the Source Column property.
To evaluate multiple columns, you can define multiple sets of configurations. You can also use regular expressions to have the processor evaluate all columns with matching names.
Output columns
The processor writes translated data into the columns defined in the Output Column property. The processor creates columns and overwrites data in output columns as follows:
  • When you define an output column that does not exist in incoming data, the processor creates the column.
  • When you define an output column that exists, the processor overwrites the data in the column.
  • When you do not define an output column, the processor places the data in the column being evaluated, overwriting the original data.

    For example, if you configure the processor to translate a Feedback column and do not specify an output column, the processor places the translated text into the Feedback column.
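
The three output column rules can be sketched in a few lines of Python. This is a simplified model that operates on a single in-memory row, not the processor's code. The function name place_translation and the Feedback_en column name are hypothetical, and the translated text is supplied directly rather than produced by a translation call.

```python
def place_translation(row, source_column, translated_text, output_column=None):
    """Return a copy of the row with the translation written per the rules above."""
    # No output column defined: write back to the column being evaluated.
    target = output_column or source_column
    updated = dict(row)
    # Assigning to a missing key creates the column; an existing key is overwritten.
    updated[target] = translated_text
    return updated


row = {"productId": "S-212", "Feedback": "non assomiglia per niente alla foto."}
translated = "does not look like the photo at all."

in_place = place_translation(row, "Feedback", translated)
side_by_side = place_translation(row, "Feedback", translated, "Feedback_en")
```

Here in_place holds the translation in the Feedback column, while side_by_side keeps the original Feedback text and adds the translation in a new Feedback_en column.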

Note: When specifying the output column, you can use $0 to represent the evaluated source column name, and then add preceding or following characters.
For example, say you use a regular expression to define the source columns to evaluate. If you specify $0_french as the name of the corresponding output columns, the processor writes the translations for a reviews source column to a new reviews_french column.
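
The following Python sketch shows one way the $0 substitution could work when the source columns are defined with a regular expression. It assumes that the expression must match the whole column name and that $0 is replaced with the matched column name as plain text. The actual matching rules of the processor may differ. The function name resolve_output_columns and the reviews_short column are hypothetical.

```python
import re


def resolve_output_columns(incoming_columns, source_pattern, output_template=None):
    """Map each matching source column to its output column name (sketch)."""
    mapping = {}
    for name in incoming_columns:
        # Assumption: the pattern must match the entire column name.
        if re.fullmatch(source_pattern, name):
            if output_template:
                # $0 stands for the name of the evaluated source column.
                mapping[name] = output_template.replace("$0", name)
            else:
                # No output column defined: the source column is overwritten.
                mapping[name] = name
    return mapping


columns = ["productId", "userId", "reviews", "reviews_short"]
mapping = resolve_output_columns(columns, r"reviews.*", "$0_french")
print(mapping)
# {'reviews': 'reviews_french', 'reviews_short': 'reviews_short_french'}
```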

Configuring an LLM Translate Processor

Configure an LLM Translate processor to translate text in specified columns to another language.

  1. On the General tab, configure the following properties:
    General Property  Description
    Name              Stage name.
    Description       Optional description.
    Cache Data        Caches processed data.
  2. On the Translate tab, configure the following property:
    Translate Property Description
    Translate Configurations Specify the following properties, as needed:
    • Source Column - Name of the column to evaluate. To evaluate multiple columns, you can use a regular expression to define a name pattern to match.

    • Source Language - Language in the specified source column. You can use AUTO_DETECT if you do not know the language or the column includes multiple languages.
    • Target Language - Language to translate data to.
    • Output Column - Output column for the translated text. When not defined, the processor overwrites the associated source column. For information about defining multiple columns, see Source and Output Columns.

    To specify additional columns to evaluate, click Add Another or Bulk Edit Mode.