Groovy Scripting
Supported pipeline types:
The script must perform the following tasks:
- Create threads, if the script supports multithreaded processing
- Create batches
- Create records
- Add the records to a batch
- Process the batch
- Stop when the pipeline stops
The script must handle all necessary processing, such as generating events, sending errors for handling, and stopping when users stop the pipeline or when there is no more data. You can call external Java code from the script.
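The batch lifecycle described above can be sketched outside the origin as a small, self-contained Java model. This is an illustration only, not the origin's API: the Batch class, the run method, and the stopped flag are hypothetical stand-ins for whatever objects the origin supplies to the script, and "processing" a batch here just means collecting it in a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class BatchLoopSketch {

    /** Hypothetical stand-in for a batch: collects records until it is processed. */
    static final class Batch {
        final List<String> records = new ArrayList<>();
        void add(String record) { records.add(record); }
        int size() { return records.size(); }
    }

    /**
     * Reads from the source in batches of at most batchSize records.
     * Returns every processed batch so that the result can be inspected.
     */
    public static List<List<String>> run(Iterator<String> source,
                                         int batchSize,
                                         AtomicBoolean stopped) {
        List<List<String>> processed = new ArrayList<>();
        Batch batch = new Batch();                     // create a batch
        // Stop when the pipeline is stopped or when there is no more data.
        while (!stopped.get() && source.hasNext()) {
            batch.add(source.next());                  // create a record, add it to the batch
            if (batch.size() >= batchSize) {
                processed.add(batch.records);          // process the full batch
                batch = new Batch();                   // start the next batch
            }
        }
        if (batch.size() > 0) {
            processed.add(batch.records);              // process the final partial batch
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e");
        List<List<String>> out = run(data.iterator(), 2, new AtomicBoolean(false));
        System.out.println(out); // prints [[a, b], [c, d], [e]]
    }
}
```

In this model the stop check happens once per record, so a stop request takes effect before the next record is read. A real script would likely also need to decide what to do with a partially filled batch when the pipeline stops; the sketch simply processes it.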
To handle restarts, the script must maintain an offset to track where the origin stopped and should restart. For the offset, the script requires a key, called an entity, associated with a unique value. For multithreaded processing, the entity must identify the partition of data processed by each thread. The method that processes batches saves an offset value for each entity.
For example, suppose your script processes data about U.S. states, using an API to read data with a URL of the form ../<state>&page=<number>. In the script, each thread reads data from one state until finished with that state. You can set the entity to the state and the offset to the page number.
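The entity and offset bookkeeping in the states example can be modeled with a plain Java map. The sketch below is hypothetical throughout: the StateOffsetSketch class and its methods are illustrative names, not part of the origin, and it assumes that pages are numbered from 1 and that the offset stores the last page fully processed. No real API is called; the comment marks where a script would read the page.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StateOffsetSketch {

    /** entity (state) -> offset (last page fully processed); shared by all threads. */
    private final Map<String, Integer> offsets = new ConcurrentHashMap<>();

    /** Starts from previously saved offsets; an empty map means "read everything". */
    public StateOffsetSketch(Map<String, Integer> savedOffsets) {
        offsets.putAll(savedOffsets);
    }

    /** Page that the thread handling this state should request next. */
    public int nextPage(String state) {
        return offsets.getOrDefault(state, 0) + 1;
    }

    /** Records the offset once the batch for a page has been processed. */
    public void commit(String state, int page) {
        offsets.put(state, page);
    }

    /** Builds the <state>&page=<number> part of the URL for the next request. */
    public String pathFor(String state) {
        return state + "&page=" + nextPage(state);
    }

    /** Copy of the current offsets, as they would be saved for a restart. */
    public Map<String, Integer> snapshot() {
        return new HashMap<>(offsets);
    }

    public static void main(String[] args) throws InterruptedException {
        StateOffsetSketch tracker = new StateOffsetSketch(Map.of());
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (String state : List.of("ohio", "texas")) {
            pool.submit(() -> {                      // one thread per entity
                for (int i = 0; i < 3; i++) {
                    int page = tracker.nextPage(state);
                    // A real script would read pathFor(state) here and build a batch.
                    tracker.commit(state, page);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Simulated restart: a new tracker resumes from the saved offsets.
        StateOffsetSketch restarted = new StateOffsetSketch(tracker.snapshot());
        System.out.println(restarted.pathFor("texas")); // prints texas&page=4
    }
}
```

Because each thread writes only the key for its own state, the threads never contend for the same offset. In this model, starting from an empty map corresponds to reading every state from its first page.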
You can reset the origin to process all available data.
The origin provides extensive sample code that you can use to develop your script.
When configuring the origin, you enter the script and the inputs required, including the batch size and number of threads, along with any script parameters used in the script.