Kaitai Struct Parser

The Kaitai Struct Parser processor uses a Kaitai Struct format description file to parse binary data embedded in a field and adds the parsed data to the record. You can include the Kaitai Struct Parser processor in a pipeline with an origin configured to read binary data.

Kaitai Struct is a declarative language used to describe binary data structures. A format description file, which uses the .ksy extension, describes the structure of a binary file format. You can develop your own format description file, or you can use an existing format description file for a standard binary file format. For more information, see the Kaitai Struct website.

The Kaitai Struct Parser processor compiles the format description file. Then, for incoming records, it calls the compiled file to parse the binary data, adds the fields defined in the format description file to the record, inserts the values returned by the compiled file into those fields, and passes the record to the next stage. The processor stores the compiled file temporarily while the pipeline runs and deletes it when the pipeline stops.
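
The lifecycle described above (compile once, parse each record, delete the compiled artifacts on stop) can be modeled in Python as follows. This is a conceptual sketch only: `KaitaiParserLifecycle` and `compile_fn` are hypothetical names, `compile_fn` stands in for the Kaitai Struct compiler, and merging the parsed fields into the top level of the record is an assumption made for illustration, not a description of how Data Collector actually builds the output record.

```python
import os
import shutil
import tempfile
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]
# Hypothetical compiled parser: raw bytes in, parsed fields out.
Parser = Callable[[bytes], Record]
# Hypothetical compiler stand-in: (ksy_text, temp_dir) -> Parser.
Compiler = Callable[[str, str], Parser]


class KaitaiParserLifecycle:
    """Conceptual model of compile-once, parse-per-record, delete-on-stop."""

    def __init__(self, ksy_text: str, compile_fn: Compiler, field: str) -> None:
        self.ksy_text = ksy_text
        self.compile_fn = compile_fn
        self.field = field  # field that holds the binary data
        self.workdir: Optional[str] = None
        self.parser: Optional[Parser] = None

    def start(self) -> None:
        # Pipeline start: compile once and keep artifacts in a temporary directory.
        self.workdir = tempfile.mkdtemp(prefix="kaitai-")
        self.parser = self.compile_fn(self.ksy_text, self.workdir)

    def process(self, record: Record) -> Record:
        # Per record: parse the binary field and add the returned fields.
        if self.parser is None:
            raise RuntimeError("start() must run before process()")
        out = dict(record)
        out.update(self.parser(record[self.field]))
        return out  # handed to the next stage

    def stop(self) -> None:
        # Pipeline stop: delete the compiled artifacts.
        if self.workdir is not None:
            shutil.rmtree(self.workdir, ignore_errors=True)
        self.workdir = None
        self.parser = None


if __name__ == "__main__":
    def fake_compile(ksy_text: str, workdir: str) -> Parser:
        # Stand-in "compilation": save the description, return a trivial parser.
        with open(os.path.join(workdir, "format.ksy"), "w", encoding="utf-8") as f:
            f.write(ksy_text)
        return lambda data: {"byte_count": len(data)}

    stage = KaitaiParserLifecycle("meta:\n  id: demo\n", fake_compile, field="raw")
    stage.start()
    print(stage.process({"raw": b"GIF89a"}))  # {'raw': b'GIF89a', 'byte_count': 6}
    stage.stop()
```

Keeping the compiled result in a temporary directory tied to `start()` and `stop()` mirrors the behavior the paragraph describes: the compiled file exists only while the pipeline runs.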

To configure the Kaitai Struct Parser processor, you specify the Kaitai Struct format description, either by entering the path to a .ksy file or by pasting the contents of the file directly into the stage.

If Data Collector uses Java 8, then you must complete a prerequisite task before running pipelines with the Kaitai Struct Parser processor.

Warning: The Kaitai Struct Parser processor can introduce security risks. Format description files can contain malicious code, and cross-references in a file can pull in third-party code from anywhere. If you specify a file path that points to an untrusted source, malicious code could be introduced into the file between the time that you specify the file and the time that the processor loads and compiles it. You must make sure that the code is safe.

Example

For example, suppose you want the processor to parse the headers of .gif files. You provide the following Kaitai Struct format description:
meta:
  id: gif
  file-extension: gif
  endian: le
seq:
  - id: header
    type: header
  - id: logical_screen
    type: logical_screen
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
  logical_screen:
    seq:
      - id: image_width
        type: u2
      - id: image_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1
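
To see what this format description encodes, the following Python sketch reads the same 13 bytes by hand with the standard `struct` module. It is an illustration only: `parse_gif_header` is a hypothetical helper, not the code that the Kaitai Struct compiler generates, and the nested dictionary mirrors the `id` names in the description rather than reproducing the processor's output record.

```python
import struct
from typing import Any, Dict


def parse_gif_header(data: bytes) -> Dict[str, Any]:
    """Hand-written equivalent of the gif format description above.

    header:         magic (3 bytes, must be 'GIF'), version (3 bytes)
    logical_screen: image_width (u2), image_height (u2), flags (u1),
                    bg_color_index (u1), pixel_aspect_ratio (u1)
    All multi-byte integers are little-endian (meta: endian: le).
    """
    if len(data) < 13:
        raise ValueError("need at least 13 bytes for header + logical_screen")
    magic, version = data[0:3], data[3:6]
    if magic != b"GIF":
        # `contents: 'GIF'` is a fixed-value check: other bytes are an error.
        raise ValueError(f"unexpected magic {magic!r}")
    # '<' = little-endian with no padding; H = u2, B = u1.
    width, height, flags, bg_index, aspect = struct.unpack_from("<HHBBB", data, 6)
    return {
        "header": {"magic": magic, "version": version},
        "logical_screen": {
            "image_width": width,
            "image_height": height,
            "flags": flags,
            "bg_color_index": bg_index,
            "pixel_aspect_ratio": aspect,
        },
    }


# 640x480 GIF89a header: 0x0280 = 640 and 0x01E0 = 480, stored low byte first.
sample = b"GIF89a" + bytes([0x80, 0x02, 0xE0, 0x01, 0xF7, 0x00, 0x00])
screen = parse_gif_header(sample)["logical_screen"]
print(screen["image_width"], screen["image_height"])  # 640 480
```

Because `endian: le` applies to the whole description, each `u2` is read least-significant byte first, which is why the bytes `80 02` decode to 640.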

An origin passes the binary data from a .gif file, and the processor generates the following record as output:

Prerequisite Task

If Data Collector uses Java 8, then Java Security Manager is enabled by default, and you must enable dynamic class loading from temporary folders.

For more information, see Security Manager for Java 8.

  1. In the Data Collector configuration directory, edit the security policy:
    $SDC_CONF/sdc-security.policy
  2. Add the following lines to the file:
    // dynamic class loading for Kaitai Struct
    grant codebase "file://${java.home}/../lib/tools.jar" {
       permission java.security.AllPermission;
    };
  3. Restart Data Collector.
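
If you manage several Data Collector installations, you might script steps 1 and 2. The following Python sketch appends the grant block to the policy file only when the exact block is not already present. `add_kaitai_grant` is a hypothetical helper; its exact-text check will not recognize a grant that was added by hand with different whitespace, and it does not restart Data Collector for you.

```python
from pathlib import Path

# Grant block from step 2, copied verbatim.
KAITAI_GRANT = """\
// dynamic class loading for Kaitai Struct
grant codebase "file://${java.home}/../lib/tools.jar" {
   permission java.security.AllPermission;
};
"""


def add_kaitai_grant(policy_path: str) -> bool:
    """Append the Kaitai Struct grant to a Java security policy file.

    Returns True if the file was modified and False if the exact block was
    already present. Restart Data Collector afterward for the change to apply.
    """
    path = Path(policy_path)
    text = path.read_text(encoding="utf-8")
    if KAITAI_GRANT in text:
        return False  # idempotent: do not add a second copy
    if text and not text.endswith("\n"):
        text += "\n"  # keep the existing last line intact
    path.write_text(text + "\n" + KAITAI_GRANT, encoding="utf-8")
    return True
```

For example, call `add_kaitai_grant` with the path `$SDC_CONF/sdc-security.policy`, and then restart Data Collector as described in step 3.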

Configuring the Kaitai Struct Parser Processor

Configure a Kaitai Struct Parser processor to parse binary data in a pipeline with an origin configured to read binary data.

If Data Collector uses Java 8, then you must complete a prerequisite task before running pipelines with the Kaitai Struct Parser processor.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
      Tip: You might include fields that the stage uses.

      Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

      Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Kaitai Struct tab, configure the following properties:
    Kaitai Struct Property - Description
    Kaitai Struct Source - Location of the Kaitai Struct format description:
      • In Specified .ksy File
      • In Property
    Kaitai Struct File Path - Path and name of the .ksy file that contains the Kaitai Struct format description to compile.

      Available if Kaitai Struct Source is In Specified .ksy File.

    Kaitai Struct Definition - Kaitai Struct format description to compile.

      Paste the contents of a valid .ksy file.

      Available if Kaitai Struct Source is In Property.
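
The two source options are mutually exclusive: the stage uses only the property that matches Kaitai Struct Source. The following Python sketch models that selection. `KaitaiStructSource` and `resolve_format_description` are hypothetical names whose enum values are taken from the Kaitai Struct Source options in the table above; this is not Data Collector's implementation, and the empty-value checks are an assumption about how validation might behave.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class KaitaiStructSource(Enum):
    IN_FILE = "In Specified .ksy File"
    IN_PROPERTY = "In Property"


def resolve_format_description(source: KaitaiStructSource,
                               file_path: Optional[str] = None,
                               definition: Optional[str] = None) -> str:
    """Return the .ksy text to compile, based on Kaitai Struct Source.

    Only the property that matches the selected source is consulted,
    mirroring the "Available if ..." notes in the property table.
    """
    if source is KaitaiStructSource.IN_FILE:
        # Kaitai Struct File Path: read the description from disk.
        if not file_path:
            raise ValueError("Kaitai Struct File Path is required for In Specified .ksy File")
        return Path(file_path).read_text(encoding="utf-8")
    # Kaitai Struct Definition: the pasted .ksy contents.
    if not definition or not definition.strip():
        raise ValueError("Kaitai Struct Definition is required for In Property")
    return definition
```

Reading the file path at compile time, rather than when the property is set, is also why the security warning about untrusted file paths applies: the file contents can change between configuration and compilation.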