Log Parser

Supported pipeline types:
  • Data Collector

The Log Parser processor parses log data in a field based on the specified log format. Use the Log Parser to process log data within the pipeline. To read log data directly from an origin system, you can use an origin that processes the log data format, such as File Tail or Kafka Multitopic Consumer.

When you configure Log Parser, you define the field that contains the log line and the root field that will contain the parsed fields.

You also define the format of the log data to be read, the maximum line length, and the character set of the data. You can configure the processor to retain the original line of the log and to ignore control characters.

If the record contains fields in addition to the field to be parsed, those fields are passed through by default. Parsed fields are written to the specified location, overwriting any existing data.
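
The following Python sketch models the record-level behavior described above. It is not Data Collector code. Records are modeled as flat dictionaries, the keys `log` and `parsed` stand in for field paths, and `parse_log_field` and `split_level` are hypothetical helpers. The sketch shows that fields other than the parsed field pass through unchanged and that the parsed result overwrites whatever was already stored in the target field.

```python
import re
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]
LineParser = Callable[[str], Optional[Dict[str, Any]]]


def parse_log_field(record: Record, field_to_parse: str,
                    new_parsed_field: str, parse_line: LineParser) -> Record:
    """Hypothetical model of Log Parser: parse one field, keep the rest."""
    line = record[field_to_parse]
    parsed = parse_line(line)
    if parsed is None:
        # Data Collector would apply the stage's On Record Error setting;
        # this sketch simply raises.
        raise ValueError(f"line does not match the log format: {line!r}")
    result = dict(record)              # other fields pass through unchanged
    result[new_parsed_field] = parsed  # existing data at the target is overwritten
    return result


def split_level(line: str) -> Optional[Dict[str, Any]]:
    """Hypothetical log format: '<LEVEL> <message>'."""
    match = re.match(r"(?P<level>[A-Z]+) (?P<message>.*)", line)
    return match.groupdict() if match else None


record = {"log": "ERROR disk full", "host": "web01", "parsed": "old value"}
out = parse_log_field(record, "log", "parsed", split_level)
print(out)
# {'log': 'ERROR disk full', 'host': 'web01', 'parsed': {'level': 'ERROR', 'message': 'disk full'}}
```

The input record is copied rather than modified, so the caller's original record still holds `"old value"` in `parsed`.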

Log Formats

When you use Log Parser to parse log data, you define the format of the log data to be parsed.

You can use the following log formats:

Common Log Format
A standardized text format used by web servers to generate log files. Also known as the NCSA (National Center for Supercomputing Applications) Common Log format.
Combined Log Format
A standardized text format based on the common log format that includes additional information. Also known as the Apache/NCSA Combined Log Format.
Apache Error Log Format
The standardized error log format generated by the Apache HTTP Server 2.2.
Apache Access Log Custom Format
A customizable access log generated by the Apache HTTP Server 2.2. Use the Apache HTTP Server version 2.2 syntax to define the format of the log file.
Regular Expression
Use a regular expression to define the structure of log data, and then assign a field to each group that you want to include.
Use any valid regular expression.
Grok Pattern
Use a grok pattern to define the structure of log data. You can use the grok patterns supported by Data Collector. You can also define a custom grok pattern and then use it as part of the log format.
For more information about supported grok patterns, see Defining Grok Patterns.
Log4j
A customizable format generated by the Apache Log4j 1.2 logging utility. You can use the default format or specify a custom format. Use the Apache Log4j version 1.2 syntax to define the format of the log file.
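
As an illustration of the Regular Expression format, the Python sketch below parses lines in both the Common Log Format and the Combined Log Format with one expression and maps each numbered group to a field name. The pattern, the field names such as `clientip` and `referrer`, and the `parse_with_groups` helper are assumptions for this example; they are not the field names that Data Collector produces for its built-in formats.

```python
import re
from typing import Dict, Optional

# Groups 1-7 follow the Common Log Format; the optional groups 8-9
# (referrer, user agent) extend it to the Combined Log Format.
LOG_REGEX = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
    r'(?: "([^"]*)" "([^"]*)")?$'
)

# Hypothetical field names assigned to each group number.
FIELD_MAP = {1: "clientip", 2: "ident", 3: "auth", 4: "timestamp",
             5: "request", 6: "response", 7: "bytes",
             8: "referrer", 9: "agent"}


def parse_with_groups(line: str, pattern=LOG_REGEX,
                      field_map: Dict[int, str] = FIELD_MAP) -> Optional[Dict[str, str]]:
    """Map each regex group to a field; skip groups that did not participate."""
    match = pattern.match(line)
    if match is None:
        return None
    return {name: match.group(index)
            for index, name in field_map.items()
            if match.group(index) is not None}


clf = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
       '"GET /apache_pb.gif HTTP/1.0" 200 2326')
combined = clf + ' "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'

print(parse_with_groups(clf)["request"])        # GET /apache_pb.gif HTTP/1.0
print(parse_with_groups(combined)["referrer"])  # http://www.example.com/start.html
```

Because the referrer and user-agent groups are optional, a Common Log Format line simply produces no `referrer` or `agent` field. Passing a smaller `field_map` keeps only the groups you choose to include.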

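Grok patterns are named building blocks that expand into regular expressions, and a custom pattern can reference other patterns. The Python sketch below shows that expansion under stated assumptions: `BASE_PATTERNS` is a tiny hypothetical subset rather than the pattern library that Data Collector ships, `MYAPPLOG` is a made-up custom pattern, and `expand_grok` supports only the `%{NAME}` and `%{NAME:field}` forms.

```python
import re
from typing import Dict

# Matches %{NAME} or %{NAME:field}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")

# Hypothetical subset of grok definitions (name -> regex or grok text).
BASE_PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\b\w+\b",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
    "GREEDYDATA": r".*",
}


def expand_grok(pattern: str, definitions: Dict[str, str], depth: int = 0) -> str:
    """Recursively replace grok references with regular expressions."""
    if depth > 20:
        raise ValueError("grok definitions nest too deeply (cycle?)")

    def replace(match):
        name, field = match.group(1), match.group(2)
        body = expand_grok(definitions[name], definitions, depth + 1)
        # A field name turns the reference into a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return GROK_REF.sub(replace, pattern)


# A custom pattern defined in terms of the base patterns, then used in the log format.
definitions = dict(BASE_PATTERNS,
                   MYAPPLOG=r"%{IPV4:client} %{WORD:method} %{INT:status}")
regex = re.compile(expand_grok(r"%{MYAPPLOG} %{GREEDYDATA:message}", definitions))

match = regex.match("10.0.0.7 GET 200 served from cache")
print(match.groupdict())
# {'client': '10.0.0.7', 'method': 'GET', 'status': '200', 'message': 'served from cache'}
```

Referencing a pattern without a field name, as with `%{MYAPPLOG}`, wraps it in a non-capturing group, so only the named references inside it become fields.
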
Configuring a Log Parser Processor

Configure a Log Parser to parse log data in a field.

  1. In the Properties panel, on the General tab, configure the following properties:
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
      Tip: You might include fields that the stage uses.
      Records that do not include all required fields are processed based on the error handling configured for the pipeline.
    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
      Records that do not meet all preconditions are processed based on the error handling configured for the stage.
    On Record Error - Error record handling for the stage:
      • Discard - Discards the record.
      • Send to Error - Sends the record to the pipeline for error handling.
      • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Parse tab, configure the following properties:
    Field to Parse - Field path that contains the log data to parse.
    New Parsed Field - Field to act as the root field for the newly parsed fields.
  3. On the Data Format tab, configure the following properties:
    Log Format - Format of the log data. Use one of the following formats:
      • Common Log Format
      • Combined Log Format
      • Apache Error Log Format
      • Apache Access Log Custom Format
      • Regular Expression
      • Grok Pattern
      • Log4j
      • Common Event Format (CEF)
      • Log Event Extended Format (LEEF)
    Max Line Length - Maximum length of a log line. The processor truncates longer lines.
      This property can be limited by the Data Collector parser buffer size. For more information, see Maximum Record Size.
    Retain Original Line - Determines how to treat the original log line. Select to include the original log line as a field in the resulting record.
      By default, the original line is discarded.
    Charset - Character encoding of the data to be processed.
    Ignore Control Characters - Removes all ASCII control characters except for the tab, line feed, and carriage return characters.
    Depending on the selected log format, configure the following additional properties:
    • When you select Apache Access Log Custom Format, use Apache log format strings to define the Custom Log Format.
    • When you select Regular Expression, enter the regular expression that describes the log format, and then map each regular expression group that you want to include to a field.
    • When you select Grok Pattern, you can define any custom grok patterns that you want to use in the Grok Pattern Definition field, and then enter the grok pattern that describes the log line in the Grok Pattern field.

      For more information about supported grok patterns, see Defining Grok Patterns.

    • When you select Log4j, you can use log4j variables to define a custom log format.
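
The Data Format properties can be pictured as a small preprocessing step around the parser. In the Python sketch below, `prepare_and_parse`, the `original_line` field name, the 1024 default, and the order of operations (decode, strip control characters, truncate, parse) are assumptions for illustration, not documented Data Collector behavior. Python codec names stand in for Data Collector charset names, and the control-character pattern removes ASCII codes 0x00-0x1F and 0x7F except tab, line feed, and carriage return.

```python
import re
from typing import Any, Callable, Dict, Optional

# ASCII control characters (0x00-0x1F and 0x7F) except tab, line feed, carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

LineParser = Callable[[str], Optional[Dict[str, Any]]]


def prepare_and_parse(raw: bytes, parse_line: LineParser,
                      charset: str = "utf-8",
                      max_line_length: int = 1024,  # illustrative value only
                      ignore_control_chars: bool = False,
                      retain_original_line: bool = False) -> Optional[Dict[str, Any]]:
    """Hypothetical preprocessing that mirrors the Data Format tab properties."""
    original = raw.decode(charset)        # Charset
    line = original
    if ignore_control_chars:              # Ignore Control Characters
        line = CONTROL_CHARS.sub("", line)
    line = line[:max_line_length]         # Max Line Length: truncate longer lines
    parsed = parse_line(line)
    if parsed is not None and retain_original_line:  # Retain Original Line
        parsed["original_line"] = original
    return parsed


def words(line: str) -> Dict[str, Any]:
    """Hypothetical parser: split the line on whitespace."""
    return {"words": line.split()}


out = prepare_and_parse(b"GET\x07 /index.html", words, max_line_length=9,
                        ignore_control_chars=True, retain_original_line=True)
print(out)
# {'words': ['GET', '/inde'], 'original_line': 'GET\x07 /index.html'}
```

In this sketch the retained original line is the decoded input before any control characters are removed or the line is truncated.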