Defining Grok Patterns

You can use the grok patterns in this appendix to define the structure of log data.

You can use a single pattern, compose several patterns into a larger pattern, or create a custom pattern.

When you define grok patterns in a Data Collector stage, you configure the following properties:
Grok Pattern Definition
Use to define a complex or custom grok pattern. You can use this property to define a single pattern, or to define multiple patterns for use within a larger pattern.
When configuring the pattern definition, enter the pattern name followed by the grok pattern, with each pattern on its own line, as follows:
<PATTERN NAME> <grok pattern>
<PATTERN NAME2> <grok pattern>
The following example defines three patterns: MYHOSTTIMESTAMP; MYCUSTOMPATTERN, which expands upon MYHOSTTIMESTAMP; and DURATIONLOG:
MYHOSTTIMESTAMP %{CISCOTIMESTAMP:timestamp} %{HOST:host}
MYCUSTOMPATTERN %{MYHOSTTIMESTAMP} %{WORD:program}%{NOTSPACE} %{NOTSPACE}
DURATIONLOG %{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}
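To illustrate how a defined pattern "expands upon" another, the following Python sketch parses the definition lines above into name/pattern pairs and then recursively replaces each %{NAME} or %{NAME:field} reference with an ordinary regular expression, turning a field name into a named capture group. This is an illustrative model only, not how Data Collector implements grok. The regexes in BASE_PATTERNS are simplified stand-ins chosen for this example and are not the built-in definitions, and the helper names parse_definitions and expand are hypothetical.

```python
import re

# ASSUMPTION: simplified stand-ins for the built-in patterns used in the
# example. These are NOT the exact definitions that Data Collector ships with.
BASE_PATTERNS = {
    "CISCOTIMESTAMP": r"[A-Z][a-z]{2} +\d{1,2}(?: \d{4})? \d{2}:\d{2}:\d{2}",
    "HOST": r"[A-Za-z0-9._-]+",
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "NUMBER": r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)",
    "GREEDYDATA": r".*",
}

# The Grok Pattern Definition text from the example: "<NAME> <pattern>" per line.
DEFINITIONS = """
MYHOSTTIMESTAMP %{CISCOTIMESTAMP:timestamp} %{HOST:host}
MYCUSTOMPATTERN %{MYHOSTTIMESTAMP} %{WORD:program}%{NOTSPACE} %{NOTSPACE}
DURATIONLOG %{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}
"""

# Matches a reference of the form %{NAME} or %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def parse_definitions(text):
    """Split each non-blank line into a pattern name and its grok pattern."""
    patterns = {}
    for line in text.splitlines():
        if line.strip():
            name, pattern = line.split(None, 1)
            patterns[name] = pattern
    return patterns


def expand(pattern, library, depth=0):
    """Recursively replace %{NAME[:field]} references with plain regex."""
    if depth > 50:
        raise ValueError("pattern references nest too deeply (cycle?)")

    def substitute(match):
        name, field = match.group(1), match.group(2)
        # Unknown names raise KeyError; the referenced body is expanded first.
        body = expand(library[name], library, depth + 1)
        # A field name becomes a named capture group; otherwise a plain group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return REFERENCE.sub(substitute, pattern)


library = {**BASE_PATTERNS, **parse_definitions(DEFINITIONS)}
expanded = expand("%{MYCUSTOMPATTERN}", library)
print(expanded)
```

In this model, expanding MYCUSTOMPATTERN pulls in the timestamp and host fields from MYHOSTTIMESTAMP, so the resulting regex captures timestamp, host, and program even though only program is named directly in MYCUSTOMPATTERN.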
Grok Pattern
Defines the grok pattern used to evaluate data. You can enter a predefined grok pattern, such as %{COMMONAPACHELOG}. Or, to define a custom grok pattern, you can use the patterns listed in this appendix or the patterns that you defined in the Grok Pattern Definition property.
For example, after defining the patterns above in the Grok Pattern Definition property, you can use the patterns to configure the Grok Pattern property as follows:
%{MYCUSTOMPATTERN} %{DURATIONLOG}
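As a rough model of what the Grok Pattern property does with this value, the following Python sketch expands %{MYCUSTOMPATTERN} %{DURATIONLOG} into a regular expression and matches it against one log line, producing one value per named field. It is a sketch under stated assumptions, not Data Collector's implementation: the BASE_PATTERNS regexes are simplified stand-ins for the built-in patterns, the helper name to_regex is hypothetical, and the sample log line is made up to fit the example pattern.

```python
import re

# ASSUMPTION: simplified stand-ins for built-in grok patterns, for illustration only.
BASE_PATTERNS = {
    "CISCOTIMESTAMP": r"[A-Z][a-z]{2} +\d{1,2}(?: \d{4})? \d{2}:\d{2}:\d{2}",
    "HOST": r"[A-Za-z0-9._-]+",
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "NUMBER": r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)",
    "GREEDYDATA": r".*",
}

# Patterns from the Grok Pattern Definition property in the example.
DEFINED = {
    "MYHOSTTIMESTAMP": "%{CISCOTIMESTAMP:timestamp} %{HOST:host}",
    "MYCUSTOMPATTERN": "%{MYHOSTTIMESTAMP} %{WORD:program}%{NOTSPACE} %{NOTSPACE}",
    "DURATIONLOG": "%{NUMBER:duration}%{NOTSPACE} %{GREEDYDATA:kernel_logs}",
}

# Matches a reference of the form %{NAME} or %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def to_regex(pattern, library):
    """Expand references recursively (no guard against cyclic definitions)."""
    def substitute(match):
        name, field = match.groups()
        body = to_regex(library[name], library)
        # Named fields become named capture groups; unnamed ones are discarded.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return REFERENCE.sub(substitute, pattern)


library = {**BASE_PATTERNS, **DEFINED}
# The Grok Pattern property value from the example.
grok = re.compile(to_regex("%{MYCUSTOMPATTERN} %{DURATIONLOG}", library))

# ASSUMPTION: a made-up log line shaped to fit the example pattern.
line = "Jun 14 15:16:01 combo kernel: [12] 0.045s Initializing cgroup subsys cpuset"
match = grok.fullmatch(line)
fields = match.groupdict() if match else None
print(fields)
```

For this line, the match yields timestamp "Jun 14 15:16:01", host "combo", program "kernel", duration "0.045", and kernel_logs "Initializing cgroup subsys cpuset". Segments matched by unnamed references such as %{NOTSPACE} (here ":", "[12]", and "s") are consumed but not returned as fields.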

The following image shows the configuration example in the UI:

For an example of how to use a grok pattern to parse Apache web logs, see the StreamSets blog post "What are Grok Patterns?".