XML Parser

The XML Parser parses a well-formed XML document embedded in a string field and passes the parsed data to an output field in the record.

When you configure the XML Parser, you specify the field that contains the XML document and the target field for the parsed results. You can define a delimiter element to separate the document into multiple values. When no delimiter element is defined, XML Parser passes the entire document to the target field as a map.

When defining the delimiter element, you can use an XML element or simplified XPath expression. Use an XML element when the element resides directly under the root node. Use a simplified XPath expression to access data deeper in the XML document.
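The difference between the two delimiter styles can be illustrated outside the processor. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module, not the processor's own implementation. The sample document and variable names are hypothetical, and ElementTree's path syntax is only a stand-in for the simplified XPath syntax that the processor accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<root>
  <msg id="1">hello</msg>
  <msg id="2">world</msg>
  <group>
    <item>a</item>
    <item>b</item>
  </group>
</root>
""".strip()

root = ET.fromstring(DOC)

# An element directly under the root node: comparable to using the
# element name "msg" as the delimiter element.
direct_values = [element.text for element in root.findall("msg")]

# An element deeper in the document: comparable to using a path-style
# expression as the delimiter element.
nested_values = [element.text for element in root.findall("./group/item")]

print(direct_values)
print(nested_values)
```

In this sketch, the element name reaches only the `msg` children of the root, while the path expression is needed to reach the `item` elements nested under `group`.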

When an XML document has more than one value, you can return the first value, return all values as a list, or generate a record for each value in the document.

When generating a record, the processor includes all other incoming fields in the generated record. When generating multiple records because of multiple values in the parsed field, the processor includes the other incoming fields for each generated record.
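The three ways of handling multiple values can be sketched as follows. This is an illustrative approximation in Python, assuming records are plain dictionaries and keeping only the text of each matched element. The function name `parse_xml_field` and the behavior labels `"first"`, `"list"`, and `"split"` are hypothetical and do not correspond to processor property values.

```python
import xml.etree.ElementTree as ET


def parse_xml_field(record, field, delimiter, target, behavior):
    """Parse record[field] and return the resulting list of records.

    `behavior` uses hypothetical labels: "first", "list", or "split".
    Only element text is kept, which is a simplification.
    """
    root = ET.fromstring(record[field])
    values = [element.text for element in root.findall(delimiter)]

    if behavior == "first":
        # One record holding only the first value (None if nothing matched).
        return [{**record, target: values[0] if values else None}]
    if behavior == "list":
        # One record holding every value in a list.
        return [{**record, target: values}]
    if behavior == "split":
        # One record per value; the other incoming fields are copied into each.
        return [{**record, target: value} for value in values]
    raise ValueError(f"unknown behavior: {behavior}")


incoming = {"source": "sensor-1", "xml": "<root><msg>a</msg><msg>b</msg></root>"}

first = parse_xml_field(incoming, "xml", "msg", "parsed", "first")
as_list = parse_xml_field(incoming, "xml", "msg", "parsed", "list")
split = parse_xml_field(incoming, "xml", "msg", "parsed", "split")
```

With the `"split"` label, the sketch produces two records, and each one carries the incoming `source` field alongside its own parsed value.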

You can configure the processor to include the XPath to each parsed XML element and XML attribute in field attributes. This also places each namespace in an xmlns record header attribute.

You can also configure the processor to include XML attributes and namespace declarations in the record as field attributes. By default, it includes XML attributes and namespace declarations in the record as fields.
Note: Field attributes and record header attributes are written to destination systems automatically only when you use the SDC RPC data format in destinations. For more information about working with field attributes and record header attributes, and how to include them in records, see Field Attributes and Record Header Attributes.

For more information about how XML Parser processes XML data, see Reading and Processing XML Data.

Configuring an XML Parser Processor

Configure an XML Parser to parse XML data in a string field.

When you configure an XML Parser, specify the field to parse and the output field to use.
  1. In the Properties panel, on the General tab, configure the following properties:
    General Property - Description
    Name - Stage name.
    Description - Optional description.
    Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error - Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Parse tab, configure the following properties:
    Parse Property - Description
    Field to Parse - String field that contains XML data.
    Delimiter Element - Delimiter to use to process data and generate multiple records. Omit a delimiter to treat the entire XML document as one record.
    Use one of the following as a delimiter element:
    • An XML element directly under the root element.

      Use the XML element name without surrounding angle brackets ( < > ) . For example, msg instead of <msg>.

    • A simplified XPath expression that specifies the data to use.

      Use a simplified XPath expression to access data deeper in the XML document or data that requires a more complex access method.

      For more information about valid syntax, see Simplified XPath Syntax.

    Output Field Attributes - Includes XML attributes and namespace declarations in the record as field attributes. When not selected, XML attributes and namespace declarations are included in the record as fields.
    Note: Field attributes are automatically included in records written to destination systems only when you use the SDC RPC data format in the destination. For more information about working with field attributes, see Field Attributes.

    By default, the property is not selected.

    Target Field - Output field for the parsed XML data.

    You can specify the same field to replace the original data with the parsed data. Or you can specify another existing field or a new field. If the field does not exist, XML Parser creates the field.

    Charset - Character encoding of the data to be processed.

    Ignore Control Characters - Removes all ASCII control characters except for the tab, line feed, and carriage return characters.
    Include Field XPaths - Includes the XPath to each parsed XML element and XML attribute in field attributes. Also includes each namespace in an xmlns record header attribute.

    When not selected, this information is not included in the record. By default, the property is not selected.

    Note: Field attributes and record header attributes are written to destination systems automatically only when you use the SDC RPC data format in destinations. For more information about working with field attributes and record header attributes, and how to include them in records, see Field Attributes and Record Header Attributes.
    Namespaces - Namespace prefix and URI to use when parsing the XML document. Define namespaces when the XML element being used includes a namespace prefix or when the XPath expression includes namespaces.

    For information about using namespaces with an XML element, see Using XML Elements with Namespaces.

    For information about using namespaces with XPath expressions, see Using XPath Expressions with Namespaces.

    Using simple or bulk edit mode, click the Add icon to add additional namespaces.

    Multiple Values Behavior - Action to take when the data in the field includes multiple values:
    • First Value Only - Returns the first value.
    • All Values as a List - Returns all values as items in a List field.
    • Split into Multiple Records - Returns each value in a separate record. This option generates multiple records, one for each parsed value from the XML document, based on the delimiter element. Other fields in the record are retained with each record.
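
As an illustration of what the Ignore Control Characters property describes, the following Python sketch removes ASCII control characters from a string while keeping tab, line feed, and carriage return. It is not the processor's implementation. The function name `strip_control_chars` is hypothetical, and treating the DEL character (0x7F) as a control character to remove is an assumption.

```python
import re

# ASCII control characters are 0x00-0x1F plus 0x7F (DEL).
# Tab (0x09), line feed (0x0A), and carriage return (0x0D) are kept.
# Removing DEL is an assumption made for this sketch.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(text: str) -> str:
    """Return text with the matched control characters removed."""
    return _CONTROL_CHARS.sub("", text)


cleaned = strip_control_chars("<msg>a\x00b\tc\x1f</msg>\r\n")
```

Here the NUL and unit separator characters are dropped, while the tab and the trailing carriage return and line feed remain in `cleaned`.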