MongoDB Atlas Lookup

The MongoDB Atlas Lookup processor performs lookups in MongoDB Atlas or MongoDB Enterprise Server and passes all values from the returned document to a new list-map field in the record.

Supported pipeline types:
  • Data Collector

For information about supported versions, see Supported Systems and Versions.

Use the MongoDB Atlas Lookup processor to enrich records with additional data. For example, you have multiple department documents in MongoDB that list the employees in the department. You configure the processor to use the department_ID field in the record to look up a department document, and pass all values from the matching document to a new department_employees field in the record.

When you configure the MongoDB Atlas Lookup processor, you define connection information, such as the connection string and MongoDB credentials. You configure the fields to look up and the field for the return values.

When a lookup results in multiple matched documents, the MongoDB Atlas Lookup processor can return values from the first matching document or return values from all matching documents in separate records.
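The two behaviors for multiple matches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processor's implementation: the function name apply_lookup, the plain dictionaries standing in for records and documents, and the sample department data are all hypothetical.

```python
from copy import deepcopy

def apply_lookup(record, matches, result_field, first_value_only=True):
    """Return the output records for one input record and its matched documents."""
    if not matches:
        # Assumption: with no match, the record passes along unchanged.
        return [record]
    selected = matches[:1] if first_value_only else matches
    output = []
    for document in selected:
        enriched = deepcopy(record)               # never mutate the input record
        enriched[result_field] = deepcopy(document)
        output.append(enriched)
    return output

# Hypothetical sample data: two documents match the same department_ID.
record = {"department_ID": 7}
matches = [
    {"department_ID": 7, "employees": ["Ana", "Bo"]},
    {"department_ID": 7, "employees": ["Cy"]},
]

first = apply_lookup(record, matches, "department_employees", first_value_only=True)
split = apply_lookup(record, matches, "department_employees", first_value_only=False)
```

In this sketch, first holds a single enriched record built from the first matching document, while split holds one enriched record per matching document.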

To improve performance, you can configure the processor to locally cache the document values.

Field Mappings

When you configure the MongoDB Atlas Lookup processor, you define the document fields to look up in MongoDB. You map these document fields to fields in the record that contain the values to look up.

When you define a document field, use the dot notation to define a field in an embedded document as follows:
<embedded document>.<field name>.<embedded field name>
When you define a field in the record, reference the field as follows:
/<field name>
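
To make the dot notation concrete, the following Python sketch resolves a dotted path against a nested document. The helper name get_by_path and the sample document are hypothetical and exist only for illustration; MongoDB resolves dot notation itself when it evaluates a query.

```python
def get_by_path(document, dotted_path):
    """Follow a dot-notation path through nested dicts; return None if a part is missing."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical document with one level of embedding below "customer".
doc = {"customer": {"name": "Ed Martinez", "location": {"city": "San Francisco"}}}

city = get_by_path(doc, "customer.location.city")
missing = get_by_path(doc, "customer.phone")
```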

You can define multiple field mappings. The processor uses the configured field mappings to generate and run a find() query in MongoDB.

After defining the field mappings, define a new list-map field to store all values from the returned document.

For example, your MongoDB collection contains customer documents with the following structure:
{
  _id: 123,
  customer: {
    name: "Ed Martinez",
    status: "gold",
    phone: "123-456-7891",
    location: {
      city: "San Francisco",
      state: "California"
    }
  }
}

Your pipeline reads from an origin that contains customer names and cities, but you want to enrich that customer data with the customer status and phone number. When you configure the processor, you map the customer.name and customer.location.city document fields to the values stored in the name and city fields in the record. To store the lookup result, you define a new field named customer_details. The following image shows the configured field mappings and the result field:

When you run the pipeline, the processor uses the field mappings to generate and run a find() query in MongoDB. The processor passes all values from the returned document to the new result field.
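
As a rough illustration of how field mappings could translate into a find() filter, the Python sketch below builds an equality filter from the mappings in the example. The exact query that the processor generates is not shown in this documentation, so treat the function build_find_filter and the dictionary-based record as assumptions.

```python
def build_find_filter(field_mappings, record):
    """Build an equality filter: document field (dot notation) -> value from the record."""
    query = {}
    for document_field, record_field in field_mappings.items():
        name = record_field.lstrip("/")    # "/name" -> "name"
        query[document_field] = record[name]
    return query

# Mappings from the example: document field -> record field.
field_mappings = {
    "customer.name": "/name",
    "customer.location.city": "/city",
}
record = {"name": "Ed Martinez", "city": "San Francisco"}

query = build_find_filter(field_mappings, record)
```

A filter of this shape is what a MongoDB client would pass to find(), for example collection.find(query) in a driver such as PyMongo. The returned document would then be stored in the customer_details result field.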

Lookup Cache

To improve pipeline performance, you can configure the MongoDB Atlas Lookup processor to locally cache the document values returned from MongoDB.

The processor caches values until the cache reaches the maximum size or until values reach the expiration time. The processor evicts values from the cache when either limit is reached, whichever occurs first.

You can configure the following ways to evict values from the cache:
Size-based eviction
Configure the maximum number of values that the processor caches. When the maximum number is reached, the processor evicts the oldest values from the cache.
Time-based eviction
Configure the amount of time that a value can remain in the cache without being written to or accessed. When the expiration time is reached, the processor evicts the value from the cache. The eviction policy determines whether the processor measures the expiration time since the last write of the value or since the last access of the value.
For example, you set the eviction policy to expire after the last access and set the expiration time to 60 seconds. If the processor does not access a value for 60 seconds, it evicts the value from the cache.

When you stop the pipeline, the processor clears the cache.
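
The interaction between size-based and time-based eviction can be sketched as a small in-memory cache. This is a simplified model under stated assumptions, not the processor's actual cache: the class LookupCache, its parameters, and the injectable clock are hypothetical, and "oldest" is interpreted here as the value written longest ago.

```python
import time
from collections import OrderedDict

class LookupCache:
    """Toy cache with a maximum size and expire-after-access or expire-after-write."""

    def __init__(self, max_size, expire_seconds, expire_after_access=True,
                 clock=time.monotonic):
        self.max_size = max_size
        self.expire_seconds = expire_seconds
        self.expire_after_access = expire_after_access
        self.clock = clock
        self._entries = OrderedDict()   # key -> (value, timestamp), oldest write first

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        now = self.clock()
        if now - stamp >= self.expire_seconds:
            del self._entries[key]      # time-based eviction
            return None
        if self.expire_after_access:
            self._entries[key] = (value, now)   # an access resets the timer
        return value

    def put(self, key, value):
        self._entries.pop(key, None)
        self._entries[key] = (value, self.clock())
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)   # size-based eviction of the oldest

# Usage with a fake clock so the timing is deterministic.
now = [0.0]
cache = LookupCache(max_size=2, expire_seconds=60, clock=lambda: now[0])
cache.put("a", 1)
now[0] = 30
cache.get("a")            # access at t=30 resets the 60-second timer
now[0] = 80
hit = cache.get("a")      # 50 seconds since last access: still cached
now[0] = 140
miss = cache.get("a")     # 60 seconds since last access: evicted
cache.put("b", 2)
cache.put("c", 3)
cache.put("d", 4)         # exceeds max_size, so "b" is evicted
```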

Credentials

Based on the authentication used by MongoDB, configure the MongoDB Atlas Lookup processor to use no authentication, username/password authentication, or LDAP authentication. By default, no authentication is used.

To use username/password or LDAP authentication, enter the required credentials in one of the following ways:
Authentication method
Specify the authentication to use with the Authentication Method property on the Credentials tab:
  • None
  • Username / Password
  • LDAP
Then, define the username and password for username/password or LDAP authentication.
When using username/password authentication, you also specify the authentication mechanism to use. You can also specify an authentication database.
Connection string
If you prefer, you can specify credentials in the connection string on the Connection tab. However, specifying credentials on the Credentials tab is the recommended method.
To enter credentials for username/password authentication for self-managed clusters, enter the username and password before the host name. Use the following format:
mongodb://username:password@host[:port][/[database][?options]]
To enter credentials for MongoDB Atlas, specify the URL from your Atlas cluster settings.
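
If you do place credentials in a standard connection string, note that MongoDB requires reserved characters in the username and password, such as @, :, and /, to be percent-encoded. The following Python sketch shows one way to assemble such a string. The function name, host, database, and credentials are hypothetical placeholders.

```python
from urllib.parse import quote

def build_connection_string(username, password, host, port=None, database=None):
    """Assemble mongodb://username:password@host[:port][/database] with encoded credentials."""
    user = quote(username, safe="")   # percent-encode every reserved character
    pwd = quote(password, safe="")
    uri = f"mongodb://{user}:{pwd}@{host}"
    if port is not None:
        uri += f":{port}"
    if database:
        uri += f"/{database}"
    return uri

# Hypothetical values; the password contains characters that must be encoded.
uri = build_connection_string("app_user", "p@ss:w/rd", "db1.example.com", 27017, "sales")
```

Keeping credentials on the Credentials tab, as recommended above, avoids this encoding step entirely.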

Read Preference

You can configure the read preference that the MongoDB Atlas Lookup processor uses.

The read preference determines how the processor reads data from different members of the MongoDB replica set.

You can use the following MongoDB read preferences:
  • Primary - Requires reading from the primary member.
  • Primary Preferred - Prefers reading from the primary, but allows reads from a secondary member.
  • Secondary - Requires reading from a secondary member.
  • Secondary Preferred - Prefers reading from a secondary, but allows reads from a primary when necessary.
  • Nearest - Reads from the member with the least network latency.

Configuring a MongoDB Atlas Lookup Processor

Configure a MongoDB Atlas Lookup processor to perform lookups in MongoDB Atlas or MongoDB Enterprise Server.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Required Fields Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.

    Records that do not include all required fields are processed based on the error handling configured for the pipeline.

    Preconditions Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.

    Records that do not meet all preconditions are processed based on the error handling configured for the stage.

    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline. Not valid for cluster pipelines.
  2. On the Connection tab, configure the following properties:
    Connection Property Description
    Connection String
    Connection string for MongoDB. To connect to MongoDB Atlas or Enterprise Server, you can use the following DNS seed list format:
    mongodb+srv://server.example.com/

    To connect to a MongoDB Enterprise Server cluster, use the following standard connection format:

    mongodb://host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

    When connecting to a cluster, enter additional node information to ensure a connection.

    For more information about MongoDB connection strings, see the MongoDB documentation.

    TLS Mode Method used to implement SSL/TLS:
    • None - Connects to a MongoDB Enterprise Server cluster that is not enabled to use SSL/TLS.
    • Atlas/System CA - Connects to a MongoDB Atlas cluster. You can also use this when your certificates or keys have already been specified at the JVM level.
    • Server Validation (1 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and does not need to prove client identity.
    • Server and Client Validation (2 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and the server also validates the client key. This occurs when the cluster is set up to require client certificates.
  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Authentication Method Authentication method to use:
    • None
    • Username / Password
    • LDAP
    Username User name for the selected authentication method.
    Password Password for the specified user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
    Authentication Database Database name associated with the specified user account.

    Available when using username/password authentication.

    Authentication Mechanism Authentication mechanism to use:
    • Default - Data Collector and MongoDB negotiate to choose the authentication mechanism.
    • SCRAM-SHA-1 - Data Collector sends SCRAM-SHA-1 credentials to MongoDB.
    • SCRAM-SHA-256 - Data Collector sends SCRAM-SHA-256 credentials to MongoDB.
  4. On the MongoDB tab, configure the following properties:
    Lookup Property Description
    Database Name of the MongoDB database.
    Collection Name of the MongoDB collection to use.
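    Document to SDC Field Mappings Document fields to look up in MongoDB, mapped to the fields in the record that contain the values to look up.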
    Result Field Name of the new list-map field in the record that receives all values from the returned document.
    Multiple Values Behavior Action to take upon finding multiple matching documents:
    • First value only - Generates a single record for the return values of the first matching document.
    • Split into Multiple Records - Generates a separate record for the return values of every matching document.
    Missing Values Behavior Action to take upon finding no document to return:
    • Send to error - Sends the record to error.
    • Pass the record along the pipeline unchanged - Passes the record without a lookup return value.
    Enable Local Caching Specifies whether to locally cache the returned values.
    Read Preference Determines how the processor reads data from different members of the MongoDB replica set.
  5. Optionally, click the Advanced tab to configure how the processor connects to MongoDB.
    The defaults for these properties should work in most cases. If a numeric property is set to 0, then the driver default value is used.
    Advanced Property Description
    Compression Algorithm Compression algorithm to use to communicate with MongoDB:
    • None
    • Snappy
    • ZLib
    • ZStandard

    These compression algorithms are not supported by all MongoDB versions. See the MongoDB documentation for details.

    Default is Snappy.

    Application Name Name to use in MongoDB reporting, such as server logs.
    Maximum Connections Maximum number of open connections allowed in the connection pool.
    Minimum Connections Minimum number of open connections allowed in the connection pool.
    Max Connection Idle Time Maximum idle time in milliseconds before a connection is removed from the connection pool.
    Max Connection Lifetime Maximum lifetime in milliseconds for a connection in the connection pool.
    Max Connection Wait Time Maximum time in milliseconds that a connection waits to connect.
    Socket Connect Timeout Maximum time in milliseconds to wait for a network socket connection.
    Socket Read Timeout Maximum time in milliseconds to wait for a read connection.
    Socket Receive Buffer Size (bytes) Buffer size in bytes for receiving data.
    Socket Send Buffer Size (bytes) Buffer size in bytes for sending data.
    Heartbeat Frequency Milliseconds between Data Collector attempts to determine the current state of each server in the cluster.
    Min Heartbeat Frequency Minimum number of milliseconds between Data Collector checks on the state of each server.
    Server Selection Timeout Maximum time in milliseconds that Data Collector waits for server selection before throwing an exception. If you enter 0, an exception is thrown immediately if no server is available. Use a negative value to wait indefinitely.
    Local Threshold Local threshold in milliseconds. Requests are sent to a server whose ping time is less than or equal to the server with the fastest ping time plus the local threshold value.
    Required Replica Set Name Required replica set name to use for the cluster.
    Enable Single Mode Connects to the first MongoDB server in the connection string.

    Applicable only for MongoDB Enterprise Server clusters.

    Max Number of Retries Maximum number of times to retry the connection when the connection fails.

    Default is 10.

    Retry Interval (ms) Time between retries in milliseconds.

    Default is 10,000.
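
The Local Threshold rule described among the Advanced properties amounts to a simple latency window. The Python sketch below illustrates that arithmetic only. The function eligible_servers and the ping times are hypothetical, and the real selection is performed by the MongoDB driver.

```python
def eligible_servers(ping_times_ms, local_threshold_ms):
    """Return servers whose ping time is within the threshold of the fastest server."""
    if not ping_times_ms:
        return []
    fastest = min(ping_times_ms.values())
    limit = fastest + local_threshold_ms
    return sorted(name for name, ping in ping_times_ms.items() if ping <= limit)

# Hypothetical ping times in milliseconds.
pings = {"host1": 12.0, "host2": 20.0, "host3": 45.0}

# The fastest ping is 12 ms, so with a 15 ms threshold the limit is 27 ms.
candidates = eligible_servers(pings, local_threshold_ms=15)
```

With these sample values, host1 and host2 fall inside the window and host3 does not.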