MongoDB Atlas Lookup
The MongoDB Atlas Lookup processor performs lookups in MongoDB Atlas or MongoDB Enterprise Server and passes all values from the returned document to a new list-map field in the record.
For information about supported versions, see Supported Systems and Versions.
Use the MongoDB Atlas Lookup processor to enrich records with additional data. For example, suppose you have multiple department documents in MongoDB that list the employees in each department. You configure the processor to use the department_ID field in the record to look up a department document, and pass all values from the matching document to a new department_employees field in the record.
When you configure the MongoDB Atlas Lookup processor, you define connection information, such as the connection string and MongoDB credentials. You can also use a connection to configure the processor. You configure the fields to look up and the field for the return values.
When a lookup results in multiple matched documents, the MongoDB Atlas Lookup processor can return values from the first matching document or return values from all matching documents in separate records.
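The two multiple-match behaviors can be sketched as follows. This is a minimal illustration, not the processor's implementation: the function name `apply_lookup`, the use of plain dictionaries for records, and the sample employee data are all assumptions made for the example.

```python
from copy import deepcopy

def apply_lookup(record, matches, result_field, split=False):
    """Return the output records for one input record and its matching documents.

    Hypothetical helper: records and documents are modeled as plain dicts.
    """
    if not matches:
        # No match: this sketch simply passes the record along unchanged.
        return [record]
    # Either keep only the first match or emit one record per match.
    chosen = matches if split else matches[:1]
    output = []
    for document in chosen:
        enriched = deepcopy(record)
        enriched[result_field] = document
        output.append(enriched)
    return output

record = {"department_ID": 7}
matches = [
    {"_id": 1, "employees": ["Ana", "Raj"]},   # made-up sample documents
    {"_id": 2, "employees": ["Lee"]},
]
first_only = apply_lookup(record, matches, "department_employees")
split_records = apply_lookup(record, matches, "department_employees", split=True)
```

With the first behavior, one input record yields one output record. With the second, the same input record yields one output record per matching document.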
To improve performance, you can configure the processor to locally cache the document values.
Field Mappings
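When you configure the MongoDB Atlas Lookup processor, you define the document fields to look up in MongoDB. You map these document fields to fields in the record that contain the values to look up.
Specify a document field using dot notation, including any embedded documents in the path:
<embedded document>.<field name>.<embedded field name>
Specify a record field using the following format:
/<field name>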
You can define multiple field mappings. The processor uses the configured field mappings to generate and run a find() query in MongoDB.
After defining the field mappings, define a new list-map field to store all values from the returned document.
For example, suppose a MongoDB collection contains the following customer document:
{
  _id: 123,
  customer: {
    name: "Ed Martinez",
    status: "gold",
    phone: "123-456-7891",
    location: {
      city: "San Francisco",
      state: "California"
    }
  }
}
Your pipeline reads from an origin that contains customer names and cities, but you want to enrich that customer data with the customer status and phone number. When you configure the processor, you map the customer.name and customer.location.city document fields to the values stored in the name and city fields in the record. To store the lookup result, you define a new field named customer_details.
[Image: the configured field mappings and the result field]
When you run the pipeline, the processor uses the field mappings to generate and run a find() query in MongoDB. The processor passes all values from the returned document to the new result field.
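To make the customer example concrete, the following sketch shows how field mappings could translate into a find() filter and how the result field is populated. It runs without a MongoDB server: the helper names `build_filter`, `get_path`, and `matches` are hypothetical, records are modeled as flat dictionaries, and the equality matching only approximates what a real find() query does.

```python
def build_filter(field_mappings, record):
    """Pair each document field (dot notation) with the value of its record field."""
    return {
        doc_field: record[sdc_field.lstrip("/")]  # "/name" -> "name" (flat records only)
        for doc_field, sdc_field in field_mappings.items()
    }

def get_path(document, dotted):
    """Resolve a dot-notation path such as 'customer.location.city'."""
    value = document
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

def matches(document, query):
    """Simplified stand-in for find(): every mapped field must equal its value."""
    return all(get_path(document, path) == expected for path, expected in query.items())

field_mappings = {"customer.name": "/name", "customer.location.city": "/city"}
record = {"name": "Ed Martinez", "city": "San Francisco"}
document = {
    "_id": 123,
    "customer": {
        "name": "Ed Martinez",
        "status": "gold",
        "phone": "123-456-7891",
        "location": {"city": "San Francisco", "state": "California"},
    },
}

query = build_filter(field_mappings, record)
if matches(document, query):
    # All values from the returned document go into the new result field.
    record["customer_details"] = document
```

Here `query` is the filter that a call such as `collection.find(query)` would receive in a MongoDB driver. The exact query that the processor generates is not shown in this documentation.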
Lookup Cache
To improve pipeline performance, you can configure the MongoDB Atlas Lookup processor to locally cache the document values returned from MongoDB.
The processor caches values until the cache reaches the maximum size or the expiration time. When the first limit is reached, the processor evicts values from the cache.
- Size-based eviction - Configure the maximum number of values that the processor caches. When the maximum number is reached, the processor evicts the oldest values from the cache.
- Time-based eviction - Configure the amount of time that a value can remain in the cache without being written to or accessed. When the expiration time is reached, the processor evicts the value from the cache. The eviction policy determines whether the processor measures the expiration time since the last write of the value or since the last access of the value.
When you stop the pipeline, the processor clears the cache.
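The interaction between size-based and time-based eviction can be sketched with a small in-memory cache. This is an illustrative model only, not the processor's cache: the class name `LookupCache`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time
from collections import OrderedDict

class LookupCache:
    """Toy cache with a maximum size and an expiration time (hypothetical)."""

    def __init__(self, max_size, expire_seconds, expire_after_access=False,
                 clock=time.monotonic):
        self.max_size = max_size
        self.expire_seconds = expire_seconds
        self.expire_after_access = expire_after_access
        self.clock = clock
        self._entries = OrderedDict()  # key -> (value, timestamp), oldest write first

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        now = self.clock()
        if now - stamp >= self.expire_seconds:
            del self._entries[key]  # time-based eviction
            return None
        if self.expire_after_access:
            self._entries[key] = (value, now)  # an access restarts the timer
        return value

    def put(self, key, value):
        self._entries.pop(key, None)
        self._entries[key] = (value, self.clock())  # a write always restarts the timer
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # size-based eviction: drop the oldest

cache = LookupCache(max_size=1000, expire_seconds=60)
cache.put(("Ed Martinez", "San Francisco"), {"status": "gold"})
cached = cache.get(("Ed Martinez", "San Francisco"))
```

With `expire_after_access=False`, a value expires a fixed time after it was last written. With `expire_after_access=True`, each successful read postpones expiration, so frequently requested values stay cached longer.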
Credentials
Based on the authentication used by MongoDB, configure the MongoDB Atlas Lookup processor to use no authentication, username/password authentication, or LDAP authentication. By default, no authentication is used.
- Authentication method - Specify the authentication to use with the Authentication Method property on the Credentials tab:
  - None
  - Username / Password
  - LDAP
- Connection string - If you prefer, you can specify credentials in the connection string on the Connection tab. However, specifying credentials on the Credentials tab is the recommended method.
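If you do place credentials in the connection string, special characters in the user name and password must be percent-encoded, as the MongoDB connection string format requires. The sketch below assembles such a string. The function name, host, user, and password are made-up example values, and `authSource` and `authMechanism` are standard MongoDB connection string options rather than settings documented on this page.

```python
from urllib.parse import quote_plus, urlencode

def uri_with_credentials(host, username, password, auth_db="admin", mechanism=None):
    """Build a DNS seed list connection string with embedded credentials (sketch)."""
    options = {"authSource": auth_db}  # database associated with the user account
    if mechanism:
        options["authMechanism"] = mechanism
    return "mongodb+srv://{}:{}@{}/?{}".format(
        quote_plus(username),  # percent-encode characters such as @ : /
        quote_plus(password),
        host,
        urlencode(options),
    )

uri = uri_with_credentials(
    "server.example.com", "app_user", "p@ss/word", mechanism="SCRAM-SHA-256"
)
```

Keeping credentials on the Credentials tab avoids this encoding step and keeps secrets out of the connection string, which is consistent with the recommendation above.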
Read Preference
You can configure the read preference that the MongoDB Atlas Lookup processor uses.
The read preference determines how the processor reads data from different members of the MongoDB replica set.
- Primary - Requires reading from the primary member.
- Primary Preferred - Prefers reading from the primary, but allows reads from a secondary member.
- Secondary - Requires reading from a secondary member.
- Secondary Preferred - Prefers reading from a secondary, but allows reads from a primary when necessary.
- Nearest - Reads from the member with the least network latency.
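For orientation, the five options correspond to the read preference modes defined by MongoDB, which drivers accept through the standard `readPreference` connection string option. The sketch below shows that correspondence. The helper name is hypothetical, and whether the processor itself honors this option in its connection string is not stated here, so treat this as an illustration of the MongoDB modes rather than of the processor's configuration.

```python
# Processor option label -> MongoDB read preference mode name.
READ_PREFERENCE_MODES = {
    "Primary": "primary",
    "Primary Preferred": "primaryPreferred",
    "Secondary": "secondary",
    "Secondary Preferred": "secondaryPreferred",
    "Nearest": "nearest",
}

def with_read_preference(uri, label):
    """Append the readPreference option to a connection string (hypothetical helper).

    Assumes the URI already ends with "/" or a database path.
    """
    mode = READ_PREFERENCE_MODES[label]
    separator = "&" if "?" in uri else "?"
    return f"{uri}{separator}readPreference={mode}"

lookup_uri = with_read_preference("mongodb+srv://server.example.com/", "Secondary Preferred")
```

Lookups are read-only, so a secondary-oriented mode can move read load off the primary, at the cost of possibly reading slightly stale data from a lagging secondary.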
Configuring a MongoDB Atlas Lookup Processor
Configure a MongoDB Atlas Lookup processor to perform lookups in MongoDB Atlas.
- In the Properties panel, on the General tab, configure the following properties:
  - Name - Stage name.
  - Description - Optional description.
  - Required Fields - Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline.
  - Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
  - On Record Error - Error record handling for the stage:
    - Discard - Discards the record.
    - Send to Error - Sends the record to the pipeline for error handling.
    - Stop Pipeline - Stops the pipeline.
- On the Connection tab, configure the following properties:
  - Connection String - Connection string for MongoDB. To connect to MongoDB Atlas or Enterprise Server, you can use the following DNS seed list format:
    mongodb+srv://server.example.com/
    To connect to a MongoDB Enterprise Server cluster, use the following standard connection format:
    mongodb://host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
    When connecting to a cluster, enter additional node information to ensure a connection.
    For more information about MongoDB connection strings, see the MongoDB documentation.
  - TLS Mode - Method used to implement SSL/TLS:
    - None - Connects to a MongoDB Enterprise Server cluster that is not enabled to use SSL/TLS.
    - Atlas/System CA - Connects to a MongoDB Atlas cluster. You can also use this when your certificates or keys have already been specified at the JVM level.
    - Server Validation (1 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and does not need to prove client identity.
    - Server and Client Validation (2 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and the server also validates the client key. This occurs when the cluster is set up to require client certificates.
- On the Credentials tab, configure the following properties:
  - Authentication Method - Authentication method to use:
    - None
    - Username / Password
    - LDAP
  - Username - User name for the selected authentication method.
  - Password - Password for the specified user name. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
  - Authentication Database - Database name associated with the specified user account. Available when using username/password authentication.
  - Authentication Mechanism - Authentication mechanism to use:
    - Default - Data Collector and MongoDB negotiate to choose the authentication mechanism.
    - SCRAM-SHA-1 - Data Collector sends SCRAM-SHA-1 credentials to MongoDB.
    - SCRAM-SHA-256 - Data Collector sends SCRAM-SHA-256 credentials to MongoDB.
- On the MongoDB tab, configure the following properties:
  - Database - Name of the MongoDB database.
  - Collection - Name of the MongoDB collection to use.
  - Document to SDC Field Mappings - Document fields to look up in MongoDB, mapped to the fields in the record that contain the values to look up.
  - Result Field - Name of the new list-map field in the record that receives all values from the returned document.
  - Multiple Values Behavior - Action to take upon finding multiple matching documents:
    - First value only - Generates a single record for the return values of the first matching document.
    - Split into Multiple Records - Generates a separate record for the return values of every matching document.
  - Missing Values Behavior - Action to take upon finding no document to return:
    - Send to error - Sends the record to error.
    - Pass the record along the pipeline unchanged - Passes the record without a lookup return value.
  - Enable Local Caching - Specifies whether to locally cache the returned values.
  - Read Preference - Determines how the processor reads data from different members of the MongoDB replica set.
- Optionally, click the Advanced tab to configure how the processor connects to MongoDB.
  The defaults for these properties should work in most cases. If a numeric property is set to 0, then the driver default value is used.
  - Compression Algorithm - Compression algorithm to use to communicate with MongoDB:
    - None
    - Snappy
    - ZLib
    - ZStandard
    These compression algorithms are not supported by all MongoDB versions. See the MongoDB documentation for details.
    Default is Snappy.
  - Application Name - Name to use in MongoDB reporting, such as server logs.
  - Maximum Connections - Maximum number of open connections allowed in the connection pool.
  - Minimum Connections - Minimum number of open connections allowed in the connection pool.
  - Max Connection Idle Time - Maximum idle time in milliseconds before a connection is removed from the connection pool.
  - Max Connection Lifetime - Maximum lifetime in milliseconds for a connection in the connection pool.
  - Max Connection Wait Time - Maximum time in milliseconds that a connection waits to connect.
  - Socket Connect Timeout - Maximum time in milliseconds to wait for a network socket connection.
  - Socket Read Timeout - Maximum time in milliseconds to wait for a read connection.
  - Socket Receive Buffer Size (bytes) - Buffer size in bytes for receiving data.
  - Socket Send Buffer Size (bytes) - Buffer size in bytes for sending data.
  - Heartbeat Frequency - Milliseconds between Data Collector attempts to determine the current state of each server in the cluster.
  - Min Heartbeat Frequency - Minimum number of milliseconds between Data Collector checks on the state of each server.
  - Server Selection Timeout - Maximum time in milliseconds that Data Collector waits for server selection before throwing an exception. If you enter 0, an exception is thrown immediately if no server is available. Use a negative value to wait indefinitely.
  - Local Threshold - Local threshold in milliseconds. Requests are sent to a server whose ping time is less than or equal to the server with the fastest ping time plus the local threshold value.
  - Required Replica Set Name - Required replica set name to use for the cluster.
  - Enable Single Mode - Connects to the first MongoDB server in the connection string. Applicable only for MongoDB Enterprise Server clusters.
  - Max Number of Retries - Maximum number of times to retry the connection when the connection fails. Default is 10.
  - Retry Interval (ms) - Time between retries in milliseconds. Default is 10,000.
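The two retry properties can be read as a bounded retry loop with a fixed pause between attempts. The sketch below shows one plausible interpretation, in which the first attempt does not count as a retry. The processor's actual retry logic is not described here, so the function name, the `connect` callable, and the use of `ConnectionError` are assumptions for illustration.

```python
import time

def connect_with_retries(connect, max_retries=10, retry_interval_ms=10_000,
                         sleep=time.sleep):
    """Call connect() until it succeeds or the retry budget is used up (sketch)."""
    retries = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if retries >= max_retries:
                raise  # budget exhausted: surface the last failure
            retries += 1
            sleep(retry_interval_ms / 1000)  # fixed interval, no backoff

# Example: a hypothetical connection that fails twice before succeeding.
attempts = []
def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("server unavailable")
    return "connected"

pauses = []
result = connect_with_retries(flaky_connect, sleep=pauses.append)
```

Under this interpretation, the defaults allow up to 10 retries spaced 10 seconds apart, so a connection that never succeeds fails after roughly 100 seconds of waiting plus the time spent on each attempt.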