MongoDB Atlas

Supported pipeline types:
  • Data Collector

The MongoDB Atlas origin reads data from MongoDB Atlas or MongoDB Enterprise Server. For information about supported versions, see Supported Systems and Versions.

The MongoDB Atlas origin reads from capped and uncapped collections. When you configure the origin, you define connection information, such as the connection string and credentials to use. You can specify SSL/TLS properties for an SSL/TLS-enabled MongoDB cluster.

You configure the database, collection, offset details, and read preference. You can define a custom filter and configure the origin to flatten nested structures.

You can optionally configure advanced options that determine how the origin connects to MongoDB, such as the maximum number of open connections to allow in the connection pool and the cursor type to use for capped collections.

When the pipeline stops, the origin notes where it stops reading. When the pipeline starts again, the origin continues processing from the last-saved offset by default. You can reset the origin to process all available data.

The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Credentials

Based on the authentication used by MongoDB, configure the MongoDB Atlas origin to use no authentication, username/password authentication, or LDAP authentication. By default, no authentication is used.

To use username/password or LDAP authentication, enter the required credentials in one of the following ways:
Authentication method
Specify the authentication to use with the Authentication Method property on the Credentials tab:
  • None
  • Username / Password
  • LDAP
Then, define the username and password for username/password or LDAP authentication.
When using username/password authentication, you also specify the authentication mechanism to use. You can also specify an authentication database.
Connection string
If you prefer, you can specify credentials in the connection string on the Connection tab. However, specifying credentials on the Credentials tab is the recommended method.
To enter credentials for username/password authentication for self-managed clusters, enter the username and password before the host name. Use the following format:
mongodb://username:password@host[:port][/[database][?options]]
To enter credentials for MongoDB Atlas, specify the URL from your Atlas cluster settings.
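If you do place credentials in the connection string, note that MongoDB connection URIs require reserved characters in the username and password, such as @, :, or /, to be percent-encoded. The following is a minimal sketch of assembling the format shown above. The build_connection_string helper and the sample host, user, and database values are hypothetical and are not part of Data Collector.

```python
from urllib.parse import quote_plus


def build_connection_string(username, password, host, port=None, database=None):
    """Build a standard mongodb:// URI with percent-encoded credentials."""
    # Percent-encode the credentials so that characters such as '@', ':'
    # or '/' are not mistaken for URI delimiters.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    authority = host if port is None else f"{host}:{port}"
    path = "" if database is None else f"/{database}"
    return f"mongodb://{credentials}@{authority}{path}"


# Hypothetical sample values, used for illustration only.
uri = build_connection_string("appUser", "p@ss:word/1", "db.example.com", 27017, "sales")
print(uri)
```

The resulting string can be pasted into the Connection String property. Specifying credentials on the Credentials tab remains the recommended method.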

Offset Field and Initial Offset

The MongoDB Atlas origin uses the offset field to track the data to read. By default, the origin uses the _id field as the offset field.

You can use any field type as the offset field. The origin determines the type of the field based on the first record in the collection. If you do not use the default _id field, results are not guaranteed.

When you use a date or Object ID field, specify a timestamp to use as the initial offset. Object ID fields include an embedded timestamp that the origin uses to determine where in the collection to begin reading. When you define the initial offset for a date or Object ID field, use one of the following formats:
Hexadecimal string
Use a hexadecimal string representation of the Object ID, such as 62193d7cf7e3300b6646bdc8. This is available when viewing collection documents using MongoDB Compass.
Datetime
Use the following datetime format:
YYYY-MM-DD HH:mm:ss
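
To see how the two formats relate, recall that the first 4 bytes of an Object ID store its creation time as seconds since the Unix epoch. The following sketch decodes that timestamp from the sample hexadecimal string above and prints it in the datetime format. The function names are illustrative, and treating the value as UTC is an assumption; check how your environment interprets time zones before relying on the result.

```python
from datetime import datetime, timezone


def object_id_timestamp(hex_id):
    """Return the UTC creation time embedded in a 24-character Object ID."""
    if len(hex_id) != 24:
        raise ValueError("an Object ID is 24 hexadecimal characters")
    # The first 4 bytes (8 hex characters) hold seconds since the Unix epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def format_initial_offset(moment):
    """Format a datetime in the YYYY-MM-DD HH:mm:ss pattern."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


created = object_id_timestamp("62193d7cf7e3300b6646bdc8")
print(format_initial_offset(created))  # 2022-02-25 20:35:08
```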

When you use a string field, specify the initial string to use as the initial offset.

Note: If you change the offset field for the origin after the pipeline runs and then stops, you must reset the origin before you can run the pipeline again.

Specifying Field Paths

When configuring the MongoDB Atlas origin, you can specify field paths in either of the following ways:
  • Data Collector format - Uses a slash ( / ) as a delimiter. Includes a leading slash.
  • MongoDB format - Uses a period ( . ) as a delimiter.
The following table lists examples of the two field path formats:
Data Collector Format        MongoDB Format
/_id                         _id
/orders/address/line1        orders.address.line1
/orders/lines[1]/quantity    orders.lines[1].quantity
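
The mapping between the two formats is mechanical, as this small sketch shows. The helper names are hypothetical, and the sketch assumes that field names do not themselves contain slashes or periods.

```python
def to_mongodb_path(sdc_path):
    """Convert a Data Collector field path such as /a/b[1]/c to a.b[1].c."""
    if not sdc_path.startswith("/"):
        raise ValueError("Data Collector field paths start with a slash")
    # Drop the leading slash, then swap the delimiter.
    return sdc_path[1:].replace("/", ".")


def to_data_collector_path(mongo_path):
    """Convert a MongoDB field path such as a.b[1].c to /a/b[1]/c."""
    return "/" + mongo_path.replace(".", "/")


print(to_mongodb_path("/orders/lines[1]/quantity"))      # orders.lines[1].quantity
print(to_data_collector_path("orders.address.line1"))    # /orders/address/line1
```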

Read Preference

You can configure the read preference that the MongoDB Atlas origin uses. The read preference determines how the origin reads data from different members of the MongoDB replica set.

You can use the following MongoDB read preferences:
  • Primary - Requires reading from the primary member.
  • Primary Preferred - Prefers reading from the primary, but allows reads from a secondary member.
  • Secondary - Requires reading from a secondary member.
  • Secondary Preferred - Prefers reading from a secondary, but allows reads from a primary when necessary.
  • Nearest - Reads from the member with the least network latency.

By default, the origin uses Secondary Preferred to avoid making unnecessary requests to the primary member.

Custom Filter

You can specify a custom filter to reduce the data that MongoDB passes to the origin to process. Use a custom filter to return a subset of all available data before processing. Use the MongoDB query operator syntax for filters when you define a custom filter.

For example, to retrieve only documents where city is San Francisco, you can use the following custom filter:
{ city: "San Francisco" }

For more information, including the appropriate syntax for query operators, see the MongoDB Atlas documentation.
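
As an illustration of query operator syntax, the following sketch builds a filter that combines the equality match on city with the standard MongoDB $gte comparison operator, then serializes it to text. The population field is hypothetical, and whether the Custom Filter property expects strict JSON with quoted keys is an assumption to verify in your environment.

```python
import json

# "city" comes from the example above; "population" is a hypothetical field.
custom_filter = {
    "city": "San Francisco",
    "population": {"$gte": 100000},  # $gte: greater than or equal to
}

# Serialize the filter to text that can be pasted into the property.
filter_text = json.dumps(custom_filter)
print(filter_text)
```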

Event Generation

The MongoDB Atlas origin can generate events when it completes processing all available data and the configured batch wait time has elapsed.

MongoDB Atlas origin events can be used in any logical way. For example:
  • With the Pipeline Finisher executor to stop the pipeline and transition the pipeline to a Finished state when the origin completes processing available data.

    When you restart a pipeline stopped by the Pipeline Finisher executor, the origin continues processing from the last-saved offset unless you reset the origin.

    For an example, see Stopping a Pipeline After Processing All Available Data.

  • With a destination to store event information.

    For an example, see Preserving an Audit Trail of Events.

For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Event Records

Event records generated by the MongoDB Atlas origin have the following event-related record header attributes. Record header attributes are stored as string values:
Record Header Attribute Description
sdc.event.type Event type. Uses the following event type:
  • no-more-data - Generated after the origin completes processing all available data. The exact behavior differs based on the collection and cursor type.
sdc.event.version Integer that indicates the version of the event record type.
sdc.event.creation_timestamp Epoch timestamp when the stage created the event.
The MongoDB Atlas origin can generate the following event record:
no-more-data
The MongoDB Atlas origin generates a no-more-data event record differently depending on the collection and cursor type:
  • When the collection type is capped and the origin uses a tailable cursor type, the origin generates the event record after processing all available records, and the number of seconds specified for Max Batch Wait Time elapses without any new data appearing.
  • When the collection type is not capped or when the origin uses a different cursor type for capped collections, the origin generates the event record immediately after reading all available data.
No-more-data event records generated by the origin have the sdc.event.type set to no-more-data and include the following fields:
Event Record Field Description
record-count Number of records successfully generated since the pipeline started or since the last no-more-data event was created.
error-count Number of error records generated since the pipeline started or since the last no-more-data event was created.
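
The following sketch shows one way a downstream script might interpret such an event record. It models the record as two plain dictionaries, one for header attributes and one for fields, which is a simplification and not the Data Collector record API. The sample values are made up. Because header attributes are stored as strings, the version is converted to an integer before use.

```python
def summarize_no_more_data(header, fields):
    """Return a summary for a no-more-data event, or None for other events."""
    if header.get("sdc.event.type") != "no-more-data":
        return None
    return {
        # Header attributes are strings, so convert the version explicitly.
        "event_version": int(header["sdc.event.version"]),
        "records": fields["record-count"],
        "errors": fields["error-count"],
    }


# Made-up sample event, for illustration only.
sample_header = {"sdc.event.type": "no-more-data", "sdc.event.version": "1"}
sample_fields = {"record-count": 1500, "error-count": 2}
summary = summarize_no_more_data(sample_header, sample_fields)
print(summary)
```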

Enabling SSL/TLS

By default, the MongoDB Atlas origin does not use SSL/TLS. If the cluster is enabled to use SSL/TLS, then you can connect using one of the following methods:
  • Atlas/System CA - Connects to a MongoDB Atlas cluster. You can also use this when your certificates or keys have already been specified at the JVM level.
  • Server Validation (1 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and does not need to prove client identity.
  • Server and Client Validation (2 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and the server also validates the client key. This occurs when the cluster is set up to require client certificates.
Note: Server validation and server and client validation require configuring additional properties that provide the required information. Both options require obtaining the certificate file for the cluster in one of the valid formats. Server and client validation also requires generating or obtaining the public certificate and private key file for Data Collector.
You can specify certificates and keys in the following formats:
  • JKS (Java Keystore)
  • PEM (text-based)
  • DER (text-based)
  • PKCS #7 / P7B
  • PKCS #12 / P12 / PFX
  • Private keys inside PEM, DER, or PKCS #12 encoded as PKCS#1 or PKCS#8

If the files are in PEM or DER plain text format, you can provide the text in the stage properties. The text should begin and end with delimiter lines such as -----BEGIN CERTIFICATE----- or -----END PRIVATE KEY-----. Otherwise, you provide a path to the certificate file.

MongoDB Data Types

When the MongoDB Atlas origin reads from MongoDB, it converts standard MongoDB data types to the following Data Collector data types.

The origin can also convert supported BSON types to Data Collector data types. For more information, see Reading BSON Types.

The following table describes how standard MongoDB data types are converted to Data Collector types:
Standard MongoDB Type    Data Collector Type
Array                    List
Binary                   Byte Array
Boolean                  Boolean
Date                     Date
Double                   Double
Int32                    Integer
Int64                    Long
JavaScript               String
Object                   List-Map
String                   String
Timestamp                Datetime

Reading BSON Types

When reading from MongoDB, the MongoDB Atlas origin converts standard MongoDB data types to Data Collector data types as described in MongoDB Data Types.

The origin converts supported BSON data types to Data Collector data types as well. When converting BSON data types, the origin adds a field attribute named bsonType to the converted field.

Some supported BSON data types encode additional information with the data. Where this occurs, the information is included as additional attributes for the field. For example, a BsonTimestamp can encode an ordinal value along with the date and time. When the origin reads the data, it converts the field to a Datetime field with an ordinal field attribute set to the ordinal value encoded with the data.

The following table lists supported MongoDB BSON data types, the Data Collector types they convert to, and any additional field attributes included with the conversion:
BSON Data Type Data Collector Type Field Attributes and Values
Binary Byte Array bsonType: Binary
BsonDbPointer Map field with the following subfields:
  • database
  • collection
  • id - String containing the 24-character hexadecimal value of the ID
bsonType: Bson_Db_Pointer
BsonRegularExpression String
  • bsonType: Bson_Regular_Expression
  • options: Options provided with the regular expression
BsonTimestamp Datetime
  • bsonType: Bson_Timestamp
  • seconds: Unix epoch time in seconds
  • ordinal: Ordinal value encoded in the timestamp

Code String bsonType: Code
CodeWithScope String bsonType: Code_With_Scope
DBRef Map field with the following subfields:
  • database
  • collection
  • id - String containing the 24-character hexadecimal value of the ID
bsonType: Db_Ref
Decimal128 Decimal bsonType: Decimal128
Null String with null value bsonType: Null
ObjectId String containing the 24-character hexadecimal value of the Object Id
  • bsonType: Object_Id
  • timestamp: Unix epoch time of the Object Id
  • date: Date represented by the Object Id in yyyy-MM-dd HH:mm:ss format

Symbol String bsonType: Symbol
Undefined String with null value bsonType: Undefined
Note: The MinKey and MaxKey BSON data types are not supported at this time.
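
To illustrate the seconds and ordinal attributes listed for BsonTimestamp: in the BSON format, a timestamp is a 64-bit value whose high 32 bits hold the Unix epoch time in seconds and whose low 32 bits hold the ordinal. The following sketch splits such a value. It illustrates the encoding only and is not a description of how the origin performs the conversion internally; the function name and the sample value are hypothetical.

```python
from datetime import datetime, timezone


def split_bson_timestamp(value):
    """Split a 64-bit BSON timestamp into (seconds, ordinal)."""
    seconds = (value >> 32) & 0xFFFFFFFF  # high 32 bits: epoch seconds
    ordinal = value & 0xFFFFFFFF          # low 32 bits: ordinal within the second
    return seconds, ordinal


# Hypothetical sample: ordinal 7 within epoch second 1645821308.
raw = (1645821308 << 32) | 7
seconds, ordinal = split_bson_timestamp(raw)
moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(seconds, ordinal, moment.strftime("%Y-%m-%d %H:%M:%S"))
```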

Configuring a MongoDB Atlas Origin

Configure a MongoDB Atlas origin to read data from MongoDB Atlas or MongoDB Enterprise Server.

  1. In the Properties panel, on the General tab, configure the following properties:
    General Property Description
    Name Stage name.
    Description Optional description.
    Produce Events Generates event records when events occur. Use for event handling.
    On Record Error Error record handling for the stage:
    • Discard - Discards the record.
    • Send to Error - Sends the record to the pipeline for error handling.
    • Stop Pipeline - Stops the pipeline.
  2. On the Connection tab, configure the following properties:
    Connection Property Description
    Connection String
    Connection string for MongoDB. To connect to MongoDB Atlas or Enterprise Server, you can use the following DNS seed list format:
    mongodb+srv://server.example.com/

    To connect to a MongoDB Enterprise Server cluster, use the following standard connection format:

    mongodb://host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

    When connecting to a cluster, enter additional node information to ensure a connection.

    For more information about MongoDB connection strings, see the MongoDB documentation.

    SSL/TLS Mode Method used to implement SSL/TLS:
    • None - Connects to a MongoDB Enterprise Server cluster that is not enabled to use SSL/TLS.
    • Atlas/System CA - Connects to a MongoDB Atlas cluster. You can also use this when your certificates or keys have already been specified at the JVM level.
    • Server Validation (1 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and does not need to prove client identity.
    • Server and Client Validation (2 Way TLS) - Connects to an SSL/TLS-enabled MongoDB Enterprise Server cluster when the client needs to validate the server certificate and the server also validates the client key. This occurs when the cluster is set up to require client certificates.
    SSL Invalid Host Name Allowed Specifies whether invalid host names are allowed in SSL/TLS certificates.

    Available when using server validation or server and client validation.

    Certificate Mode Mode to provide the SSL/TLS certificate:
    • File - Use when the certificate is in a file local to Data Collector.
    • Embedded - Use to provide the certificate text directly in stage properties.

    Available when using server validation or server and client validation.

    Certificate Authority MongoDB certificate to use. Define this property based on the configured certificate mode:
    • When using file certificate mode, specify a path to the certificate. Enter an absolute path to the file or enter the following expression to define the file stored in the Data Collector resources directory:

      ${runtime:resourcesDirPath()}/keystore.jks

    • When using the embedded certificate mode, provide the full text of the certificate to use. The text should start with -----BEGIN CERTIFICATE-----.

    Available when using server validation or server and client validation.

    Certificate Authority Password Password for the certificate. Specify if the certificate file is encrypted.

    Available when using server validation or server and client validation, and when using file certificate mode.

    Client Certificate Client certificate to use. Define this property based on the configured certificate mode:
    • When using file certificate mode, specify a path to the certificate. Enter an absolute path to the file or enter the following expression to define the file stored in the Data Collector resources directory:

      ${runtime:resourcesDirPath()}/keystore.jks

    • When using the embedded certificate mode, provide the full text of the certificate to use. The text should start with -----BEGIN CERTIFICATE-----.

    Available when using server and client validation.

    Client Private Key Path to the key file.

    Available when using server and client validation and file certificate mode.

    Private Key Password Password for the private key. Specify if the private key is encrypted.

    Available when using server and client validation and file certificate mode.

  3. On the Credentials tab, configure the following properties:
    Credentials Property Description
    Authentication Method Authentication method to use:
    • None
    • Username / Password
    • LDAP
    Username User name for the selected authentication method.
    Password Password for the specified user name.
    Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.
    Authentication Database Database name associated with the specified user account.

    Available when using username/password authentication.

    Authentication Mechanism Authentication mechanism to use:
    • Default - Data Collector and MongoDB negotiate to choose the authentication mechanism.
    • SCRAM-SHA-1 - Data Collector sends SCRAM-SHA-1 credentials to MongoDB.
    • SCRAM-SHA-256 - Data Collector sends SCRAM-SHA-256 credentials to MongoDB.
  4. On the MongoDB tab, configure the following properties:
    MongoDB Property Description
    Database Name of the MongoDB database.
    Collection Name of the MongoDB collection to use.
    Custom Filter Optional filter to include when querying the collection.
    Offset Field Field to use to track reads. Default is the _id field.

    For information about specifying fields, see Specifying Field Paths.

    Initial Offset Initial offset to use to begin reading. When using a date or Object ID field as the offset field, enter a timestamp with the following format or a hexadecimal string representation of the Object ID:

    YYYY-MM-DD HH:mm:ss

    When using a string field, enter the string to use.

    Read Preference Determines how the origin reads data from different members of the MongoDB replica set.
    Auto Flatten Nested Structures Flattens fields with nested fields. Includes the path to the field in the field name, such as:

    <firstlevel>.<secondlevel>.<fieldname>

    Flattened arrays include the index in the field name, as follows:

    root.array[0].field1

    Batch Size (records) Maximum number of records allowed in a batch.
    Max Batch Wait Time Maximum seconds the origin waits for a batch before sending an empty batch.

    Used only when the Capped Collection Cursor Type property is set to Tailable.

  5. Optionally, click the Advanced tab to configure how the origin connects to MongoDB.

    The defaults for these properties should work in most cases. If a numeric property is set to 0, then the driver default value is used.

    Advanced Property Description
    Compression Algorithm Compression algorithm to use to communicate with MongoDB:
    • None
    • Snappy
    • ZLib
    • ZStandard

    These compression algorithms are not supported by all MongoDB versions. See the MongoDB documentation for details.

    Default is Snappy.

    Application Name Name to use in MongoDB reporting, such as server logs.
    Maximum Connections Maximum number of open connections allowed in the connection pool.
    Minimum Connections Minimum number of open connections allowed in the connection pool.
    Max Connection Idle Time Maximum idle time in milliseconds before a connection is removed from the connection pool.
    Max Connection Lifetime Maximum lifetime in milliseconds for a connection in the connection pool.
    Max Connection Wait Time Maximum time in milliseconds that a connection waits to connect.
    Socket Connect Timeout Maximum time in milliseconds to wait for a network socket connection.
    Socket Read Timeout Maximum time in milliseconds to wait for a read connection.
    Socket Receive Buffer Size (bytes) Buffer size in bytes for receiving data.
    Socket Send Buffer Size (bytes) Buffer size in bytes for sending data.
    Heartbeat Frequency Milliseconds between Data Collector attempts to determine the current state of each server in the cluster.
    Min Heartbeat Frequency Minimum number of milliseconds between Data Collector checks on the state of each server.
    Server Selection Timeout Maximum time in milliseconds that Data Collector waits for server selection before throwing an exception. If you enter 0, an exception is thrown immediately if no server is available. Use a negative value to wait indefinitely.
    Local Threshold Local threshold in milliseconds. Requests are sent to a server whose ping time is less than or equal to the server with the fastest ping time plus the local threshold value.
    Required Replica Set Name Required replica set name to use for the cluster.
    Enable Single Mode Connects to the first MongoDB server in the connection string.

    Applicable only for MongoDB Enterprise Server clusters.

    Max Number of Retries Maximum number of times to retry the connection when the connection fails.

    Default is 10.

    Retry Interval (ms) Time between retries in milliseconds.

    Default is 10,000.

    Capped Collection Cursor Type Style of cursor to use for a capped collection:
    • Normal
    • Tailable
    • Tailable Await
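
The naming scheme described for the Auto Flatten Nested Structures property in step 4 can be illustrated with a short sketch. The flatten function below is hypothetical and only mimics the field names shown there, with a period between levels and an index in brackets for array elements. It is not the origin's implementation, and in this sketch empty objects and arrays simply produce no fields.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single-level dict of field names."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Join nested levels with a period.
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array elements carry their index in the field name.
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat


# Made-up sample document, for illustration only.
document = {"root": {"array": [{"field1": "a"}, {"field1": "b"}], "name": "x"}}
flattened = flatten(document)
print(flattened)
```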