Origins
An origin stage represents the source for the pipeline. A pipeline can include only one origin stage.
You can use different origins based on the execution mode of the pipeline: standalone, cluster, or edge. To help create or test pipelines, you can use development origins.
Standalone Pipelines
In standalone pipelines, you can use the following origins:
- Amazon S3 - Reads objects from Amazon S3. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Amazon SQS Consumer - Reads data from queues in Amazon Simple Queue Service (SQS). Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Azure Data Lake Storage Gen1 (deprecated) - Reads data from Microsoft Azure Data Lake Storage Gen1. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Azure Data Lake Storage Gen2 - Reads data from Microsoft Azure Data Lake Storage Gen2. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Azure IoT/Event Hub Consumer - Reads data from Microsoft Azure Event Hub. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- CoAP Server - Listens on a CoAP endpoint and processes the contents of all authorized CoAP requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Cron Scheduler - Generates a record with the current datetime as scheduled by a cron expression. This is an orchestration stage.
- Directory - Reads fully written files from a directory. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Elasticsearch - Reads data from an Elasticsearch cluster. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- File Tail - Reads lines of data from an active file after reading related archived files in the directory.
- Google BigQuery - Executes a query job and reads the result from Google BigQuery.
- Google Cloud Storage - Reads fully written objects from Google Cloud Storage.
- Google Pub/Sub Subscriber - Consumes messages from a Google Pub/Sub subscription. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Groovy Scripting - Runs a Groovy script to create Data Collector records. Can create multiple threads to enable parallel processing in a multithreaded pipeline.
- Hadoop FS Standalone - Reads fully written files from HDFS or Azure Blob storage. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- HTTP Client - Reads data from a streaming HTTP resource URL.
- HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- JavaScript Scripting - Runs a JavaScript script to create Data Collector records. Can create multiple threads to enable parallel processing in a multithreaded pipeline.
- JDBC Multitable Consumer - Reads database data from multiple tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- JDBC Query Consumer - Reads database data using a user-defined SQL query through a JDBC connection.
- JMS Consumer - Reads messages from JMS.
- Jython Scripting - Runs a Jython script to create Data Collector records. Can create multiple threads to enable parallel processing in a multithreaded pipeline.
- Kafka Consumer (deprecated) - Reads messages from a single Kafka topic.
- Kafka Multitopic Consumer - Reads messages from multiple Kafka topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Kinesis Consumer - Reads data from Kinesis Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR DB CDC - Reads changed MapR DB data that has been written to MapR Streams. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR DB JSON - Reads JSON documents from MapR DB JSON tables.
- MapR FS Standalone - Reads fully written files from MapR FS. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR Multitopic Streams Consumer - Reads messages from multiple MapR Streams topics. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- MapR Streams Consumer - Reads messages from MapR Streams.
- MongoDB - Reads documents from MongoDB.
- MongoDB Oplog - Reads entries from a MongoDB Oplog.
- MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
- MySQL Binary Log - Reads MySQL binary logs to generate change data capture records.
- NiFi HTTP Server (deprecated) - Listens for requests from a NiFi PutHTTP processor and processes NiFi FlowFiles.
- Omniture (deprecated) - Reads web usage reports from the Omniture reporting API.
- OPC UA Client - Reads data from an OPC UA server.
- Oracle Bulkload - Reads data from multiple Oracle database tables, then stops the pipeline. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Oracle CDC Client - Reads LogMiner redo logs to generate change data capture records.
- PostgreSQL CDC Client - Reads PostgreSQL WAL data to generate change data capture records.
- Pulsar Consumer - Reads messages from Apache Pulsar topics.
- RabbitMQ Consumer - Reads messages from RabbitMQ.
- Redis Consumer - Reads messages from Redis.
- REST Service - Listens on an HTTP endpoint, parses the contents of all authorized requests, and sends responses back to the originating REST API. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Use only in microservice pipelines.
- Salesforce - Reads data from Salesforce.
- SAP HANA Query Consumer - Reads data from an SAP HANA database using a user-defined SQL query.
- SDC RPC (deprecated) - Reads data from an SDC RPC destination in an SDC RPC pipeline.
- SFTP/FTP/FTPS Client - Reads files from an SFTP, FTP, or FTPS server.
- SQL Server 2019 BDC Multitable Consumer - Reads data from Microsoft SQL Server 2019 Big Data Cluster (BDC) through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- SQL Server CDC Client - Reads data from Microsoft SQL Server CDC tables. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- SQL Server Change Tracking - Reads data from Microsoft SQL Server change tracking tables and generates the latest version of each record. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Start Jobs - Starts one or more Control Hub jobs in parallel. This is an orchestration stage.
- Start Pipelines (deprecated) - Starts one or more pipelines in parallel. This is an orchestration stage.
- TCP Server - Listens at the specified ports and processes incoming data over TCP/IP connections. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- Teradata Consumer (deprecated) - Reads data from Teradata Database tables through a JDBC connection. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- UDP Multithreaded Source - Reads messages from one or more UDP ports. Creates multiple threads to enable parallel processing in a multithreaded pipeline.
- UDP Source - Reads messages from one or more UDP ports.
- WebSocket Client - Reads data from a WebSocket server endpoint. Can send responses back to the origin system as part of a microservice pipeline.
- WebSocket Server - Listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Creates multiple threads to enable parallel processing in a multithreaded pipeline. Can send responses back to the origin system as part of a microservice pipeline.
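Many of the origins above create multiple threads to enable parallel processing. The following Python sketch illustrates that general idea only. It is not Data Collector code, and the names `read_batches`, `run_origin`, `num_threads`, and `batch_size` are hypothetical, chosen for illustration. It assumes each thread reads one source (such as a table or file) and emits its data in batches.

```python
from concurrent.futures import ThreadPoolExecutor


def read_batches(source_name, rows, batch_size):
    """Split one source's rows into batches, as a single origin thread might."""
    return [
        (source_name, rows[i:i + batch_size])
        for i in range(0, len(rows), batch_size)
    ]


def run_origin(sources, num_threads=2, batch_size=2):
    """Read each source on a worker thread and collect the resulting batches."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # One task per source; the pool runs up to num_threads tasks at once.
        futures = [
            pool.submit(read_batches, name, rows, batch_size)
            for name, rows in sources.items()
        ]
        batches = []
        for future in futures:
            batches.extend(future.result())
    return batches


# Hypothetical sample data standing in for two tables.
sources = {"table_a": [1, 2, 3], "table_b": [4, 5]}
batches = run_origin(sources)
for source_name, rows in batches:
    print(source_name, rows)
```

In this sketch the number of worker threads caps how many sources are read at once, which loosely mirrors how a thread-count setting would bound parallelism in a multithreaded origin.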
Cluster Pipelines (Deprecated)
In cluster pipelines, you can use the following origins:
- Hadoop FS (deprecated) - Reads data from HDFS, Amazon S3, or other file systems using the Hadoop FileSystem interface.
- Kafka Consumer (deprecated) - Reads messages from Kafka. Use the cluster version of the origin.
- MapR FS (deprecated) - Reads data from MapR FS.
Edge Pipelines
In edge pipelines, you can use the following origins:
- Directory - Reads fully written files from a directory.
- File Tail - Reads lines of data from an active file after reading related archived files in the directory.
- gRPC Client - Reads data from a gRPC server.
- HTTP Client - Reads data from a streaming HTTP resource URL.
- HTTP Server - Listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests.
- MQTT Subscriber - Subscribes to a topic on an MQTT broker to read messages from the broker.
- System Metrics - Reads system metrics from the edge device where SDC Edge is installed.
- WebSocket Client - Reads data from a WebSocket server endpoint.
- Windows Event Log - Reads data from a Microsoft Windows event log located on a Windows machine.
Development Origins
To help create or test pipelines, you can use the following development origins:
- Dev Data Generator
- Dev Random Source
- Dev Raw Data Source
- Dev SDC RPC with Buffering
- Dev Snapshot Replaying
- Sensor Reader
For more information, see Development Stages.