Kinesis Firehose
The Kinesis Firehose destination writes data to an Amazon Kinesis Firehose delivery stream. Firehose automatically delivers the data to the Amazon S3 bucket or Amazon Redshift table that you specify in the delivery stream. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.
To write data to Amazon Kinesis Streams, use the Kinesis Producer destination. To write data directly to Amazon S3, use the Amazon S3 destination.
When you use the Kinesis Firehose destination to deliver data to Amazon S3, Firehose can buffer incoming records into larger file sizes before delivering the data to Amazon S3. You configure the buffer size and buffer interval when you create the delivery stream.
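The buffer settings belong to the delivery stream, not to the destination. As an illustration, the following Python sketch builds the request that the AWS SDK for Python (boto3) accepts in firehose.create_delivery_stream() for a Direct PUT stream to Amazon S3. The buffer size and interval appear as BufferingHints. The s3_stream_request helper, the stream name, the bucket ARN, and the role ARN are hypothetical placeholders. The sketch only builds the request and does not call AWS.

```python
# Minimal sketch: build keyword arguments for boto3's
# firehose.create_delivery_stream() with S3 buffering hints.
# s3_stream_request and all names/ARNs below are hypothetical placeholders.


def s3_stream_request(stream_name, bucket_arn, role_arn,
                      buffer_size_mb=5, buffer_interval_s=300):
    """Return create_delivery_stream() kwargs for a Direct PUT stream to S3."""
    return {
        "DeliveryStreamName": stream_name,
        # Direct PUT: producers write records to the stream through the API.
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            # Firehose delivers a buffer when either threshold is reached first.
            "BufferingHints": {
                "SizeInMBs": buffer_size_mb,
                "IntervalInSeconds": buffer_interval_s,
            },
        },
    }


request = s3_stream_request(
    "example-stream",
    "arn:aws:s3:::example-bucket",
    "arn:aws:iam::123456789012:role/example-firehose-role",
    buffer_size_mb=64,
    buffer_interval_s=120,
)
# With boto3 installed and AWS credentials configured, you could then run:
#   import boto3
#   boto3.client("firehose").create_delivery_stream(**request)
```

Because delivery happens at whichever threshold is reached first, larger values generally produce fewer, larger files in Amazon S3 at the cost of higher delivery latency.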
When you configure the Kinesis Firehose destination, you specify an existing delivery stream to write to, AWS credentials and region, and the data format to use.
You can also use a connection to configure the destination.
Authentication Method
You can configure the Kinesis Firehose destination to authenticate with Amazon Web Services (AWS) using an instance profile or AWS access keys.
For more information about the authentication methods and details on how to configure each method, see Security in Amazon Stages.
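The following Python sketch shows one way the authentication choices might map onto the AWS SDK for Python (boto3). It makes several assumptions. AWS keys are passed explicitly. An instance profile passes no keys and relies on the SDK's default credential chain. The optional Assume Role properties on the Kinesis tab (Role ARN, Role Session Name, Session Timeout) map onto the parameters of the AWS STS AssumeRole call. The aws_auth_settings helper and its "AWS_KEYS" and "INSTANCE_PROFILE" labels are hypothetical and do not describe how Data Collector is implemented.

```python
import re
import uuid

# Hypothetical helper for illustration; not part of Data Collector or boto3.
# Accepts only the arn:aws:iam::<account_id>:role/<role_name> form documented
# for the Role ARN property (12-digit account ID, optional role path).
ROLE_ARN_PATTERN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def aws_auth_settings(method, region, access_key_id=None, secret_access_key=None,
                      role_arn=None, session_name=None, session_timeout=3600):
    """Return (client_kwargs, assume_role_kwargs) for boto3-style clients.

    method is a hypothetical label: "AWS_KEYS" or "INSTANCE_PROFILE".
    assume_role_kwargs is None unless role_arn is given.
    """
    client_kwargs = {"region_name": region}
    if method == "AWS_KEYS":
        if not (access_key_id and secret_access_key):
            raise ValueError("AWS keys require an access key ID and a secret access key")
        client_kwargs["aws_access_key_id"] = access_key_id
        client_kwargs["aws_secret_access_key"] = secret_access_key
    elif method != "INSTANCE_PROFILE":
        raise ValueError(f"unknown authentication method: {method}")
    # For an instance profile, no keys are passed: boto3's default credential
    # chain retrieves credentials for the role attached to the EC2 instance.

    assume_role_kwargs = None
    if role_arn is not None:
        if not ROLE_ARN_PATTERN.match(role_arn):
            raise ValueError("expected arn:aws:iam::<account_id>:role/<role_name>")
        # Range taken from the stage's Session Timeout property.
        if not 3600 <= session_timeout <= 43200:
            raise ValueError("session timeout must be 3,600 to 43,200 seconds")
        assume_role_kwargs = {
            "RoleArn": role_arn,
            # STS requires a session name; generate a unique one when none is given.
            "RoleSessionName": session_name or f"session-{uuid.uuid4().hex}",
            "DurationSeconds": session_timeout,
        }
    return client_kwargs, assume_role_kwargs


client_kwargs, role_kwargs = aws_auth_settings(
    "INSTANCE_PROFILE", "us-west-2",
    role_arn="arn:aws:iam::123456789012:role/example-firehose-writer",
    session_timeout=7200,
)
# With boto3 installed and base credentials available:
#   sts = boto3.client("sts", **client_kwargs)
#   creds = sts.assume_role(**role_kwargs)["Credentials"]
```

Validating the Role ARN format and the session timeout range before starting a pipeline surfaces configuration mistakes early. AWS still performs the authoritative checks, such as whether the trust policy allows the role to be assumed.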
Delivery Stream
The Kinesis Firehose destination writes data to an existing delivery stream in Amazon Kinesis Firehose. Before using the Kinesis Firehose destination, use the AWS Management Console to create a delivery stream to an Amazon S3 bucket or Amazon Redshift table.
For more information about creating a Firehose delivery stream, see the Amazon Kinesis Firehose documentation.
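The destination only writes to an existing delivery stream, so you might confirm that the stream exists and is active before you start the pipeline. The following minimal Python sketch assumes a boto3 Firehose client. That client provides describe_delivery_stream() and reports a DeliveryStreamStatus such as CREATING or ACTIVE, and it raises ResourceNotFoundException for an unknown stream. The stream_is_ready helper is hypothetical. The StubFirehose class is a hypothetical stand-in that lets the example run without AWS access.

```python
from types import SimpleNamespace


def stream_is_ready(firehose_client, stream_name):
    """Return True if the delivery stream exists and is ACTIVE.

    firehose_client is anything shaped like boto3.client("firehose").
    """
    try:
        response = firehose_client.describe_delivery_stream(DeliveryStreamName=stream_name)
    except firehose_client.exceptions.ResourceNotFoundException:
        return False
    return response["DeliveryStreamDescription"]["DeliveryStreamStatus"] == "ACTIVE"


class StubFirehose:
    """Hypothetical offline stand-in for boto3.client("firehose")."""

    class ResourceNotFoundException(Exception):
        pass

    # Mirrors the client.exceptions namespace that boto3 clients expose.
    exceptions = SimpleNamespace(ResourceNotFoundException=ResourceNotFoundException)

    def __init__(self, streams):
        self.streams = streams  # maps stream name -> status string

    def describe_delivery_stream(self, DeliveryStreamName):
        if DeliveryStreamName not in self.streams:
            raise self.ResourceNotFoundException(DeliveryStreamName)
        return {"DeliveryStreamDescription": {
            "DeliveryStreamName": DeliveryStreamName,
            "DeliveryStreamStatus": self.streams[DeliveryStreamName],
        }}


client = StubFirehose({"example-stream": "ACTIVE", "new-stream": "CREATING"})
# stream_is_ready(client, "example-stream") -> True
# stream_is_ready(client, "new-stream")     -> False (still being created)
```

Against a real account, you would pass boto3.client("firehose") instead of the stub.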
Data Formats
The Kinesis Firehose destination writes data to a Kinesis Firehose delivery stream based on the data format that you select.
The Kinesis Firehose destination processes data formats as follows:
- Delimited - The destination writes records as delimited data. When you use this data format, the root field must be List or List-Map.
- JSON - The destination writes records as JSON data. Use the multiple objects format, where each file includes multiple JSON objects. Each object is a JSON representation of a record.
Note: The JSON array of objects format is not supported for the Kinesis Firehose destination.
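To illustrate the difference between the two JSON layouts, the following Python sketch serializes two sample records both ways with the standard json module. The records are hypothetical, and separating the objects with newlines is an assumption of this sketch rather than a documented detail of the destination's output. One plausible reason the array layout is unsupported is that Firehose combines buffered records into larger files. Standalone objects remain parseable after they are combined, but several separate arrays would not form one valid JSON document.

```python
import json

# Two hypothetical records, represented as Python dicts.
records = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

# Multiple JSON objects: each record is serialized as its own JSON object.
# This sketch separates the objects with newlines.
multiple_objects = "\n".join(json.dumps(record) for record in records)

# Array of objects: all records wrapped in one JSON array.
# The Kinesis Firehose destination does not support this layout.
array_of_objects = json.dumps(records)

print(multiple_objects)
```

Running the sketch prints one object per line:

{"id": 1, "name": "alpha"}
{"id": 2, "name": "beta"}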
Configuring a Kinesis Firehose Destination
Configure a Kinesis Firehose destination to write data to an Amazon Kinesis Firehose delivery stream.
- In the Properties panel, on the General tab, configure the following properties:
  - Name - Stage name.
  - Description - Optional description.
  - Required Fields - Fields that must include data for the record to be passed into the stage.
    Tip: You might include fields that the stage uses.
    Records that do not include all required fields are processed based on the error handling configured for the pipeline.
  - Preconditions - Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions.
    Records that do not meet all preconditions are processed based on the error handling configured for the stage.
  - On Record Error - Error record handling for the stage:
    - Discard - Discards the record.
    - Send to Error - Sends the record to the pipeline for error handling.
    - Stop Pipeline - Stops the pipeline.
- On the Kinesis tab, configure the following properties:
  - Connection - Connection that defines the information required to connect to an external system.
    To connect to an external system, you can select a connection that contains the details, or you can directly enter the details in the pipeline. When you select a connection, Control Hub hides other properties so that you cannot directly enter connection details in the pipeline.
  - Authentication Method - Authentication method used to connect to Amazon Web Services (AWS):
    - AWS Keys - Authenticates using an AWS access key pair.
    - Instance Profile - Authenticates using an instance profile associated with the Data Collector EC2 instance.
  - Access Key ID - AWS access key ID. Required when using AWS keys to authenticate with AWS.
  - Secret Access Key - AWS secret access key. Required when using AWS keys to authenticate with AWS.
    Tip: To secure sensitive information such as access key pairs, you can use runtime resources or credential stores. For more information about credential stores, see Credential Stores in the Data Collector documentation.
  - Assume Role - Temporarily assumes another role to authenticate with AWS.
  - Role ARN - Amazon Resource Name (ARN) of the role to assume, entered in the following format:
    arn:aws:iam::<account_id>:role/<role_name>
    Where <account_id> is the ID of your AWS account and <role_name> is the name of the role to assume. You must create and attach an IAM trust policy to this role that allows the role to be assumed.
    Available when assuming another role.
  - Role Session Name - Optional name for the session created by assuming a role. Overrides the default unique identifier for the session.
    Available when assuming another role.
  - Session Timeout - Maximum number of seconds for each session created by assuming a role. The session is refreshed if the pipeline continues to run for longer than this amount of time.
    Set to a value between 3,600 seconds and 43,200 seconds.
    Available when assuming another role.
  - Set Session Tags - Sets a session tag to record the name of the currently logged-in StreamSets user that starts the pipeline or the job for the pipeline. AWS IAM verifies that the user account set in the session tag can assume the specified role.
    Select only when the IAM trust policy attached to the role to be assumed uses session tags and restricts the session tag values to specific user accounts.
    When cleared, the connection does not set a session tag.
    Available when assuming another role.
  - Region - AWS region to connect to. Select one of the available regions. To specify an endpoint to connect to, select Other.
  - Endpoint - Endpoint to connect to when you select Other for the region. Enter the endpoint name.
  - Stream Name - Existing delivery stream to write to. Use the AWS Management Console to create the delivery stream to an Amazon S3 bucket or Amazon Redshift table.
  - Destination Type - Type of Amazon destination to write to. Select Existing Stream.
  - Maximum Record Size (KB) - Maximum size of a single record. When records exceed this size, the destination handles the records based on the error record handling configured for the stage.
    Warning: A Firehose record can have a maximum size of 1,000 KB. If you configure a maximum size larger than 1,000 KB, Firehose does not accept any data written by the destination.
- On the Data Format tab, configure the following property:
  - Data Format - Data format to use. Use one of the following data formats:
    - Delimited
    - JSON
- For delimited data, on the Data Format tab, configure the following properties:
  - Delimiter Format - Format for delimited data:
    - Default CSV - File that includes comma-separated values. Ignores empty lines in the file.
    - RFC4180 CSV - Comma-separated file that strictly follows RFC4180 guidelines.
    - MS Excel CSV - Microsoft Excel comma-separated file.
    - MySQL CSV - MySQL comma-separated file.
    - Tab-Separated Values - File that includes tab-separated values.
    - PostgreSQL CSV - PostgreSQL comma-separated file.
    - PostgreSQL Text - PostgreSQL text file.
    - Custom - File that uses user-defined delimiter, escape, and quote characters.
  - Header Line - Indicates whether to create a header line.
  - Delimiter Character - Delimiter character for a custom delimiter format. Select one of the available options or use Other to enter a custom character.
    You can enter a Unicode control character using the format \uNNNN, where each N is a hexadecimal digit: 0-9 or A-F. For example, enter \u0000 to use the null character as the delimiter or \u2028 to use a line separator as the delimiter.
    Default is the pipe character ( | ).
  - Record Separator String - Characters to use to separate records. Use any valid Java string literal. For example, when writing to Windows, you might use \r\n to separate records.
    Available when using a custom delimiter format.
  - Escape Character - Escape character for a custom delimiter format. Select one of the available options or use Other to enter a custom character. Default is the backslash character ( \ ).
  - Quote Character - Quote character for a custom delimiter format. Select one of the available options or use Other to enter a custom character. Default is the quotation mark character ( " ).
  - Replace New Line Characters - Replaces new line characters with the configured string. Recommended when writing data as a single line of text.
  - New Line Character Replacement - String to replace each new line character. For example, enter a space to replace each new line character with a space. Leave empty to remove the new line characters.
  - Charset - Character set to use when writing data.
- For JSON data, on the Data Format tab, configure the following properties:
  - JSON Content - Determines how JSON data is written. Select Multiple JSON Objects. Each file includes multiple JSON objects. Each object is a JSON representation of a record.
    Note: The JSON array of objects format is not supported for the Kinesis Firehose destination.
  - Charset - Character set to use when writing data.
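Several of the custom delimited properties described above roughly match the formatting options of Python's csv module: Header Line, Delimiter Character, Escape Character, Quote Character, Record Separator String, and the new line replacement properties. The following sketch writes List-Map style records, represented here as dicts, with those options. The write_delimited helper is hypothetical. Deriving the column order from the first record's keys is an assumption of this sketch. The output only approximates the Custom format and is not Data Collector's own writer.

```python
import csv
import io


def write_delimited(records, header=True, delimiter="|", quotechar='"',
                    escapechar="\\", record_separator="\r\n",
                    newline_replacement=None):
    """Write List-Map style records (dicts) as custom delimited text.

    Hypothetical helper for illustration; it approximates the Custom delimiter
    format options with Python's csv module.
    """
    # Assumption: column order comes from the first record's keys.
    fieldnames = list(records[0]) if records else []
    out = io.StringIO()
    writer = csv.writer(
        out,
        delimiter=delimiter,              # Delimiter Character
        quotechar=quotechar,              # Quote Character
        escapechar=escapechar,            # Escape Character
        doublequote=False,                # escape embedded quotes instead of doubling them
        lineterminator=record_separator,  # Record Separator String
        quoting=csv.QUOTE_MINIMAL,        # quote only fields that need it
    )
    if header:                            # Header Line
        writer.writerow(fieldnames)
    for record in records:
        row = []
        for name in fieldnames:
            value = record.get(name, "")
            # Replace New Line Characters / New Line Character Replacement:
            # an empty replacement string removes the new line characters.
            if newline_replacement is not None and isinstance(value, str):
                value = value.replace("\n", newline_replacement)
            row.append(value)
        writer.writerow(row)
    return out.getvalue()


records = [
    {"id": 1, "note": "first line\nsecond line"},
    {"id": 2, "note": "a|b"},
]
text = write_delimited(records, newline_replacement=" ")
# text == 'id|note\r\n1|first line second line\r\n2|"a|b"\r\n'
```

In this sketch, the value that contains the pipe delimiter is wrapped in the quote character rather than escaped, because the csv module quotes fields minimally.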