Hadoop FS

Supported pipeline types:
  • Data Collector

The Hadoop FS destination writes data to Hadoop Distributed File System (HDFS). You can also use the destination to write to Azure Blob storage. For information about supported versions, see Supported Systems and Versions in the Data Collector documentation.

You can write data as flat files or Hadoop sequence files. You can also use the whole file data format to write whole files to HDFS.
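For context, a Hadoop sequence file is a flat file of binary key-value pairs. The following minimal Java sketch uses the standard Hadoop client API, not the destination itself, to show what writing such a file looks like; the path and record values are illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Illustrative output path on the cluster.
            Path path = new Path("hdfs://namenode:8020/tmp/out/example.seq");
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(Text.class))) {
                // Each record is stored as a key-value pair.
                writer.append(new Text("record-key"), new Text("record-value"));
            }
        }
    }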

When you configure a Hadoop FS destination, you can define a directory template and time basis to determine the output directories that the destination creates and the files where records are written.
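For example, a directory template along the lines of /tmp/out/${YYYY()}-${MM()}-${DD()}-${hh()} resolves to a new output directory for each hour, and the time basis determines which time drives those functions: the time of processing, typically expressed as ${time:now()}, or a datetime value in the record, such as ${record:value('/Timestamp')}. The template and field name shown here are illustrative.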

As part of the Drift Synchronization Solution for Hive, you can alternatively use record header attributes to perform record-based writes: the destination writes each record to the directory specified in its header, uses the Avro schema defined there, and rolls files when a header attribute indicates it. For more information, see Record Header Attributes for Record-Based Writes.
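In this mode, an upstream stage, typically the Hive Metadata processor in the Drift Synchronization Solution, is expected to populate the header attributes that carry the target directory, the Avro schema, and the roll indicator for each record; see the linked topic for the exact attribute names and expected values.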

You can define a file prefix and suffix, the data time zone, and properties that define when the destination closes a file. You can also specify how long records can continue to be written to their associated output directory and what the destination does with late records.
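For example, the time limit for late records is typically defined with an expression such as ${1 * HOURS}; records that arrive for a directory older than that limit are then handled according to the late record configuration, such as being written to a separate late records directory or treated as errors. The expression and handling options shown here are illustrative, not a complete list.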

The destination can generate events for an event stream. For more information about the event framework, see Dataflow Triggers Overview.

When necessary, you can enable Kerberos authentication. You can also specify a Hadoop user to impersonate, define a Hadoop configuration file directory, and add Hadoop configuration properties as needed.
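Under the hood, Kerberos authentication and user impersonation rely on Hadoop's UserGroupInformation API. The following Java sketch illustrates that mechanism in isolation; it is not the destination's implementation, and the principal, keytab path, proxy user, and URIs are all hypothetical.

    import java.net.URI;
    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosImpersonationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Load cluster configuration files from a configuration directory (illustrative paths).
            conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
            conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Authenticate with a keytab (hypothetical principal and path).
            UserGroupInformation realUser = UserGroupInformation
                    .loginUserFromKeytabAndReturnUGI("sdc@EXAMPLE.COM", "/etc/sdc/sdc.keytab");

            // Impersonate the Hadoop user that should own the writes.
            UserGroupInformation proxyUser = UserGroupInformation.createProxyUser("etl", realUser);
            proxyUser.doAs((PrivilegedExceptionAction<Void>) () -> {
                FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
                // Any file created here is owned by the impersonated user.
                fs.create(new Path("/tmp/out/_impersonation_check")).close();
                return null;
            });
        }
    }

Note that impersonation also depends on the cluster-side proxy-user settings (the hadoop.proxyuser.*.hosts and hadoop.proxyuser.*.groups properties) allowing the authenticated user to act on behalf of others.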

You can use Gzip, Bzip2, Snappy, LZ4, and other compression formats to write output files.
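To show what compressed output looks like at the HDFS level, here is a minimal Java sketch that writes a gzip-compressed file with Hadoop's standard compression codec classes; the output path and content are illustrative, and Snappy or LZ4 would additionally require the corresponding native libraries on the cluster.

    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.GzipCodec;

    public class CompressedOutputSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Look up the gzip codec; other codecs are selected the same way.
            CompressionCodec codec = new CompressionCodecFactory(conf)
                    .getCodecByClassName(GzipCodec.class.getName());
            // The codec supplies the conventional file extension (".gz" here).
            Path out = new Path("/tmp/out/sdc-example" + codec.getDefaultExtension());
            try (OutputStream os = codec.createOutputStream(fs.create(out))) {
                os.write("compressed output example\n".getBytes(StandardCharsets.UTF_8));
            }
        }
    }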