Solutions Overview
This chapter includes the following solutions that describe how to design pipelines for
common use cases:
- Converting Data to the Parquet Data Format
- Automating Impala Metadata Updates for Drift Synchronization for Hive
- Managing Output Files
- Stopping a Pipeline After Processing All Available Data
- Offloading Data from Relational Sources to Hadoop
- Sending Email During Pipeline Processing
- Preserving an Audit Trail of Events
- Loading Data into Databricks Delta Lake
- Drift Synchronization Solution for Hive
- Drift Synchronization Solution for PostgreSQL
Tip: For additional solutions, review the sample pipelines included in the StreamSets Data Collector pipeline library. Download the sample pipelines and import them into Data Collector, then review them or use them as a starting point for your own pipeline development.