Solutions Overview
This chapter includes the following solutions that describe how to design pipelines for
common use cases:
- Converting Data to the Parquet Data Format
- Automating Impala Metadata Updates for Drift Synchronization for Hive
- Managing Output Files
- Stopping a Pipeline After Processing All Available Data
- Offloading Data from Relational Sources to Hadoop
- Sending Email During Pipeline Processing
- Preserving an Audit Trail of Events
- Loading Data into Databricks Delta Lake
- Drift Synchronization Solution for Hive
- Drift Synchronization Solution for PostgreSQL
Tip: For additional solutions, review the sample pipelines included in the Data Collector pipeline library.
Download the sample pipelines and import them into Data Collector, where you can review
them or use them as a starting point for your own pipeline development.
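If you prefer to script the import rather than use the UI, a downloaded pipeline JSON file can be loaded through the Data Collector REST API. The following is a minimal sketch in Python; the URL, credentials, file name, and import endpoint details are assumptions for a default local installation, so verify them against the RESTful API documentation for your Data Collector version.

```python
# Minimal sketch: import a downloaded sample pipeline into Data Collector
# via the REST API. Assumes a local instance at http://localhost:18630
# with default admin/admin credentials and a sample pipeline saved as
# sample_pipeline.json -- adjust all of these for your environment.
import json
import requests

SDC_URL = "http://localhost:18630"   # assumed Data Collector URL
AUTH = ("admin", "admin")            # assumed default credentials

# Load the sample pipeline JSON downloaded from the pipeline library
with open("sample_pipeline.json") as f:
    pipeline_json = json.load(f)

resp = requests.post(
    # Assumed import endpoint; check your instance's RESTful API docs
    f"{SDC_URL}/rest/v1/pipeline/sample/import",
    params={"autoGeneratePipelineId": "true"},
    headers={"X-Requested-By": "python"},  # custom header required for POSTs
    auth=AUTH,
    json=pipeline_json,
)
resp.raise_for_status()
print(f"Import request succeeded with status {resp.status_code}")
```

After a successful import, the pipeline appears in the Data Collector pipeline list, where you can review its configuration or edit it as a starting point for your own development.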