Solutions Overview
This chapter includes the following solutions that describe how to design pipelines for
common use cases:
- Converting Data to the Parquet Data Format
- Automating Impala Metadata Updates for Drift Synchronization for Hive
- Managing Output Files
- Stopping a Pipeline After Processing All Available Data
- Offloading Data from Relational Sources to Hadoop
- Sending Email During Pipeline Processing
- Preserving an Audit Trail of Events
- Loading Data into Databricks Delta Lake
- Drift Synchronization Solution for Hive
- Drift Synchronization Solution for PostgreSQL
Tip: For additional solutions, review the sample pipelines included in the Data Collector pipeline library.
Download the sample pipelines and import them into Data Collector, where you can review
them or use them as a starting point for your own pipeline development.
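If you prefer to script the import rather than use the UI, a downloaded pipeline JSON file can be loaded through the Data Collector REST API. The following is a minimal sketch in Python; the URL, credentials, file name, and import endpoint details are assumptions for a default local installation, so verify them against the RESTful API documentation for your Data Collector version.

```python
# Minimal sketch: import a downloaded sample pipeline into Data Collector
# via the REST API. Assumes a local instance at http://localhost:18630
# with default admin/admin credentials and a sample pipeline saved as
# sample_pipeline.json -- adjust all of these for your environment.
import json
import requests

SDC_URL = "http://localhost:18630"   # assumed Data Collector URL
AUTH = ("admin", "admin")            # assumed default credentials

# Load the sample pipeline JSON downloaded from the pipeline library
with open("sample_pipeline.json") as f:
    pipeline_json = json.load(f)

resp = requests.post(
    # Assumed import endpoint; check your instance's RESTful API docs
    f"{SDC_URL}/rest/v1/pipeline/sample/import",
    params={"autoGeneratePipelineId": "true"},
    headers={"X-Requested-By": "python"},  # custom header required for POSTs
    auth=AUTH,
    json=pipeline_json,
)
resp.raise_for_status()
print(f"Import request succeeded with status {resp.status_code}")
```

After a successful import, the pipeline appears in the Data Collector pipeline list, where you can review its configuration or edit it as a starting point for your own development.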