Topics
- batch vs streaming
- types of batch jobs
- orchestrating batch jobs
- advantages and disadvantages of batch jobs
Introduction
Batch and streaming are two ways of processing data.
- Batch processing → processing chunks of data at regular intervals.
- Most data processing is done in batches (~80%).
- Stream processing → processing data on the fly, as soon as it's created.
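A minimal sketch of the difference, in plain Python. The names `process_batch`, `process_stream`, and `events` are made up for illustration and do not come from any particular framework. In a real pipeline the batch function would be triggered on a schedule and the stream function would read from a live source.

```python
from typing import Iterable, Iterator, List


def process_batch(records: List[int]) -> int:
    """Batch: handle a whole chunk of accumulated records in one run."""
    return sum(records)


def process_stream(records: Iterable[int]) -> Iterator[int]:
    """Stream: update the result as each record arrives."""
    running_total = 0
    for record in records:
        running_total += record
        yield running_total


# Hypothetical events; in practice these would come from a file, table, or queue.
events = [3, 1, 4, 1, 5]

# Batch: a single result, produced only once the chunk is complete.
batch_result = process_batch(events)

# Stream: one updated result per event, available right away.
stream_results = list(process_stream(events))

print(batch_result)    # 14
print(stream_results)  # [3, 4, 8, 9, 14]
```

Both approaches reach the same final total. The batch version only reports it after the whole chunk has been collected, while the stream version has an up-to-date value after every event.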
Technologies
- Python scripts
- SQL (dbt)
- Spark
- Flink, etc.
Orchestrating batch jobs (workflow)
Common workflow: