Building Reliable Data Pipelines with dbt and Airflow
Orchestration, testing and incremental models for production warehouses
By Houssam Kodad
One-time purchase
€29.95
VAT included where applicable
- Instant download after purchase
- Readable on any device
- Free updates to this edition
- Secure checkout
About this book
What's inside
dbt and Airflow have become the backbone of the modern analytics platform, but wiring them together into something dependable is where most teams struggle. This book walks through a complete, production-grade pipeline: modelling data in dbt, scheduling it with Airflow, and making the whole thing testable, observable and safe to change. You'll learn the patterns that keep a warehouse trustworthy as models, sources and contributors multiply.
What you'll learn
Skills you'll walk away with
- Structure a dbt project that scales past a few hundred models
- Write incremental models that handle late-arriving and updated rows
- Orchestrate dbt runs from Airflow with sensible retries and SLAs
- Build a layered architecture: staging, intermediate and marts
- Add data tests, freshness checks and contracts that fail loudly
- Manage environments, CI and safe blue/green deployments
- Diagnose and fix slow models and runaway warehouse spend
Table of contents
10 chapters
01 The Warehouse-Centric Pipeline
- Why ELT replaced ETL
- Where dbt and Airflow each fit
- A reference architecture for the book
02 Structuring a dbt Project That Scales
- Staging, intermediate and mart layers
- Naming conventions and folder layout
- Sources, seeds and the DAG
03 Incremental Models and the Late-Arriving Data Problem
- Choosing an incremental strategy
- Handling updates and backfills
- Watermarks and lookback windows
04 Testing Data Before It Reaches Analysts
- Generic and singular tests
- Source freshness and volume tests
- Custom tests with macros
05 Data Contracts and Model Governance
- Enforcing column types and constraints
- Versioning models and exposures
- Owning the public interface of a mart
06 Orchestrating dbt with Airflow
- Tasks, DAGs and dependencies
- Running dbt selectively with state
- Retries, SLAs and alerting
07 Environments, CI and Safe Deployments
- Dev, staging and prod targets
- Slim CI on pull requests
- Blue/green and zero-downtime swaps
08 Performance Tuning and Cost Control
- Reading the query plan
- Partitioning, clustering and materializations
- Finding the models that burn budget
09 Observability and Lineage
- Run artifacts and metadata
- Column-level lineage
- Surfacing failures to the team
10 Operating the Platform in Production
- On-call runbooks for pipeline failures
- Backfilling without breaking downstream
- A maturity checklist
This is the full chapter list: the PDF contains exactly these chapters.
More in Data Engineering
Keep exploring this track
Streaming Data Engineering with Kafka and Flink
Real-time pipelines, exactly-once processing and stateful streams
Data Modeling for Analytics
Dimensional design, slowly changing dimensions and the one-big-table debate
Spark Performance Tuning: A Field Guide
Diagnosing shuffles, skew and memory pressure in production
Data Quality and Observability
Contracts, tests and lineage for pipelines you can trust