Building Reliable Data Pipelines with dbt and Airflow

Orchestration, testing and incremental models for production warehouses

By Houssam Kodad

PDF · 284 pages · Intermediate · English

One-time purchase

€29.95

VAT included where applicable

Instant download after purchase
Readable on any device
Free updates to this edition
Secure checkout

About this book

What's inside

dbt and Airflow have become the backbone of many modern analytics platforms, but wiring them together into something dependable is where many teams struggle. This book walks through a complete, production-grade pipeline: modelling data in dbt, scheduling it with Airflow, and making the whole thing testable, observable and safe to change. You'll learn the patterns that keep a warehouse trustworthy as models, sources and contributors multiply.

What you'll learn

Skills you'll walk away with

Structure a dbt project that scales past a few hundred models
Write incremental models that handle late-arriving and updated rows
Orchestrate dbt runs from Airflow with sensible retries and SLAs
Build a layered architecture: staging, intermediate and marts
Add data tests, freshness checks and contracts that fail loudly
Manage environments, CI and safe blue/green deployments
Diagnose and fix slow models and runaway warehouse spend

Table of contents

10 chapters

01
The Warehouse-Centric Pipeline
- Why ELT replaced ETL
- Where dbt and Airflow each fit
- A reference architecture for the book
02
Structuring a dbt Project That Scales
- Staging, intermediate and mart layers
- Naming conventions and folder layout
- Sources, seeds and the DAG
03
Incremental Models and the Late-Arriving Data Problem
- Choosing an incremental strategy
- Handling updates and backfills
- Watermarks and lookback windows
04
Testing Data Before It Reaches Analysts
- Generic and singular tests
- Source freshness and volume tests
- Custom tests with macros
05
Data Contracts and Model Governance
- Enforcing column types and constraints
- Versioning models and exposures
- Owning the public interface of a mart
06
Orchestrating dbt with Airflow
- Tasks, DAGs and dependencies
- Running dbt selectively with state
- Retries, SLAs and alerting
07
Environments, CI and Safe Deployments
- Dev, staging and prod targets
- Slim CI on pull requests
- Blue/green and zero-downtime swaps
08
Performance Tuning and Cost Control
- Reading the query plan
- Partitioning, clustering and materializations
- Finding the models that burn budget
09
Observability and Lineage
- Run artifacts and metadata
- Column-level lineage
- Surfacing failures to the team
10
Operating the Platform in Production
- On-call runbooks for pipeline failures
- Backfilling without breaking downstream
- A maturity checklist