DRM-free · Yours to keep forever
Data Engineering

Building Reliable Data Pipelines with dbt and Airflow

Orchestration, testing and incremental models for production warehouses

By Houssam Kodad

PDF · 284 pages · Intermediate · English

One-time purchase

€29.95

VAT included where applicable

Download sample
  • Instant download after purchase
  • Readable on any device
  • Free updates to this edition
  • Secure checkout

About this book

What's inside

dbt and Airflow have become the backbone of the modern analytics platform, but wiring them together into something dependable is where most teams struggle. This book walks through a complete, production-grade pipeline: modelling data in dbt, scheduling it with Airflow, and making the whole thing testable, observable and safe to change. You'll learn the patterns that keep a warehouse trustworthy as models, sources and contributors multiply.

What you'll learn

Skills you'll walk away with

  • Structure a dbt project that scales past a few hundred models
  • Write incremental models that handle late-arriving and updated rows
  • Orchestrate dbt runs from Airflow with sensible retries and SLAs
  • Build a layered architecture: staging, intermediate and marts
  • Add data tests, freshness checks and contracts that fail loudly
  • Manage environments, CI and safe blue/green deployments
  • Diagnose and fix slow models and runaway warehouse spend

Table of contents

10 chapters
  1. The Warehouse-Centric Pipeline
    • Why ELT replaced ETL
    • Where dbt and Airflow each fit
    • A reference architecture for the book
  2. Structuring a dbt Project That Scales
    • Staging, intermediate and mart layers
    • Naming conventions and folder layout
    • Sources, seeds and the DAG
  3. Incremental Models and the Late-Arriving Data Problem
    • Choosing an incremental strategy
    • Handling updates and backfills
    • Watermarks and lookback windows
  4. Testing Data Before It Reaches Analysts
    • Generic and singular tests
    • Source freshness and volume tests
    • Custom tests with macros
  5. Data Contracts and Model Governance
    • Enforcing column types and constraints
    • Versioning models and exposures
    • Owning the public interface of a mart
  6. Orchestrating dbt with Airflow
    • Tasks, DAGs and dependencies
    • Running dbt selectively with state
    • Retries, SLAs and alerting
  7. Environments, CI and Safe Deployments
    • Dev, staging and prod targets
    • Slim CI on pull requests
    • Blue/green and zero-downtime swaps
  8. Performance Tuning and Cost Control
    • Reading the query plan
    • Partitioning, clustering and materializations
    • Finding the models that burn budget
  9. Observability and Lineage
    • Run artifacts and metadata
    • Column-level lineage
    • Surfacing failures to the team
  10. Operating the Platform in Production
    • On-call runbooks for pipeline failures
    • Backfilling without breaking downstream
    • A maturity checklist

This is the full chapter list, exactly what you'll receive in the PDF.