Data Quality and Observability
Contracts, tests and lineage for pipelines you can trust
By Houssam Kodad
One-time purchase
€22.95
VAT included where applicable
- Instant download after purchase
- Readable on any device
- Free updates to this edition
- Secure checkout
About this book
What's inside
The fastest way to lose a stakeholder's trust is a dashboard that's quietly wrong. This concise book lays out a practical system for data quality: tests that catch issues before users do, contracts that stop bad data at the source, and observability that tells you when a pipeline silently breaks. It's a playbook for making 'is this number right?' a question you can answer with confidence.
What you'll learn
Skills you'll walk away with
- Define data quality dimensions that actually matter
- Write tests for freshness, volume, schema and distribution
- Set up data contracts between producers and consumers
- Detect anomalies and silent pipeline failures
- Track lineage to find blast radius fast
- Design alerting that signals without crying wolf
- Build a culture and SLA around trustworthy data
Table of contents
8 chapters
01 What Trustworthy Data Means
- The six dimensions of quality
- Quality as a product feature
- The cost of a wrong number

02 Tests That Catch Issues Early
- Schema and not-null tests
- Freshness and volume checks
- Distribution and referential tests

03 Data Contracts at the Source
- Producer responsibilities
- Enforcing contracts in CI
- Handling breaking changes

04 Anomaly Detection for Pipelines
- Static thresholds vs learned baselines
- Seasonality and drift
- Reducing false positives

05 Lineage and Blast Radius
- Table and column lineage
- Tracing an incident upstream
- Impact analysis before changes

06 Alerting Without Alert Fatigue
- Severity and ownership
- Routing and escalation
- Tuning noisy checks

07 SLAs, SLOs and Data Reliability
- Setting freshness SLAs
- Measuring reliability
- Reporting to stakeholders

08 Building a Quality Culture
- Ownership and on-call
- Post-incident reviews
- A rollout roadmap
This is the full chapter list — exactly what you'll receive in the PDF.
More in Data Engineering
Keep exploring this track
Building Reliable Data Pipelines with dbt and Airflow
Orchestration, testing and incremental models for production warehouses
Streaming Data Engineering with Kafka and Flink
Real-time pipelines, exactly-once processing and stateful streams
Data Modeling for Analytics
Dimensional design, slowly changing dimensions and the one-big-table debate
Spark Performance Tuning: A Field Guide
Diagnosing shuffles, skew and memory pressure in production