Streaming Data Engineering with Kafka and Flink
Real-time pipelines, exactly-once processing and stateful streams
By Houssam Kodad
One-time purchase
€34.95
VAT included where applicable
- Instant download after purchase
- Readable on any device
- Free updates to this edition
- Secure checkout
About this book
What's inside
Batch pipelines answer yesterday's questions; streaming pipelines answer today's as events arrive. This book takes you deep into Apache Kafka and Apache Flink to build real-time systems that stay correct under failure, not just fast on a good day. You'll work through event-time semantics, watermarks, stateful operators and exactly-once processing: the hard parts that separate a demo from a system you can put on call.
What you'll learn
Skills you'll walk away with
- Design event-driven pipelines around Kafka topics and partitions
- Reason about event time, processing time and watermarks
- Build stateful Flink jobs with keyed state and timers
- Guarantee exactly-once processing across the pipeline
- Handle late data, out-of-order events and reprocessing
- Manage schema evolution with a registry and contracts
- Size, tune and operate Flink clusters under real load
- Recover cleanly from failures with checkpoints and savepoints
Table of contents
11 chapters
01 From Batch to Streams
- The cost of waiting for a batch
- Logs as the source of truth
- When streaming is the wrong choice
02 Kafka as the Backbone
- Topics, partitions and ordering
- Producers, consumers and offsets
- Retention, compaction and tiered storage
03 Modelling Events That Last
- Event design and naming
- Keys, partitioning and hot spots
- Versioning events over time
04 Schema Evolution Without Breakage
- The schema registry
- Backward and forward compatibility
- Contracts between producers and consumers
05 Flink Fundamentals for Engineers
- The dataflow model
- Sources, operators and sinks
- The job graph and parallelism
06 Event Time and the Watermark Problem
- Event vs processing time
- Generating and propagating watermarks
- Allowed lateness and side outputs
07 Stateful Stream Processing
- Keyed state and state backends
- Timers and windows
- Managing state size over time
08 Exactly-Once, End to End
- Checkpoints and barriers
- Transactional sinks
- Idempotency where transactions stop
09 Joins, Enrichment and CDC
- Stream-to-stream joins
- Lookups against external state
- Change-data-capture into streams
10 Operating Flink in Production
- Savepoints and upgrades
- Backpressure and tuning
- Reprocessing history safely
11 A Real-Time Reference Pipeline
- End-to-end architecture
- Failure scenarios and recovery
- Monitoring and alerting
This is the full chapter list — exactly what you'll receive in the PDF.
More in Data Engineering
Keep exploring this track
Building Reliable Data Pipelines with dbt and Airflow
Orchestration, testing and incremental models for production warehouses
Data Modeling for Analytics
Dimensional design, slowly changing dimensions and the one-big-table debate
Spark Performance Tuning: A Field Guide
Diagnosing shuffles, skew and memory pressure in production
Data Quality and Observability
Contracts, tests and lineage for pipelines you can trust