Streaming Data Engineering with Kafka and Flink

Real-time pipelines, exactly-once processing and stateful streams

By Houssam Kodad

PDF · 336 pages · Advanced · English

One-time purchase

€34.95

VAT included where applicable


Instant download after purchase
Readable on any device
Free updates to this edition

About this book

What's inside

Batch pipelines answer yesterday's questions; streaming answers them now. This book takes you deep into Apache Kafka and Apache Flink to build real-time systems that stay correct under failure, not just fast on a good day. You'll work through event-time semantics, watermarks, stateful operators and exactly-once processing: the hard parts that separate a demo from a system you can put on call.

What you'll learn

Skills you'll walk away with

Design event-driven pipelines around Kafka topics and partitions
Reason about event time, processing time and watermarks
Build stateful Flink jobs with keyed state and timers
Guarantee exactly-once processing across the pipeline
Handle late data, out-of-order events and reprocessing
Manage schema evolution with a registry and contracts
Size, tune and operate Flink clusters under real load
Recover cleanly from failures with checkpoints and savepoints

Table of contents

11 chapters

01 From Batch to Streams
- The cost of waiting for a batch
- Logs as the source of truth
- When streaming is the wrong choice
02 Kafka as the Backbone
- Topics, partitions and ordering
- Producers, consumers and offsets
- Retention, compaction and tiered storage
03 Modelling Events That Last
- Event design and naming
- Keys, partitioning and hot spots
- Versioning events over time
04 Schema Evolution Without Breakage
- The schema registry
- Backward and forward compatibility
- Contracts between producers and consumers
05 Flink Fundamentals for Engineers
- The dataflow model
- Sources, operators and sinks
- The job graph and parallelism
06 Event Time and the Watermark Problem
- Event vs processing time
- Generating and propagating watermarks
- Allowed lateness and side outputs
07 Stateful Stream Processing
- Keyed state and state backends
- Timers and windows
- Managing state size over time
08 Exactly-Once, End to End
- Checkpoints and barriers
- Transactional sinks
- Idempotency where transactions stop
09 Joins, Enrichment and CDC
- Stream-to-stream joins
- Lookups against external state
- Change-data-capture into streams
10 Operating Flink in Production
- Savepoints and upgrades
- Backpressure and tuning
- Reprocessing history safely
11 A Real-Time Reference Pipeline
- End-to-end architecture
- Failure scenarios and recovery
- Monitoring and alerting