DRM-free · Yours to keep forever
Data Engineering

Streaming Data Engineering with Kafka and Flink

Real-time pipelines, exactly-once processing and stateful streams

By Houssam Kodad

PDF · 336 pages · Advanced · English

One-time purchase

€34.95

VAT included where applicable

  • Instant download after purchase
  • Readable on any device
  • Free updates to this edition
  • Secure checkout

About this book

What's inside

Batch pipelines answer yesterday's questions; streaming answers them now. This book takes you deep into Apache Kafka and Apache Flink to build real-time systems that stay correct under failure, not just fast on a good day. You'll work through event-time semantics, watermarks, stateful operators and exactly-once processing: the genuinely hard parts that separate a demo from a system you would trust enough to be on call for.

What you'll learn

Skills you'll walk away with

  • Design event-driven pipelines around Kafka topics and partitions
  • Reason about event time, processing time and watermarks
  • Build stateful Flink jobs with keyed state and timers
  • Guarantee exactly-once processing across the pipeline
  • Handle late data, out-of-order events and reprocessing
  • Manage schema evolution with a registry and contracts
  • Size, tune and operate Flink clusters under real load
  • Recover cleanly from failures with checkpoints and savepoints

Table of contents

11 chapters
  1. From Batch to Streams
    • The cost of waiting for a batch
    • Logs as the source of truth
    • When streaming is the wrong choice
  2. Kafka as the Backbone
    • Topics, partitions and ordering
    • Producers, consumers and offsets
    • Retention, compaction and tiered storage
  3. Modelling Events That Last
    • Event design and naming
    • Keys, partitioning and hot spots
    • Versioning events over time
  4. Schema Evolution Without Breakage
    • The schema registry
    • Backward and forward compatibility
    • Contracts between producers and consumers
  5. Flink Fundamentals for Engineers
    • The dataflow model
    • Sources, operators and sinks
    • The job graph and parallelism
  6. Event Time and the Watermark Problem
    • Event vs processing time
    • Generating and propagating watermarks
    • Allowed lateness and side outputs
  7. Stateful Stream Processing
    • Keyed state and state backends
    • Timers and windows
    • Managing state size over time
  8. Exactly-Once, End to End
    • Checkpoints and barriers
    • Transactional sinks
    • Idempotency where transactions stop
  9. Joins, Enrichment and CDC
    • Stream-to-stream joins
    • Lookups against external state
    • Change-data-capture into streams
  10. Operating Flink in Production
    • Savepoints and upgrades
    • Backpressure and tuning
    • Reprocessing history safely
  11. A Real-Time Reference Pipeline
    • End-to-end architecture
    • Failure scenarios and recovery
    • Monitoring and alerting

This is the full chapter list — exactly what you'll receive in the PDF.