DRM-free · Yours to keep forever

Retrieval-Augmented Generation in Production

Chunking, embeddings, vector search and reranking that actually work

By Houssam Kodad

PDF · 244 pages · Advanced · English

One-time purchase

€29.95

VAT included where applicable

  • Instant download after purchase
  • Readable on any device
  • Free updates to this edition
  • Secure checkout

About this book

What's inside

Most RAG systems work in the demo and disappoint in production because the bottleneck is retrieval quality, not the model. This book goes deep on the retrieval pipeline: how to chunk documents, choose and combine embeddings, run hybrid search, and rerank results so the right context reaches the model. You'll learn to evaluate retrieval rigorously and to debug the failure modes that quietly degrade answer quality.

What you'll learn

Skills you'll walk away with

  • Design chunking strategies that preserve meaning
  • Choose and evaluate embedding models
  • Run hybrid keyword-plus-vector retrieval
  • Rerank candidates for precision at the top
  • Build evaluation for retrieval and end-to-end answers
  • Handle metadata filtering and access control
  • Keep the index fresh as source data changes
  • Diagnose and fix common RAG failure modes

Table of contents

10 chapters
  1. Why RAG Breaks in Production
    • Retrieval as the real bottleneck
    • The end-to-end pipeline
    • Failure modes overview
  2. Chunking Without Losing Meaning
    • Fixed, semantic and structural chunks
    • Overlap and chunk size
    • Document structure and tables
  3. Embeddings in Depth
    • How embeddings encode meaning
    • Choosing a model
    • Domain adaptation and fine-tuning
  4. Vector Stores and Indexes
    • ANN indexes explained
    • Recall vs latency trade-offs
    • Scaling and sharding
  5. Hybrid and Filtered Retrieval
    • Combining BM25 and vectors
    • Metadata filtering
    • Access-controlled retrieval
  6. Reranking for Precision
    • Cross-encoder rerankers
    • Fusion of multiple retrievers
    • Cost vs quality balance
  7. Assembling Context for the Model
    • Context ordering and budgets
    • Citations and grounding
    • Deduplication and compression
  8. Evaluating Retrieval and Answers
    • Retrieval metrics
    • Faithfulness and answer quality
    • Building a regression set
  9. Keeping the Index Fresh
    • Incremental indexing
    • Re-embedding on model changes
    • Deletes and right-to-be-forgotten
  10. Debugging a RAG System
    • Tracing a bad answer
    • Common root causes
    • A tuning workflow

This is the full chapter list — exactly what you'll receive in the PDF.