Retrieval-Augmented Generation in Production

Chunking, embeddings, vector search and reranking that actually work

By Houssam Kodad

PDF · 244 pages · Advanced · English

One-time purchase

€29.95

VAT included where applicable

Instant download after purchase
Readable on any device
Free updates to this edition
Secure checkout

About this book

What's inside

Most RAG systems work in the demo and disappoint in production because retrieval quality, not the model, is the bottleneck. This book goes deep on the retrieval pipeline: how to chunk documents, choose and combine embeddings, run hybrid search, and rerank so that the right context reaches the model. You'll learn to evaluate retrieval rigorously and debug the failure modes that quietly tank answer quality.

What you'll learn

Skills you'll walk away with

Design chunking strategies that preserve meaning
Choose and evaluate embedding models
Run hybrid keyword-plus-vector retrieval
Rerank candidates for precision at the top
Build evaluations for retrieval and end-to-end answers
Handle metadata filtering and access control
Keep the index fresh as source data changes
Diagnose and fix common RAG failure modes

Table of contents

10 chapters

01 Why RAG Breaks in Production
- Retrieval as the real bottleneck
- The end-to-end pipeline
- Failure modes overview
02 Chunking Without Losing Meaning
- Fixed, semantic and structural chunks
- Overlap and chunk size
- Document structure and tables
03 Embeddings in Depth
- How embeddings encode meaning
- Choosing a model
- Domain adaptation and fine-tuning
04 Vector Stores and Indexes
- ANN indexes explained
- Recall vs latency trade-offs
- Scaling and sharding
05 Hybrid and Filtered Retrieval
- Combining BM25 and vectors
- Metadata filtering
- Access-controlled retrieval
06 Reranking for Precision
- Cross-encoder rerankers
- Fusion of multiple retrievers
- Cost vs quality balance
07 Assembling Context for the Model
- Context ordering and budgets
- Citations and grounding
- Deduplication and compression
08 Evaluating Retrieval and Answers
- Retrieval metrics
- Faithfulness and answer quality
- Building a regression set
09 Keeping the Index Fresh
- Incremental indexing
- Re-embedding on model changes
- Deletes and right-to-be-forgotten
10 Debugging a RAG System
- Tracing a bad answer
- Common root causes
- A tuning workflow