Retrieval-Augmented Generation in Production
Chunking, embeddings, vector search and reranking that actually work
By Houssam Kodad
One-time purchase
€29.95
VAT included where applicable
- Instant download after purchase
- Readable on any device
- Free updates to this edition
- Secure checkout
About this book
What's inside
Most RAG systems work in the demo and disappoint in production, because retrieval quality — not the model — is the bottleneck. This book goes deep on the retrieval pipeline: how to chunk documents, choose and combine embeddings, run hybrid search, and rerank so the right context reaches the model. You'll learn to evaluate retrieval rigorously and debug the failure modes that quietly tank answer quality.
What you'll learn
Skills you'll walk away with
- Design chunking strategies that preserve meaning
- Choose and evaluate embedding models
- Run hybrid keyword-plus-vector retrieval
- Rerank candidates for precision at the top
- Build evaluation for retrieval and end-to-end answers
- Handle metadata filtering and access control
- Keep the index fresh as source data changes
- Diagnose and fix common RAG failure modes
Table of contents
10 chapters
01 Why RAG Breaks in Production
- Retrieval as the real bottleneck
- The end-to-end pipeline
- Failure modes overview
02 Chunking Without Losing Meaning
- Fixed, semantic and structural chunks
- Overlap and chunk size
- Document structure and tables
03 Embeddings in Depth
- How embeddings encode meaning
- Choosing a model
- Domain adaptation and fine-tuning
04 Vector Stores and Indexes
- ANN indexes explained
- Recall vs latency trade-offs
- Scaling and sharding
05 Hybrid and Filtered Retrieval
- Combining BM25 and vectors
- Metadata filtering
- Access-controlled retrieval
06 Reranking for Precision
- Cross-encoder rerankers
- Fusion of multiple retrievers
- Cost vs quality balance
07 Assembling Context for the Model
- Context ordering and budgets
- Citations and grounding
- Deduplication and compression
08 Evaluating Retrieval and Answers
- Retrieval metrics
- Faithfulness and answer quality
- Building a regression set
09 Keeping the Index Fresh
- Incremental indexing
- Re-embedding on model changes
- Deletes and right-to-be-forgotten
10 Debugging a RAG System
- Tracing a bad answer
- Common root causes
- A tuning workflow
This is the full chapter list — exactly what you'll receive in the PDF.