RAG Engineering Mastery, Part 4 of 10
Hybrid Retrieval — Keyword + Vector
Vector search understands meaning but fumbles exact terms, IDs, and rare words. Keyword search nails those and misses paraphrase. Use both.

Vector search is great at "what does this mean" and bad at "find the chunk that literally says ERR_CONN_4032." Keyword search is the opposite. Production RAG uses both.
Where each one wins
- Vector — paraphrase, concepts, "how do I cancel" matching "subscription termination."
- Keyword (BM25) — exact terms, error codes, product names, acronyms, rare jargon the embedding smooths over.
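To see why keyword search nails exact identifiers, here is a minimal from-scratch Okapi BM25 scorer in Python. Treat it as a sketch under stated assumptions: the tokenizer (lowercase, keep runs of letters, digits and underscores), the Lucene-style IDF, the defaults k1 = 1.5 and b = 0.75, the class name and the sample documents are illustrative choices. A production system would use its search engine's built-in BM25 instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep letters, digits and underscores together, so an
    # identifier like ERR_CONN_4032 stays one exact-match token.
    return re.findall(r"[a-z0-9_]+", text.lower())


class BM25:
    """Minimal in-memory Okapi BM25 index (illustrative, not production-grade)."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(toks) for toks in self.doc_tokens]
        self.n_docs = len(docs)
        total_len = sum(len(toks) for toks in self.doc_tokens)
        # Guard against division by zero for an empty corpus or empty documents.
        self.avg_len = max(total_len, 1) / max(self.n_docs, 1)
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for toks in self.doc_tokens for term in set(toks))

    def idf(self, term: str) -> float:
        # Lucene-style IDF; the +1 inside the log keeps it positive.
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query: str, i: int) -> float:
        tf_doc = self.term_freqs[i]
        # Length normalisation: documents longer than average are penalised.
        norm = 1 - self.b + self.b * len(self.doc_tokens[i]) / self.avg_len
        total = 0.0
        for term in tokenize(query):
            tf = tf_doc.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return total

    def search(self, query: str, top_k: int = 10) -> list[int]:
        # Indices of documents with a non-zero score, best first.
        scores = [self.score(query, i) for i in range(self.n_docs)]
        ranked = sorted(range(self.n_docs), key=lambda i: scores[i], reverse=True)
        return [i for i in ranked if scores[i] > 0][:top_k]


docs = [
    "Connection failed with ERR_CONN_4032 when the proxy drops the socket.",
    "How to cancel your subscription and stop future billing.",
    "Network errors usually mean the connection was interrupted.",
]
bm25 = BM25(docs)
print(bm25.search("ERR_CONN_4032"))  # [0]
```

Because the token err_conn_4032 occurs in only one document, it gets a high IDF and BM25 returns exactly that chunk. An embedding model offers no such guarantee for an opaque ID it may rarely or never have seen.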
Run both for every query; you get two ranked lists.
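A minimal Python sketch of issuing both searches for each query. The corpus, keyword_search, vector_search and retrieve_both are hypothetical stand-ins: the keyword side is a crude word-overlap ranker rather than real BM25, and the vector ranking is hard-coded rather than computed from embeddings. Running the two calls in a thread pool is one assumed design choice, not a requirement.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A retriever maps a query to document IDs, best match first.
Retriever = Callable[[str], list[str]]

# Hypothetical three-document corpus, for illustration only.
CORPUS = {
    "kb-17": "Fix ERR_CONN_4032 by renewing the proxy certificate",
    "kb-42": "Subscription termination and how to end your plan",
    "kb-08": "Troubleshooting intermittent connection drops",
}


def keyword_search(query: str) -> list[str]:
    # Stand-in for BM25: rank by how many query words appear verbatim.
    terms = set(query.lower().split())
    overlap = {d: len(terms & set(text.lower().split())) for d, text in CORPUS.items()}
    return [d for d in sorted(overlap, key=overlap.get, reverse=True) if overlap[d] > 0]


def vector_search(query: str) -> list[str]:
    # Stand-in for an embedding + nearest-neighbour lookup; the ranking is hard-coded.
    return ["kb-08", "kb-17", "kb-42"]


def retrieve_both(query: str, keyword: Retriever, vector: Retriever) -> tuple[list[str], list[str]]:
    # The two searches are independent. When both are I/O-bound (e.g. network
    # calls to search services), running them concurrently makes latency roughly
    # the slower of the two rather than their sum.
    with ThreadPoolExecutor(max_workers=2) as pool:
        kw_future = pool.submit(keyword, query)
        vec_future = pool.submit(vector, query)
        return kw_future.result(), vec_future.result()


keyword_hits, vector_hits = retrieve_both("ERR_CONN_4032", keyword_search, vector_search)
print(keyword_hits)  # ['kb-17']
print(vector_hits)   # ['kb-08', 'kb-17', 'kb-42']
```

The output is two independently ranked lists of document IDs, which is exactly the input a rank-based fusion step needs.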
Fusing the lists with RRF
Reciprocal Rank Fusion (RRF) combines ranked lists without needing their scores to be comparable: each document earns 1 / (k + rank) from every list it appears in, and those contributions are summed. Documents that rank well in either list rise; documents strong in both dominate.
score(doc) = Σ_i 1 / (k + rank_i(doc)), summed over every list i that contains doc (ranks start at 1; k ≈ 60 is the usual default)
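A quick worked example with k = 60 shows why documents strong in both lists dominate. Suppose, hypothetically, that document A is ranked 1st by BM25 but is missing from the vector list, while document B is ranked 3rd in both lists:

$$
\begin{aligned}
\text{score}(A) &= \frac{1}{60 + 1} = \frac{1}{61} \approx 0.0164 \\
\text{score}(B) &= \frac{1}{60 + 3} + \frac{1}{60 + 3} = \frac{2}{63} \approx 0.0317
\end{aligned}
$$

B ends up with nearly twice A's score. The constant k also flattens the top of each list: rank 1 earns 1/61 and rank 10 earns 1/70, only about 13% less, so a single first place cannot outweigh consistent agreement between retrievers.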
RRF is a few lines of code and needs no score calibration, because it only looks at ranks; in practice it typically beats either retriever alone.
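A minimal Python sketch of the fusion step. The function name reciprocal_rank_fusion, the input format (lists of document IDs, best first, each ID appearing at most once per list) and the sample IDs are illustrative assumptions, not a specific library's API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks start at 1. A document missing from a list gets nothing from it.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; Python's stable sort keeps first-seen order on ties.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_hits = ["kb-17", "kb-08", "kb-99"]
vector_hits = ["kb-08", "kb-42", "kb-17"]
for doc_id, score in reciprocal_rank_fusion([bm25_hits, vector_hits]):
    # kb-08 and kb-17 (present in both lists) come out ahead of kb-42 and kb-99.
    print(f"{doc_id}  {score:.4f}")
```

Because only ranks enter the formula, BM25 scores and cosine similarities, which live on different scales, never have to be normalised against each other. The same function fuses three or more lists unchanged.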
Series: RAG Engineering Mastery
- Part 01: Why Naive RAG Fails in Production. The 50-line vector-search demo that wows in a notebook falls apart the moment real users ask real questions. Here is why, and the map out.
- Part 02: Chunking — The Decision That Sets Your Ceiling. You can't retrieve what you chunked badly. Chunking is the most underrated lever in RAG, and the cheapest to get right.
- Part 03: Embeddings & Vector Stores 101. An embedding turns meaning into geometry. A vector store makes that geometry searchable in milliseconds. Get both right and retrieval gets easy.
- Part 04: Hybrid Retrieval — Keyword + Vector (this part).
- Part 05: Re-Ranking — The Cheap Quality Win. Retrieval gets you 30 plausible chunks. A re-ranker reads them against the actual question and floats the truly relevant few to the top.
- Part 06: Prompting the Generator — Grounding & Citations. Great retrieval is wasted if the model ignores it or can't point to its sources. Grounding is a prompt-design discipline, not an afterthought.
- Part 07: Evaluation — You Can't Improve What You Don't Measure. Without an eval set, every RAG change is a vibe. With one, you tune chunking, retrieval and prompts with a number that tells you if you helped or hurt.
- Part 08: Handling Hallucinations & Guardrails. When retrieval comes up empty, a helpful model invents. Guardrails turn 'confidently wrong' into 'honestly unsure': the difference users actually trust.
- Part 09: Cost & Latency Discipline. A RAG query touches embeddings, a vector DB, a re-ranker and an LLM. Each adds milliseconds and cents. At scale, discipline here is the difference between a margin and a bonfire.
- Part 10: The Production RAG Reference Architecture. Every piece assembled (ingestion, hybrid retrieval, re-ranking, grounded generation, guardrails, eval and caching) into a blueprint you can ship.