1 series · 10 articles
Retrieval, prompting, evaluation, fine-tuning: the engineering layer that turns a chat model into a production product.
Every piece, assembled: ingestion, hybrid retrieval, re-ranking, grounded generation, guardrails, eval and caching — the blueprint you can ship.
A RAG query touches embeddings, a vector DB, a re-ranker and an LLM. Each adds milliseconds and cents. At scale, discipline here is the difference between a margin and a bonfire.
When retrieval comes up empty, a helpful model invents. Guardrails turn 'confidently wrong' into 'honestly unsure' — the difference users actually trust.
Without an eval set, every RAG change is a vibe. With one, you tune chunking, retrieval and prompts with a number that tells you if you helped or hurt.
Great retrieval is wasted if the model ignores it or can't point to its sources. Grounding is a prompt-design discipline, not an afterthought.
Retrieval gets you 30 plausible chunks. A re-ranker reads them against the actual question and floats the truly relevant few to the top.