RAG Engineering Mastery · 2 / 10

Chunking — The Decision That Sets Your Ceiling

You can't retrieve what you chunked badly. Chunking is the most underrated lever in RAG, and the cheapest to get right.

Published May 5, 2026 · 1 min read · Haythem Rehouma · Claude Mastery

Retrieval can only return the chunks you created. If a chunk splits an idea in half, no embedding model on earth will retrieve it whole. Chunking sets the ceiling on everything downstream.

Three strategies

Fixed-size — split every N tokens with overlap. Simple, fast, dumb. Fine for uniform prose, bad for structured docs.
Structural — split on the document's own boundaries: headings, sections, list items, code blocks. Respects meaning for free.
Semantic — split where the topic shifts (embedding-distance based). Best quality, higher cost.
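To make "split where the topic shifts" concrete, here is a minimal sketch. It assumes a toy bag-of-words vector in place of a real embedding model, and the names `toy_embed`, `cosine`, `semantic_chunks` and the 0.2 threshold are illustrative choices, not something the article prescribes. A real pipeline would swap in model embeddings and tune the threshold.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.2):
    """Start a new chunk when adjacent sentences fall below the similarity threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    prev = toy_embed(sentences[0])
    for sentence in sentences[1:]:
        cur = toy_embed(sentence)
        if cosine(prev, cur) < threshold:
            chunks.append([sentence])  # topic shift: open a new chunk
        else:
            chunks[-1].append(sentence)  # same topic: extend the current chunk
        prev = cur
    return [" ".join(c) for c in chunks]


text = "Cats purr loudly. Cats sleep often. Stocks fell sharply. Stocks rose later."
chunks = semantic_chunks(text)
```

The extra cost shows up even in the toy version: every sentence has to be embedded before a single boundary can be placed.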

Start structural; it captures most of the win at near-zero cost.
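A structural splitter can be very small. The sketch below assumes the source is Markdown and treats `#` headings as the only boundaries; the function name `chunk_by_headings` and the dict layout are hypothetical. It deliberately ignores other boundaries (list items, code blocks), and a `#` line inside a fenced code block would be misread as a heading, so treat it as a starting point rather than a finished parser.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown_text):
    """Return one chunk per heading-delimited section, keeping the heading as metadata."""
    chunks = []
    title, lines = None, []

    def flush():
        body = "\n".join(lines).strip()
        # Skip an empty preamble, but keep any section that has a heading.
        if title is not None or body:
            chunks.append({"section": title, "text": body})

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before starting a new one
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


doc = "Intro line\n\n# Setup\nInstall it.\n\n## Options\nUse flags.\n"
sections = chunk_by_headings(doc)
```

Each chunk carries its section title, which can feed directly into chunk metadata later.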

Size and overlap

Too small and a chunk loses context; too big and retrieval gets noisy and the prompt gets expensive. A pragmatic default: 300–600 tokens with ~15% overlap, then tune against your eval set (article 7).
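The default above can be sketched as a sliding window. This assumes the text is already tokenized into a list; whitespace splitting stands in for a real tokenizer here, and `chunk_fixed` with its parameter names is a hypothetical helper.

```python
def chunk_fixed(tokens, size=400, overlap_ratio=0.15):
    """Split a token list into windows of `size` tokens with fractional overlap."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    overlap = int(size * overlap_ratio)
    step = max(1, size - overlap)  # guard against a zero step
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Whitespace splitting is an assumption; use your model's tokenizer in practice.
tokens = ("word " * 1000).split()
windows = chunk_fixed(tokens, size=400, overlap_ratio=0.15)
```

Because `size` and `overlap_ratio` are plain parameters, sweeping them against an eval set is a simple loop.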

Metadata is the quiet superpower

Attach metadata to every chunk: source, title, section, date, URL. It powers filtered retrieval (only this product, only docs after this date) and lets the generator cite precisely.
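As a rough illustration of both uses, the sketch below attaches the fields listed above to each chunk, filters on them before any similarity search, and formats a citation. The `Chunk` class, `filter_chunks`, `cite`, and the example.com URLs are all hypothetical; a vector store would normally apply such filters for you.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    source: str
    title: str
    section: str
    published: date
    url: str


def filter_chunks(chunks, source=None, after=None):
    """Keep chunks matching the metadata filters; None means 'do not filter'."""
    return [
        c for c in chunks
        if (source is None or c.source == source)
        and (after is None or c.published > after)
    ]


def cite(chunk):
    """Build a citation string the generator can attach to an answer."""
    return f"{chunk.title}, {chunk.section} ({chunk.url})"


corpus = [
    Chunk("Reset the device by holding power.", "product-a", "Product A Guide",
          "Troubleshooting", date(2025, 3, 1), "https://example.com/a#troubleshooting"),
    Chunk("Old reset procedure.", "product-a", "Product A Guide (legacy)",
          "Troubleshooting", date(2022, 6, 1), "https://example.com/a-legacy"),
    Chunk("Pair over Bluetooth.", "product-b", "Product B Guide",
          "Setup", date(2025, 1, 15), "https://example.com/b#setup"),
]

recent_a = filter_chunks(corpus, source="product-a", after=date(2024, 1, 1))
citation = cite(recent_a[0])
```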

Next: turning these chunks into vectors, and where to store them.
