KM-201c · Module 1
Chunking and Embedding Strategy: How Structure Determines Retrieval Quality
5 min read
Chunking is the process of dividing knowledge base documents into the segments that will be indexed and retrieved. It is the most commonly underestimated technical decision in RAG implementation, and the one with the highest impact on retrieval quality. The retrieval system does not retrieve documents; it retrieves chunks. If the chunks are too large, the retrieved context contains irrelevant material that dilutes the answer. If the chunks are too small, the retrieved context lacks the surrounding material needed to synthesize a complete answer. If chunks cut across concept boundaries, the retrieved material is internally incoherent.
The wrong default is fixed-size chunking: divide every document into chunks of exactly N tokens regardless of document structure. It is the simplest implementation, and it reliably produces poor retrieval quality. A 500-token chunk cut in the middle of a multi-step procedure captures half of one procedure and the start of the next section. A 500-token chunk that spans a section boundary mixes two unrelated concepts in the same vector embedding, making it a confusing retrieval result for either topic.
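A minimal sketch makes the failure concrete: the split points in naive fixed-size chunking depend only on token counts, never on where a procedure or section actually ends. The token list below is synthetic, for illustration only.

```python
def fixed_size_chunks(tokens, chunk_size=500):
    """Naive fixed-size chunking: split a token list every chunk_size
    tokens, ignoring document structure entirely."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# A 1,200-token document gets cut at tokens 500 and 1000, wherever
# those positions happen to land inside the content:
tokens = [f"tok{i}" for i in range(1200)]
chunks = fixed_size_chunks(tokens)
print([len(c) for c in chunks])  # → [500, 500, 200]
```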
- Structure-Aware Chunking: Use the document's own structure to define chunk boundaries. Headers define section boundaries, so each section becomes a chunk. Numbered steps define step boundaries, so each step (with its parent section for context) becomes a chunk. Paragraphs define the boundaries for prose documents. Structure-aware chunking requires parsing the document format, but it produces chunks that are internally coherent because they respect the author's own organizational structure.
- Semantic Chunking: For documents without clear structural markers, use a semantic segmentation algorithm that identifies topic shifts in the text and creates chunk boundaries at topic transitions. This is more computationally expensive than structure-aware chunking but produces better results on unstructured documents. The strongest current implementations compare embedding similarity between consecutive sentences; a drop in similarity signals a topic transition.
- Chunk Overlap Strategy: Add a small overlap between adjacent chunks (10–15% of chunk size). If chunk N ends at sentence X, chunk N+1 begins at sentence X-2. This handles the case where a query is most relevant to content that spans a chunk boundary: the overlap ensures that both chunks contain the bridging content, so either can be retrieved as a relevant result.
- Metadata Inheritance: Every chunk inherits the metadata of its parent document: title, owner, category, review date, audience, document type. This is not optional; it is required for filtered retrieval. A chunk without metadata can be retrieved only by content similarity. A chunk with metadata can be retrieved by similarity within a filtered set (only chunks from this category, only chunks from documents reviewed in the past 90 days). Metadata filtering dramatically improves retrieval precision.
- Embedding Model Selection: The embedding model determines how concepts are represented in vector space, and domain-specific embedding models outperform general-purpose models on domain-specific retrieval tasks. For enterprise knowledge bases with significant technical or domain-specific vocabulary, evaluate domain-specific embedding models against the general-purpose options on a representative test set before committing to a production index.
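The structure-aware and metadata-inheritance bullets above can be sketched together for markdown input: split on headers so each section becomes a chunk, and copy the parent document's metadata onto every chunk. The document text and metadata fields below are illustrative assumptions.

```python
import re

def chunk_by_headers(markdown_text, doc_metadata):
    """Structure-aware chunking: each markdown section becomes one chunk,
    and every chunk inherits the parent document's metadata."""
    chunks = []
    current_header, current_lines = None, []
    for line in markdown_text.splitlines():
        if re.match(r"^#{1,6}\s", line):  # a header starts a new section
            if current_lines:
                chunks.append({"section": current_header,
                               "text": "\n".join(current_lines).strip(),
                               **doc_metadata})  # metadata inheritance
            current_header, current_lines = line.lstrip("# ").strip(), []
        else:
            current_lines.append(line)
    if current_lines:  # flush the final section
        chunks.append({"section": current_header,
                       "text": "\n".join(current_lines).strip(),
                       **doc_metadata})
    return chunks

doc = "# Setup\nInstall the agent.\n# Rollback\nRevert the release."
chunks = chunk_by_headers(doc, {"title": "Deploy guide", "category": "ops"})
```

Because each chunk carries `section` plus the inherited document fields, it can later be retrieved both by similarity and by metadata filter.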
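The similarity-drop idea behind semantic chunking can be sketched with a toy bag-of-words vector standing in for a real embedding model; the `embed` function and the threshold value here are placeholder assumptions, not a production encoder.

```python
import math
from collections import Counter

def embed(sentence):
    """Stand-in embedding: a bag-of-words count vector. A real system
    would call an embedding model here."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk wherever similarity between consecutive
    sentences drops below the threshold (a topic transition)."""
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(current)
            current = []
        current.append(sent)
    chunks.append(current)
    return chunks

sentences = [
    "Restart the billing service after the patch.",
    "The billing service logs restarts to the audit queue.",
    "Vacation requests must be filed ten days in advance.",
]
chunks = semantic_chunks(sentences)  # topic shift before the last sentence
```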
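The overlap strategy can be sketched at the sentence level: each chunk re-includes the last few sentences of the previous one, so boundary-spanning content lands in both chunks. The chunk size and two-sentence overlap below are illustrative parameters.

```python
def overlapping_chunks(sentences, chunk_size=10, overlap=2):
    """Sentence-level chunks where each chunk repeats the last
    `overlap` sentences of its predecessor, so either chunk can be
    retrieved for content that spans the boundary."""
    step = chunk_size - overlap
    return [sentences[i:i + chunk_size]
            for i in range(0, len(sentences) - overlap, step)]

sents = [f"s{i}" for i in range(20)]
chunks = overlapping_chunks(sents)
# chunk 0 covers s0..s9, chunk 1 covers s8..s17: s8 and s9 appear in both
```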
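One way to run the model comparison the last bullet recommends is a simple recall@k harness over a representative test set. The query IDs, chunk IDs, and result lists below are invented purely for illustration.

```python
def recall_at_k(results_by_query, relevant_by_query, k=5):
    """Fraction of queries for which a relevant chunk appears in the
    top-k results. Compute this per candidate embedding model on the
    same test set and compare."""
    hits = sum(
        any(chunk_id in relevant_by_query[q] for chunk_id in results[:k])
        for q, results in results_by_query.items()
    )
    return hits / len(results_by_query)

# Hypothetical ranked results from two candidate models on the same queries:
general = {"q1": ["c9", "c2"], "q2": ["c4", "c7"]}
domain = {"q1": ["c2", "c9"], "q2": ["c7", "c4"]}
relevant = {"q1": {"c2"}, "q2": {"c7"}}
```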
The index architecture determines what retrieval patterns are possible. A flat vector index supports similarity search but not structured filtering. A hybrid index supports both similarity search and metadata filtering, which is the standard requirement for enterprise knowledge retrieval. A graph-augmented index supports relationship traversal: find all documents related to this decision record, including the procedures that implement the decision and the policies the decision complies with.
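A hybrid index's behavior can be sketched over an in-memory list; a production system would use a vector database, but the order of operations is the same: apply the metadata filter first, then rank the surviving chunks by similarity. All IDs, vectors, and metadata fields here are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(index, query_vec, filters, top_k=3):
    """Hybrid retrieval: metadata filter first, similarity ranking second."""
    candidates = [c for c in index
                  if all(c["metadata"].get(k) == v for k, v in filters.items())]
    candidates.sort(key=lambda c: cosine(c["vector"], query_vec), reverse=True)
    return candidates[:top_k]

index = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"category": "ops"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"category": "hr"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"category": "ops"}},
]
# Chunk "b" is the second-most similar overall, but the filter removes it:
hits = hybrid_search(index, [1.0, 0.0], {"category": "ops"})
```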
Index maintenance is a continuous process, not a one-time task. Every time a document is updated, the chunks derived from that document must be re-embedded and the old chunks removed from the index. Every time a document is deprecated, its chunks must be removed or flagged as deprecated in the index. An index containing stale chunks from deprecated documents will surface outdated information in retrieval results: the classic garbage-in, garbage-out failure mode in RAG systems.
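The update and deprecation rules above can be sketched as a minimal in-memory index keyed by source document, a stand-in for a real vector store's delete-and-upsert operations.

```python
class ChunkIndex:
    """Sketch of re-index-on-update: chunks are keyed by their source
    document, so updating or deprecating a document atomically replaces
    or removes every chunk derived from it."""

    def __init__(self):
        self.chunks_by_doc = {}

    def upsert_document(self, doc_id, new_chunks):
        # Old chunks are dropped in the same step the re-embedded ones
        # go in, so stale content never coexists with the update.
        self.chunks_by_doc[doc_id] = list(new_chunks)

    def deprecate_document(self, doc_id):
        self.chunks_by_doc.pop(doc_id, None)

    def all_chunks(self):
        return [c for chunks in self.chunks_by_doc.values() for c in chunks]

index = ChunkIndex()
index.upsert_document("doc1", ["v1 chunk A", "v1 chunk B"])
index.upsert_document("doc1", ["v2 chunk A"])  # update replaces old chunks
index.deprecate_document("doc2")               # deprecating a missing doc is a no-op
```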