RAG Architecture: 7 Patterns for Quality Retrieval

Executive summary

Default architecture: hybrid retrieval (keyword + vector) → reranker → context budget + citations → answer + self-check.
Most common failure: no eval harness, so “quality” is subjective and regressions slip into production.
Most common scaling issue: metadata is missing (tenant, doc type, ACL), so filtering is inaccurate and retrieval becomes unsafe.

Pattern 1 — Hybrid retrieval (BM25 + vectors)

Vectors are great for semantic similarity, but they can miss exact strings, codes, version numbers, and product identifiers. Hybrid retrieval combines strengths:

Keyword / BM25: exact matches (SKUs, error codes, product names, policy IDs).
Vector search: semantic similarity for “how do I…?” questions and paraphrases.
Filters first: tenant, ACL, doc type, region, and “effective date” should narrow the search space before scoring.

Pattern 2 — Chunking that matches how humans read

Chunking is not a technical detail: it is your retrieval granularity. The default that works across enterprise docs:

Semantic chunks: split by headings/sections and keep paragraphs together.
Small overlap: avoid losing definitions spanning two paragraphs.
Structure-aware: tables, SOPs, and runbooks benefit from specialized chunking.

Doc type	Chunk strategy	Notes
Policies / Legal	Section-based	Preserve clause boundaries; citations matter.
Runbooks / Ops	Step-based	Prefer “procedure blocks” over paragraphs.
Tickets / KB	Thread-based	Keep resolution + context together.

Pattern 3 — Query rewriting and expansion

Users rarely write the best retrieval query. A lightweight “query rewrite” step improves recall without changing the UI:

Normalize: expand abbreviations, map synonyms, keep the original.
Extract entities: product names, regions, dates, ticket IDs.
Generate 2–3 variants: one keyword-heavy, one semantic, one “problem→solution”.

Pattern 4 — Reranking (cheap, high impact)

Retrievers optimize speed; rerankers optimize relevance. In most stacks, reranking is the single best quality lever after filters.

Use reranking when: you have long documents, similar sections, or high “near-duplicate” content.
Keep it bounded: rerank top 20–50 results, not thousands.
Measure: run eval before/after; keep a rollback path.

Pattern 5 — Context budgeting + citations

Long context windows don’t solve retrieval. They hide errors. Budget context explicitly:

Top-k with diversity: avoid 5 chunks from the same section when 3 topics are needed.
Citations: tie claims to sources; in regulated environments, this is non-negotiable.
Refuse gracefully: if evidence is missing, respond with “I can’t find it” and ask clarifying questions.

Pattern 6 — Eval harness (offline + online)

Without evaluation, “quality” is a feeling. With evaluation, it becomes an SLO.

Offline: golden set questions + expected citations; regression tests on every change.
Online: sample production traffic; track groundedness proxies and user feedback.
Failure taxonomy: retrieval miss, stale doc, wrong filter, hallucination, tool failure.

Pattern 7 — Guardrails for RAG (prompt injection & data safety)

RAG increases attack surface: documents can contain malicious instructions. Treat retrieval as untrusted input.

Instruction hierarchy: system > developer > user; retrieved text is evidence, not instructions.
Policy filters: block unsafe tools/actions and sensitive data exfiltration.
Audit trail: log query, retrieved doc IDs, and citations (privacy-safe).

Decision matrix

Problem	Most likely fix	What to measure
Wrong answers with “confident” tone	Reranking + citations + refusal policy	Hallucination rate, groundedness
Answers ignore the newest policy	Metadata (effective date) + filtering	Staleness, doc coverage
Misses exact identifiers / codes	Hybrid retrieval	Recall on code-heavy queries
Too slow at scale	Index sizing, batching, caching	p95 latency, throughput

RAG Architecture: 7 Patterns for Quality Retrieval

Executive summary

Pattern 1 — Hybrid retrieval (BM25 + vectors)

Pattern 2 — Chunking that matches how humans read

Pattern 3 — Query rewriting and expansion

Pattern 4 — Reranking (cheap, high impact)

Pattern 5 — Context budgeting + citations

Pattern 6 — Eval harness (offline + online)

Pattern 7 — Guardrails for RAG (prompt injection & data safety)

Decision matrix

Related Articles

RAG Architecture: 7 Patterns for Quality Retrieval

Want the full technical deep dive?

Key takeaways

30-day plan

Executive summary

Pattern 1 — Hybrid retrieval (BM25 + vectors)

Pattern 2 — Chunking that matches how humans read

Pattern 3 — Query rewriting and expansion

Pattern 4 — Reranking (cheap, high impact)

Pattern 5 — Context budgeting + citations

Pattern 6 — Eval harness (offline + online)

Pattern 7 — Guardrails for RAG (prompt injection & data safety)

Decision matrix

Related Articles

Vector Databases for RAG

LLM Observability Stack

Key takeaways

30-day plan