Journal/RAG/When not to use RAG (and what to do instead).

When not to use RAG (and what to do instead).

Retrieval is the right answer to a smaller set of questions than its popularity suggests. Here are the four times we've stopped reaching for it.

Published
Nov 14, 2025
Reading time
6 minutes
Category
RAG

RAG has become the default reach in any "we want AI to know our stuff" conversation. It is also, in our experience, the wrong reach in roughly half the cases we are called into. The fix is rarely a better embedding. It is reaching for a different tool.

This is a short note on the four times we have stopped using RAG, and what we use instead.

01. When the answer is a number, not a paragraph

If the question the agent needs to answer is fundamentally a SQL query — how many of these did we ship last month, and what was the average margin? — RAG is the wrong shape. You are not retrieving prose; you are aggregating structured data. Putting a vector store in front of this is, almost always, a long road to a worse version of a SELECT statement.

The right tool is text-to-SQL, against the data warehouse you already have. Modern frontier models do this well enough that the surrounding work is the schema description, the access controls, and the eval set. The retrieval is happening — but it is happening against tables, not chunks.

02. When the corpus is small and stable

If the entire knowledge base fits comfortably in the model's context window — say, a hundred-page operating policy that does not change weekly — the answer is to not retrieve at all. Put the document in the prompt. Cache it. Done.

This sounds too simple to be a real recommendation. It is the recommendation. Vector stores have a fixed cost — operationally, in inference, in the engineering you have to do to keep them fresh — that you do not pay if you can simply hand the model the whole document. We have replaced more than one production RAG system with three lines of prompt-caching code.

Half of "let's add RAG" projects should have been "let's just put the document in the context window and cache it." — working note, internal review

03. When the question requires reasoning, not lookup

If the agent's job is to think about something — to compare, to plan, to compose — adding retrieval often makes the answer worse. The model spends part of its budget summarising the chunks instead of doing the reasoning, and the chunks themselves are often beside the point.

The right move is to keep the retrieval scope tight — only the documents that really anchor the question — and to leave the model room to reason. Sometimes the right answer is to retrieve nothing and rely on the model's own knowledge, especially for questions about widely understood concepts. RAG is for grounding. Reasoning needs space, not more grounding.

04. When the data changes faster than the index

Retrieval works when the index is, broadly, current. When the underlying data changes faster than your indexing pipeline can keep up — pricing, inventory, real-time customer state — RAG produces confidently stale answers. The model has no way to know the chunk it just retrieved is two hours old.

The right move here is function calling: give the model a tool that queries the source of truth at request time. The agent asks the API, gets the current state, and answers. The boundary moves from "retrieve, then generate" to "look up, then generate," and the answers are correct in a way that an index could never make them.

When RAG is right

To be balanced: RAG is the correct answer when the corpus is large, the data is mostly textual, the questions are open-ended, and the answer is expected to cite or quote source material. Knowledge assistants over a documentation corpus, support agents over a help center, internal Q&A across years of policies — these are the canonical fits, and they are real.

The mistake is not using RAG. The mistake is reaching for it before asking which of the four conditions above might apply.

Decision rule: before adding a vector store, answer four questions. Is the answer a number? Does the corpus fit in context? Does the task require reasoning? Does the data change faster than the index? Any "yes" should make you reach for a different tool first.

A short closing

RAG is a useful technique. It is not the answer to "we want AI to know our stuff." It is the answer to one specific shape of that question — and there are at least three other shapes, each with a better-fit tool. Pick deliberately. The agent that results will be cheaper to run, faster to answer, and easier to defend when the question gets harder.


Filed under: RAG · METHOD
First published: Nov 14, 2025