Deciding between adding context in the prompt vs. relying on RAG/tools

Some recent models, like 4.1 and 5, have such large context windows that using RAG or tools to inject additional context seems unnecessary for certain types of applications.

For example, say I’m looking to build an assistant that an accountant or lawyer can chat with about their client. I’m imagining that this context would contain things like message history with the client, past invoices, any notes they’ve taken, etc. While the list of these things can get quite long, they are generally relatively small objects and would definitely fit in the context window. Would it not make sense to include all the data in the prompt for an assistant that is going to answer things like:

  • what did this client say about “x”
  • does this client have any open actions?

I’m curious to hear whether this approach is sound and, if not, why not. What are the downsides of just using the context window for this type of application? If you’ve tried this, it would be great to hear any learnings.
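Whether a client's records "would definitely fit" can be checked roughly before committing to the approach. The sketch below is only an estimate under stated assumptions: about 4 characters per token for English text, a hypothetical 128,000-token window, and made-up sample records. None of these numbers come from the thread; a real tokenizer and the model's documented limit would give a proper answer.

```python
# Rough check: would all of one client's records fit in the context window?
# Assumptions (not from the thread): ~4 characters per token, a hypothetical
# 128,000-token window, and invented sample records.

CHARS_PER_TOKEN = 4          # crude heuristic; use a real tokenizer for accuracy
CONTEXT_WINDOW = 128_000     # hypothetical limit; check your model's documentation
RESERVED_TOKENS = 8_000      # room for instructions, the question, and the answer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_client_context(records: dict[str, list[str]]) -> str:
    """Concatenate all records into one prompt block with labelled sections."""
    sections = []
    for kind, items in records.items():
        body = "\n".join(f"- {item}" for item in items)
        sections.append(f"## {kind}\n{body}")
    return "\n\n".join(sections)


def fits_in_window(context: str) -> bool:
    """True if the estimated size leaves the reserved headroom free."""
    return estimate_tokens(context) <= CONTEXT_WINDOW - RESERVED_TOKENS


records = {
    "messages": ["Client asked to move the filing deadline.",
                 "Client confirmed the new address."],
    "invoices": ["INV-001: 1,200.00, paid", "INV-002: 800.00, open"],
    "notes": ["Follow up on missing receipts."],
}
context = build_client_context(records)
print(estimate_tokens(context), fits_in_window(context))
```

Fitting in the window only answers the capacity question; it says nothing about cost or answer quality.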

In our application, we use file_search and function tools to fetch the information needed to process a given request. We were also tempted to stuff the context. One issue is cost: putting ALL of the context information in there is often overkill, although input tokens are relatively inexpensive. The real issue, IMO, is confusion. As instructions get longer and longer, it’s more likely that the LLM will start to get confused, and that’s especially true of the faster models like gpt-5-mini/nano. So we find it preferable to use a tool to fetch the specific information needed. We depend on the model to reason well enough to know what to ask for.
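To make the tool-based approach concrete, here is a minimal sketch of the application side, not the setup described above. The store `CLIENT_DATA`, the tool names, and `handle_tool_call` are all hypothetical, and the tool description uses a JSON Schema style whose exact envelope differs between APIs. No model is called; the last line simulates the tool call a model might emit.

```python
import json

# Hypothetical in-memory store standing in for a real database or file search.
CLIENT_DATA = {
    "acme": {
        "messages": ["Please send the Q3 invoice again.",
                     "We moved offices in May."],
        "open_actions": ["Resend Q3 invoice"],
    }
}

# Tool description in a JSON Schema style (assumed shape; check your API's docs).
SEARCH_MESSAGES_TOOL = {
    "type": "function",
    "name": "search_client_messages",
    "description": "Return messages from one client that mention a keyword.",
    "parameters": {
        "type": "object",
        "properties": {
            "client_id": {"type": "string"},
            "keyword": {"type": "string"},
        },
        "required": ["client_id", "keyword"],
    },
}


def search_client_messages(client_id: str, keyword: str) -> list[str]:
    """Case-insensitive keyword filter over one client's messages."""
    messages = CLIENT_DATA.get(client_id, {}).get("messages", [])
    return [m for m in messages if keyword.lower() in m.lower()]


def get_open_actions(client_id: str) -> list[str]:
    """Return the client's open action items, or an empty list."""
    return CLIENT_DATA.get(client_id, {}).get("open_actions", [])


TOOLS = {
    "search_client_messages": search_client_messages,
    "get_open_actions": get_open_actions,
}


def handle_tool_call(name: str, arguments_json: str) -> str:
    """Run the requested tool and return its result as a JSON string."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json)
    return json.dumps({"result": TOOLS[name](**args)})


# Simulated call for a question like "what did this client say about invoices?"
print(handle_tool_call("search_client_messages",
                       '{"client_id": "acme", "keyword": "invoice"}'))
```

Only the matching message goes back to the model, so the prompt stays small regardless of how much history the client has.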


My personal rule of thumb for this:

If the model needs to look through the context as a whole to answer (e.g. analyse patterns, reorganize elements, learn a style, summarize), stuff the prompt with as much as makes sense. On the other hand, if the quality of the answer depends on “focus”, keep the context as lean as the task allows.

In both cases RAG is still used unless the context is provided manually (otherwise, where would the context come from?).
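The rule of thumb above could be sketched as a small routing step. The task labels, the function names, and the choice to default unknown tasks to a lean context are all assumptions made for illustration, and the keyword match is only a stand-in for a real retrieval step.

```python
# Hypothetical task labels; the split mirrors the rule of thumb, not a fixed taxonomy.
WHOLE_CONTEXT_TASKS = {"analyse_patterns", "reorganize", "learn_style", "summarize"}


def choose_context_strategy(task: str) -> str:
    """Return 'full' to stuff the prompt, or 'lean' to include only what is needed."""
    if task in WHOLE_CONTEXT_TASKS:
        return "full"
    # Assumption: anything else is treated as a focused task.
    return "lean"


def select_context(task: str, documents: list[str], query: str) -> list[str]:
    """Pick the documents to place in the prompt for this task."""
    if choose_context_strategy(task) == "full":
        return list(documents)
    # Lean: a naive keyword match stands in for a real retrieval step.
    return [d for d in documents if query.lower() in d.lower()]


docs = ["Invoice INV-002 is open.", "Client prefers email."]
print(select_context("summarize", docs, "invoice"))    # whole context
print(select_context("lookup_fact", docs, "invoice"))  # only the matching document
```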

So to answer your question: it depends on 3 things:

  1. The task
  2. What is needed to accomplish the task
  3. How you designed your application to accomplish the task.