RAG input via System message: JSON vs plain text

I was gonna get into this but deleted it from my post lol

Location of RAG context within system prompt - #2 by Diet

My concern (with my current understanding) would rather be that attention is unduly influenced by position - so I’d rather push all the knowledge into the red part so that everything has an equal chance of bubbling to the forefront - the model will generally still find the information if it appears relevant enough. However, it probably won’t make all that much difference.

That seems pretty confusing at a glance; I’d try to structure it in a more readable way:

```
# Myproduct2 Video

Myproduct2: This video tutorial describes how to ...

Link: https://website.com/page2
------
```
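For what it’s worth, here’s a minimal sketch of how you might render retrieved records into blocks like that before stuffing them into the system message. The `Record` fields and the `------` separator are my own assumptions, not anything standard:

```python
# Minimal sketch: render retrieved records into plain-text blocks for the
# system message. The field names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Record:
    title: str        # e.g. "Myproduct2 Video"
    description: str  # e.g. "This video tutorial describes how to ..."
    url: str          # e.g. "https://website.com/page2"

def render_block(r: Record) -> str:
    return (
        f"# {r.title}\n\n"
        f"{r.title}: {r.description}\n\n"
        f"Link: {r.url}"
    )

def render_context(records: list[Record]) -> str:
    # "------" as a separator between blocks, matching the example above.
    return "\n------\n".join(render_block(r) for r in records)
```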

I can’t guarantee this will work with mini, but this is my general approach:

(Image: a monkey with an exposed-brain illustration, captioned “Neuron activation.”)

In long-context problems (12k tokens is long context, imo), try to construct text blocks that generate a maximal, specific activation in a certain area, and try to make them as semantically dissimilar as possible from the other blocks. Then you want to make sure that that particular block generates a maximal activation based on your immediate context (the tail of the generation).
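One way to sanity-check the “dissimilar blocks” part, if you already have embeddings on hand: compute pairwise cosine similarity between your blocks and flag any pair that sits too close together. The `embed()` function below is a placeholder for whatever embedding model you use, and the threshold is a guess:

```python
# Sketch: flag pairs of context blocks that are too semantically similar.
# embed() is a placeholder for your embedding model of choice.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def similar_pairs(blocks: list[str], threshold: float = 0.85):
    vecs = [embed(b) for b in blocks]
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            if cos > threshold:
                # These two blocks may compete for attention;
                # consider rewording one of them.
                yield i, j, cos
```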

That’s also how Chain of Thought or “Think Step By Step” works:

If the issue is that a user asks “what’s the URL for product 2?” and the model responds with any URL, it might be a good idea to get the model to first write out a short summary of product 2:

(the model should be able to find the short description of Myproduct2 in its context)

```
Myproduct2: is a video that describes how to …
```

That then increases the activation on the whole Myproduct2 description block (or, more importantly, decreases the attention directed towards unrelated blocks), and once you have the attention on the entire block, you can retrieve the URL.
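In practice this can be as simple as one instruction in the system prompt. A rough sketch using the OpenAI Python client; the model name, the prompt wording, and `context_text` are all placeholders rather than a tested recipe:

```python
# Sketch of the "amplify before retrieving" idea: the system prompt tells the
# model to restate the matching block before giving the link.
from openai import OpenAI

client = OpenAI()

context_text = "..."  # the rendered knowledge blocks from the earlier sketch

system_prompt = (
    "Answer using only the knowledge blocks below.\n"
    "Before giving any link, first write one short sentence describing the "
    "matching product, then quote its Link line verbatim.\n\n" + context_text
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; whatever model you're actually on
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "what's the url for product 2?"},
    ],
)
print(resp.choices[0].message.content)
```

The point isn’t the exact wording; it’s forcing the model to spend tokens on the summary so the right block is “hot” before the URL comes out.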

I’ll admit this is a bit contrived, but the idea is that instead of fetching ambiguous data, you take some time to amplify a concept before digging through your deeper context.

This can take some time to get right with a weak model. You generally trade inference cost for development cost :confused:

Other stuff you can of course do:

  1. try to improve your data, to remove uninformative boilerplate and fluff
  2. try to pre-select stuff before giving it to the conversational LLM (present fewer options; see the sketch after this list)
  3. use a better embedding model to return fewer, but more relevant results
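For point 2, pre-selection can be a plain top-k cut on query/block similarity before anything reaches the chat model. Again just a sketch, reusing the placeholder `embed()` from above:

```python
# Sketch for point 2: keep only the k blocks most relevant to the query,
# so the chat model sees fewer options. embed() is the same placeholder
# embedding function as in the earlier snippet.
import numpy as np

def top_k_blocks(query: str, blocks: list[str], k: int = 3) -> list[str]:
    q = embed(query)

    def score(b: str) -> float:
        v = embed(b)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(blocks, key=score, reverse=True)[:k]
```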