RAG input via System message: JSON vs plain text

I was gonna get into this but deleted it from my post lol

Location of RAG context within system prompt - #2 by Diet

My concern (with my current understanding) would rather be that attention is unduly influenced by position - so I’d rather push all the knowledge into the red part so that everything has an equal chance of bubbling to the forefront - the model will generally still find the information if it appears relevant enough. However, it probably won’t make all that much difference.

That seems pretty confusing at a glance; I’d try to structure it in a more readable way:

```
# Myproduct2 Video

Myproduct2: This video tutorial describes how to ...

Link: https://website.com/page2
------
```
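For what it’s worth, here’s a minimal sketch of how you might render retrieved records into blocks like that before stuffing them into the system message. The `Record` fields and the `------` separator are my own assumptions, not anything standard:

```python
# Minimal sketch: render retrieved records into plain-text blocks for the
# system message. The field names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Record:
    title: str        # e.g. "Myproduct2 Video"
    description: str  # e.g. "This video tutorial describes how to ..."
    url: str          # e.g. "https://website.com/page2"

def render_block(r: Record) -> str:
    return (
        f"# {r.title}\n\n"
        f"{r.title}: {r.description}\n\n"
        f"Link: {r.url}"
    )

def render_context(records: list[Record]) -> str:
    # "------" as a separator between blocks, matching the example above.
    return "\n------\n".join(render_block(r) for r in records)
```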

I can’t guarantee this will work with mini, but this is my general approach:

(Image: a monkey with an exposed-brain illustration, captioned “Neuron activation.”)

In long-context problems (12k tokens is long context, imo), try to construct text blocks that generate a maximal, specific activation in a certain area, and try to make them as semantically dissimilar as possible from the other blocks. Then you want to make sure that that particular block generates a maximal activation based on your immediate context (the tail of the generation).
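One way to sanity-check the “dissimilar blocks” part, if you already have embeddings on hand: compute pairwise cosine similarity between your blocks and flag any pair that sits too close together. The `embed()` function below is a placeholder for whatever embedding model you use, and the threshold is a guess:

```python
# Sketch: flag pairs of context blocks that are too semantically similar.
# embed() is a placeholder for your embedding model of choice.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def similar_pairs(blocks: list[str], threshold: float = 0.85):
    vecs = [embed(b) for b in blocks]
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            a, b = vecs[i], vecs[j]
            cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
            if cos > threshold:
                # These two blocks may compete for attention;
                # consider rewording one of them.
                yield i, j, cos
```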

That’s also how Chain of Thought or “Think Step By Step” works:

If the issue is that a user asks “what’s the URL for product 2?” and the model responds with any URL, it might be a good idea to get the model to first write out a short summary of product 2:

(the model should be able to find the short description of Myproduct2 in its context)

```
Myproduct2: is a video that describes how to …
```

That then increases the activation on the whole Myproduct2 description block (or, more importantly, decreases the attention directed towards unrelated blocks), and once you have the attention on the entire block, you can retrieve the URL.
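In practice this can be as simple as one instruction in the system prompt. A rough sketch using the OpenAI Python client; the model name, the prompt wording, and `context_text` are all placeholders rather than a tested recipe:

```python
# Sketch of the "amplify before retrieving" idea: the system prompt tells the
# model to restate the matching block before giving the link.
from openai import OpenAI

client = OpenAI()

context_text = "..."  # the rendered knowledge blocks from the earlier sketch

system_prompt = (
    "Answer using only the knowledge blocks below.\n"
    "Before giving any link, first write one short sentence describing the "
    "matching product, then quote its Link line verbatim.\n\n" + context_text
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; whatever model you're actually on
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "what's the url for product 2?"},
    ],
)
print(resp.choices[0].message.content)
```

The point isn’t the exact wording; it’s forcing the model to spend tokens on the summary so the right block is “hot” before the URL comes out.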

I’ll admit this is a bit contrived, but the idea is that instead of fetching ambiguous data, you take some time to amplify a concept before digging through your deeper context.

This can take some time to get right with a weak model. You generally trade inference cost for development cost :confused:

Other stuff you can of course do:

  1. try to improve your data, to remove uninformative boilerplate and fluff
  2. try to pre-select stuff before giving it to the conversational LLM (present fewer options; see the sketch after this list)
  3. use a better embedding model to return fewer, but more relevant results
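For point 2, pre-selection can be a plain top-k cut on query/block similarity before anything reaches the chat model. Again just a sketch, reusing the placeholder `embed()` from above:

```python
# Sketch for point 2: keep only the k blocks most relevant to the query,
# so the chat model sees fewer options. embed() is the same placeholder
# embedding function as in the earlier snippet.
import numpy as np

def top_k_blocks(query: str, blocks: list[str], k: int = 3) -> list[str]:
    q = embed(query)

    def score(b: str) -> float:
        v = embed(b)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))

    return sorted(blocks, key=score, reverse=True)[:k]
```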