GPT-4 Limiting Examples Cited in RAG Q&A

I have increased the max tokens setting from 1,000 to 2,500 and it makes no difference.

If I send the API a set of 20 documents and ask GPT to answer a question about those documents, it will respond with 5 examples but ignore the other 15 documents.

I am not sure why this is the case; all 20 documents should yield a response from the AI. Occasionally it cites 7 or 8, so it is not ALWAYS 5, but more often than not it is 5.

Is anyone aware of this issue and any workarounds?

Hi!

Can you elaborate on your setup? Are you stuffing a prompt with 20 similar retrievals? That might be too much (regardless of max tokens). We call it context oversaturation; I don’t know what it’s officially called.

Depending on what you’re trying to do, it might make more sense to split the load. If the examples are all independent, a more robust solution would be to work them off in parallel. I understand that, due to OpenAI’s nonsensical context billing, this might be more expensive, but it does reduce confusion.
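Roughly, the parallel version could look like this. This is just a sketch, assuming the openai Python SDK (1.x style) with an API key in the environment; `documents` and `question` are placeholders for your own retrieval output:

```python
# Sketch only: one request per retrieved document, run in parallel.
# Assumes the openai Python SDK (1.x style) with OPENAI_API_KEY set;
# `documents` and `question` are placeholders for your own retrieval output.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()

def summarise_one(doc: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the supplied document."},
            {"role": "user", "content": f"Document:\n{doc}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

documents = ["...doc 1 text...", "...doc 2 text..."]  # your 20 retrievals
question = "Summarise passages that relate to the cost of petrol prices."

# Fan the documents out so each request only has to reason about one of them.
with ThreadPoolExecutor(max_workers=5) as pool:
    summaries = list(pool.map(lambda d: summarise_one(d, question), documents))
```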

But if neither of those is helpful:

On the instruction side, we’ve noticed that the chat models are hesitant to go beyond a count of 50 in anything. Not sure where that’s coming from, but we’ve never tried to force it, just accepted it and worked around it. One thing to note here is that it’s a good idea to ensure the prompt specifies that we want concrete work done, not examples: “let’s take a deep breath and start digging into it!”, “the boss doesn’t want examples, it’s up to us to work it out, and thoroughly”, that kind of stuff.
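For illustration, an instruction along those lines might look like this; the exact wording, including numbering the documents and spelling out the count, is just one way to phrase it, not a tested recipe:

```python
# Illustrative only: one way to phrase the instruction so the model treats it
# as work to complete rather than something to exemplify. Numbering the
# documents and spelling out the count is an assumption, not a tested recipe.
SYSTEM_PROMPT = (
    "You are given 20 numbered documents. The boss doesn't want examples - "
    "this is real work. For every document, from 1 to 20, summarise the "
    "passages relevant to the question. Do not stop until all 20 are covered."
)
```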

One last thing that we often do with list loads (although this is easier with davinci / instruct) is to programmatically check if we’re done, and then send the result back to the API until the condition is satisfied (e.g. list length >= 20).
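A rough sketch of that loop, assuming the openai 1.x SDK and that you ask for the answer as a JSON array so the length check is trivial (both assumptions, not requirements):

```python
# Sketch of the check-and-resend loop. Assumes the openai 1.x SDK and that the
# model is asked to reply with a JSON array so the "are we done?" check is just
# a length comparison; parsing would need hardening in real use.
import json

from openai import OpenAI

client = OpenAI()
expected = 20
stuffed_prompt = "..."  # the question plus all 20 documents, built elsewhere

messages = [
    {"role": "system", "content": "Reply only with a JSON array of strings, one summary per document."},
    {"role": "user", "content": stuffed_prompt},
]

summaries = []
for _ in range(5):  # cap the number of round trips
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    reply = response.choices[0].message.content
    summaries.extend(json.loads(reply))
    if len(summaries) >= expected:
        break
    # Not done yet: push the partial answer back and ask for the rest.
    messages.append({"role": "assistant", "content": reply})
    messages.append({
        "role": "user",
        "content": f"That covers {len(summaries)} of {expected} documents. "
                   "Continue with the remaining ones as another JSON array.",
    })
```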

Thanks, some interesting thoughts.

For the oversaturation, I already cut the number of docs being sent: no more than 20k tokens or 20 docs. From my testing, those limits avoided confusing the AI.
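For concreteness, a sketch of what that cap could look like, assuming tiktoken for counting and an already similarity-ranked retrieval list (names are placeholders):

```python
# Sketch of the cap described above: at most 20 documents or ~20k tokens,
# whichever is hit first. Assumes the tiktoken package and that `retrievals`
# is already ranked by similarity; names are placeholders.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def select_docs(retrievals, max_docs=20, max_tokens=20_000):
    selected, used = [], 0
    for doc in retrievals[:max_docs]:
        cost = len(enc.encode(doc))
        if used + cost > max_tokens:
            break
        selected.append(doc)
        used += cost
    return selected
```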

The issue tends to arise on questions that ask it to summarise certain aspects of the documents.

“Can you summarise passages that relate to the cost of petrol prices?”

It will summarise 5 documents, but we know, from both the documents themselves and the vectors returned, that all 20 documents contain passages related to petrol prices.

The splitting idea appeals, as it costs the same in context overall but puts less workload on the AI for each individual request. I might give that a shot.