GPT4 Limiting Examples Cited in RAG Q&A

I have raised max tokens from 1,000 to 2,500, and this makes no difference.

If I send the API a set of 20 documents and ask GPT to answer a question about those documents, it responds with 5 examples and ignores the other 15 documents.

I am not sure why this happens. All 20 documents should yield a response from the AI. Occasionally it cites 7 or 8, so it is not ALWAYS 5, but more often than not it is 5.

Is anyone aware of this issue and any workarounds?


can you elaborate on your setup? are you stuffing a prompt with 20 similar retrievals? that might be too much (regardless of max tokens). we call it context oversaturation; I don't know what it's officially called.

depending on what you're trying to do, it might make more sense to split the load. if the examples are all independent, a more robust solution would be to process them in parallel, one request each. I understand that due to OpenAI's nonsensical context billing this might be more expensive, but it does reduce confusion.
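A minimal sketch of the split-and-parallelise idea, assuming a hypothetical `ask_model(prompt) -> str` wrapper around whatever chat-completion client you use (not shown, and not a real OpenAI function). Each document gets its own small prompt, and the answers come back in the same order as the documents:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarise_each(
    docs: List[str],
    question: str,
    ask_model: Callable[[str], str],  # hypothetical wrapper around your API call
    max_workers: int = 5,
) -> List[str]:
    """Send one request per document so no single prompt has to juggle all 20."""

    def one(doc: str) -> str:
        # Each request sees only one document plus the question.
        prompt = (
            f"Question: {question}\n\n"
            f"Document:\n{doc}\n\n"
            "Answer using only this document. "
            "If it contains nothing relevant, reply NONE."
        )
        return ask_model(prompt)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results[i] belongs to docs[i].
        return list(pool.map(one, docs))
```

Answers that come back as NONE can be dropped, and the rest concatenated or handed to one final summarising call. `max_workers` is just a guess at a polite concurrency level; tune it to your rate limits.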

but if neither of that is helpful:

on the instruction side, we've noticed that the chat models are hesitant to go beyond a count of 50 in anything. not sure where that's coming from, but we've never tried to force it, just accepted it and worked around it. one thing to note here is that it's a good idea to make sure the prompt specifies that we want concrete work done, not examples: "let's take a deep breath and start digging into it!", "the boss doesn't want examples, it's up to us to work it out thoroughly", that kind of stuff.
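One way to put that into the prompt, as a sketch: number every document and state the exact count the model is expected to cover. The wording below is illustrative, not a tested recipe, and `build_prompt` is a hypothetical helper:

```python
from typing import List


def build_prompt(docs: List[str], question: str) -> str:
    """Number every document and ask for concrete work on each one, not examples."""
    # "[1] ...", "[2] ..." gives the model (and your code) a handle on each document.
    numbered = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs, start=1))
    return (
        f"{numbered}\n\n"
        f"Question: {question}\n\n"
        f"There are {len(docs)} documents above. Do not give examples: "
        f"go through all {len(docs)} documents, one at a time, and for each one "
        "write its number in brackets followed by a summary of the relevant passage. "
        "If a document has no relevant passage, write its number and NONE."
    )
```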

one last thing that we often do with list loads (although this is easier with davinci / instruct) is to programmatically check whether we're done, and if not, send the result back to the API until the condition is satisfied (e.g. list length >= 20).
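A sketch of that check-and-resend loop for the 20-document case. It assumes the prompt asked the model to prefix each entry with the document number in the form `[n]` (an assumption about the output format, not something the API guarantees) and reuses a hypothetical `ask_model(prompt) -> str` wrapper. `max_rounds` caps the number of follow-up calls so a stubborn model can't loop forever:

```python
import re
from typing import Callable, Set


def covered_ids(answer: str) -> Set[int]:
    # Assumes each entry in the answer starts with "[n]", as requested in the prompt.
    return {int(m) for m in re.findall(r"\[(\d+)\]", answer)}


def ask_until_complete(
    prompt: str,
    n_docs: int,
    ask_model: Callable[[str], str],  # hypothetical wrapper around your API call
    max_rounds: int = 3,
) -> str:
    """Re-ask for any document numbers missing from the answer, up to max_rounds times."""
    answer = ask_model(prompt)
    for _ in range(max_rounds):
        missing = sorted(set(range(1, n_docs + 1)) - covered_ids(answer))
        if not missing:
            break  # every document has been covered
        follow_up = (
            f"{prompt}\n\nYour answer so far:\n{answer}\n\n"
            f"You have not yet covered documents {missing}. "
            "Continue with only those documents, using the same [n] format."
        )
        answer += "\n" + ask_model(follow_up)
    return answer
```

Whatever is still missing after `max_rounds` can be handled with one request per document instead.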

Thanks, some interesting thoughts.

On oversaturation: I have already cut what I send to no more than 20k tokens or 20 docs; in my testing, these limits avoided confusing the AI.
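For reference, a cap like that can be enforced with something along these lines. This is a sketch assuming the retrievals arrive already ranked by vector similarity; `rough_token_count` is a crude stand-in (about 4 characters per token) rather than a real tokenizer:

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    # Crude assumption: roughly 4 characters per token for English text.
    # Swap in a real tokenizer (e.g. tiktoken) for accurate counts.
    return max(1, len(text) // 4)


def select_within_budget(
    ranked_docs: List[str],
    max_docs: int = 20,
    max_tokens: int = 20_000,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Keep the highest-ranked retrievals until either cap would be exceeded."""
    chosen: List[str] = []
    used = 0
    for doc in ranked_docs:
        cost = count_tokens(doc)
        if len(chosen) >= max_docs or used + cost > max_tokens:
            break  # stop at the first document that would break a cap
        chosen.append(doc)
        used += cost
    return chosen
```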

The issue tends to arise when you ask it to summarise certain aspects of the documents.

“Can you summarise passages that relate to the cost of petrol prices?”

It summarises 5 documents, but we know from both the documents themselves and the returned vectors that all 20 documents contain passages related to petrol prices.

The splitting idea appeals, as the context cost is the same, but each individual request may put a little less load on the AI. I might give that a shot.