GPT4 Streaming doesn't use all information from retrieved document

I am using a Retrieval-Augmented Generation (RAG) pattern as a QnA solution. When a query is sent, documents are retrieved from an Azure Cognitive Search index. The retrieved documents are then streamed into OpenAI GPT-4 via a POST request, and the generated text is streamed back out to my app.

Occasionally, when the text in a document is large, GPT-4 responds that it has no information on the topic. This happens especially with queries that ask about the latter half of a document, or when multiple documents are needed to produce a summary.
I'm not entirely sure of the cause, but I suspect that GPT-4 is generating text while the input is still being streamed in, and tries to conclude its output before it reaches the relevant parts.

Are there any fixes for this? And what limits of streaming to GPT-4 should I be aware of?
I am using Python 3.10, and my GPT-4 model is accessed through Azure OpenAI.
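One way to test the truncation suspicion is to check whether the retrieved documents even fit in the prompt before sending the request. Below is a rough sketch; the 8,192-token window, the 1,024-token reserve, and the ~4-characters-per-token estimate are all assumptions (in practice you would count tokens exactly, e.g. with the tiktoken library, and use your deployment's actual limit):

```python
# Rough check that the retrieved documents actually fit in the prompt.
# ASSUMPTIONS: context window size, answer reserve, and the
# 4-chars-per-token heuristic are illustrative, not exact.

MAX_CONTEXT_TOKENS = 8192    # assumed context window; check your deployment
RESERVED_FOR_ANSWER = 1024   # tokens kept free for the completion

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)

def fit_documents(
    docs: list[str],
    budget: int = MAX_CONTEXT_TOKENS - RESERVED_FOR_ANSWER,
) -> list[str]:
    """Keep whole documents (in ranking order) until the budget is spent.
    Documents that do not fit are dropped entirely, so the model never
    sees a silently truncated tail."""
    kept, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break
        kept.append(doc)
        used += cost
    return kept

docs = ["short doc", "x" * 40_000]  # second doc is ~10k estimated tokens
print(len(fit_documents(docs)))     # prints 1: only the first doc fits
```

If documents regularly get dropped or truncated by a check like this, that would explain why answers about the latter half of a document come back empty.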

That doesn't make sense as described. The AI must be passed the full documentation, plus instructions on how to act on it, in a format it can understand. Only once the entire input context has been loaded can the AI form a coherent answer.
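To make that concrete, here is a minimal sketch of packing the retrieved documents and instructions into a single chat request. The delimiters and system text are illustrative choices, not a required format:

```python
def build_messages(query: str, docs: list[str]) -> list[dict]:
    """Assemble a chat payload: instructions plus every retrieved
    document in full, clearly delimited, followed by the question."""
    context = "\n\n".join(
        f"--- Document {i + 1} ---\n{doc}" for i, doc in enumerate(docs)
    )
    system = (
        "Answer the question using ONLY the documents below. "
        "If the answer is not in the documents, say so.\n\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]

messages = build_messages(
    "What is the refund policy?",
    ["Refunds within 30 days.", "Shipping is free."],
)
```

These messages would then go to the Azure OpenAI chat completions call in one request. Note that streaming only affects how the *response* comes back token by token; the full input context is always sent up front, so the model is not "concluding before it reaches the relevant parts" of the input.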

I don't think that would happen normally. I have seen hallucination issues, but not like this, where the model says no document was found. In your case, I suspect the document wasn't retrieved or passed through properly. The best approach would be to log the input value just before sending the request to GPT-4 so you can verify it.
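A minimal way to do that logging (sketch; the logger name, truncation length, and message layout are arbitrary choices):

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("rag")

def describe_messages(messages: list[dict]) -> list[str]:
    """Log exactly what will be sent to GPT-4 so you can confirm the
    retrieved text really made it into the request. Returns the logged
    lines so they can also be inspected programmatically."""
    lines = [
        f"role={m['role']} chars={len(m['content'])} "
        f"content={m['content'][:500]!r}"
        for m in messages
    ]
    for line in lines:
        log.debug("%s", line)
    return lines

messages = [
    {"role": "system", "content": "retrieved document text here"},
    {"role": "user", "content": "my question"},
]
describe_messages(messages)
```

Call this right before the chat completions request: if the document text is missing or cut off in the log, the problem is in retrieval or prompt assembly, not in streaming.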

I'm currently using both stream=true and stream=false while processing a lot of queries, and I haven't come across this issue unless the document was not provided properly.