I am using a Retrieval-Augmented Generation (RAG) pattern as a Q&A solution. When a query is sent, documents are retrieved from an Azure Cognitive Search index. The retrieved documents are then streamed into OpenAI GPT-4 via a POST request.
The generated text is then streamed back to my app.
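For reference, a simplified sketch of how I assemble the retrieved documents into the prompt context (the function name, document contents, and the character budget are placeholders; my real code works with tokens, not characters):

```python
def build_context(docs, max_chars=4000):
    """Concatenate retrieved documents into one context string,
    truncating at a fixed character budget (a stand-in for the
    model's token limit)."""
    context = "\n\n".join(docs)
    return context[:max_chars]

# Placeholder documents, each roughly 2,400 characters long.
docs = [f"Document {i}: " + "lorem ipsum " * 200 for i in range(5)]
context = build_context(docs, max_chars=4000)
# Anything past the budget -- later documents, and the latter
# half of long ones -- never reaches the model at all.
```

This is only meant to illustrate the shape of the pipeline, not the exact code.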
Occasionally, when the text in a document is large, GPT-4 responds that it has no information on the topic. This happens especially with queries that ask about the latter half of a document, or when multiple documents are needed to produce a summary.
I’m not entirely sure of the cause, but I suspect that GPT-4 starts generating text while the input is still being streamed and concludes its output before it reaches the relevant parts.
Are there any fixes for this? And what limits of streaming to GPT-4 should I be aware of?
I am using Python 3.10, and my GPT-4 model is accessed through Azure OpenAI.