Large document - Inject into API or use knowledge base with semantic search?

I have a large document (>40 pages) that I need to query with ~70 questions. I have a script that injects the text of the document along with the prompts for all the questions in a single shot to the GPT-4 Turbo API, and it gives reasonable answers, but not always perfect ones. Sometimes answers are influenced by the presence of the other questions (if I run each question on its own I get different answers).

Would I get more consistent and better answers using a knowledge base with semantic search in Azure?

Are you using OpenAI Assistant? Or just passing the text to Chat Completion APIs and prompting questions to that?

Passing the text to the chat completion API, and including all of the questions in one injection

The answer depends on your implementation of the semantic search.
What I can tell you right now is that the vector approach will be a lot cheaper if you intend to repeat this process often. You can probably already see this on your usage bill.

Part of the money you save will likely go into the time it takes to tune the retrieval to your needs.

If you would like some specific advice to shorten that time, maybe you can share additional information about your use case.

I think using the OpenAI Assistants API would give you better results. Extract the text, create a .txt file, upload it to a vector store, and use an Assistant to do the file_search. When you add the file to a vector store, it is automatically embedded and stored in the vector store database.
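The flow described above looks roughly like this with the `openai` Python SDK (v1.x, Assistants v2 beta). This is a hedged sketch, not the poster's actual code: the function name `answer_with_file_search` and the store/assistant names are illustrative assumptions.

```python
# Sketch of the Assistants v2 + file_search flow: upload a .txt file to a
# vector store (it is chunked and embedded automatically), attach the store
# to an assistant, then ask a question on a thread.

def answer_with_file_search(txt_path: str, question: str) -> str:
    from openai import OpenAI  # deferred import so the sketch is self-contained

    client = OpenAI()

    # 1. Create a vector store and upload the extracted text file.
    store = client.beta.vector_stores.create(name="audit-doc")
    with open(txt_path, "rb") as f:
        client.beta.vector_stores.files.upload_and_poll(
            vector_store_id=store.id, file=f
        )

    # 2. Create an assistant with the file_search tool pointed at the store.
    assistant = client.beta.assistants.create(
        model="gpt-4-turbo",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [store.id]}},
    )

    # 3. Ask one question on a fresh thread and wait for the run to finish.
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": question}]
    )
    client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id
    )
    reply = client.beta.threads.messages.list(thread_id=thread.id)
    return reply.data[0].content[0].text.value
```

In practice you would create the vector store and assistant once and reuse them across questions, rather than rebuilding them per call as this sketch does.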


Thanks for the help. Here is the use case:
We are trying to audit a large document to ensure compliance with our internal procedural requirements. We have distilled the requirements into a list of ~70 questions. We have a simple interface where you drop a docx file for the document, which mammoth turns into text. It then injects the text and all of the questions into the GPT-4 Turbo API. When it gives a response, our interface organizes the output so you can easily see an answer for each of the 70 questions.
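For context, the one-shot request described above can be sketched like this. Names such as `build_messages`, `doc_text`, and the sample questions are assumptions for illustration, not the poster's actual implementation.

```python
# Bundle the extracted document text and all audit questions into a single
# Chat Completions-style messages payload (the one-shot approach described
# in the post).

def build_messages(doc_text: str, questions: list[str]) -> list[dict]:
    numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(questions))
    system = (
        "You are auditing the document below against internal procedural "
        "requirements. Answer each numbered question separately."
    )
    user = f"DOCUMENT:\n{doc_text}\n\nQUESTIONS:\n{numbered}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "…extracted docx text…",
    ["Does the document define its scope?", "Is an approver named?"],
)
```

The payload would then be passed as the `messages` argument of a chat completion call; the interface parses the numbered answers back out of the response.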

It works pretty well overall, but there are some inaccuracies and inconsistencies. For example, if I ask one question by itself, I get an accurate answer. But if I ask all the questions in one shot, I get a totally different (and inaccurate) answer for that specific question.

I’m wondering if we should try creating a knowledge base in Azure, dropping the docx there, and then querying the 70 questions one by one. Will that give more accurate answers? I don’t want to inject the document 70 times into GPT-4, as that seems like it will be costly.


Thanks for clarifying the requirements.
I actually do like the suggestion from @MrFriday. Setting up a V2 assistant with file retrieval can be a fast and easy starting point for your exploration.

Answering all 70 questions correctly in a single prompt does sound unlikely, but of course the goal is also to speed up the process by answering more questions correctly per prompt while keeping costs down.
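One middle ground between all 70 questions in one prompt and one question per call is to batch a handful of questions per request. A minimal sketch, where the batch size of 5 is an assumption to tune against your accuracy/cost tradeoff:

```python
# Split the question list into small groups, one API call per group,
# so each prompt carries fewer questions (better accuracy) while still
# needing far fewer calls than one-question-at-a-time (lower cost).

def batch_questions(questions: list[str], batch_size: int = 5) -> list[list[str]]:
    return [
        questions[i:i + batch_size]
        for i in range(0, len(questions), batch_size)
    ]

batches = batch_questions([f"Q{i}" for i in range(70)], batch_size=5)
# 70 questions at 5 per batch -> 14 calls instead of 70
```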

Since OpenAI has documented how their search system works, you will have some indication of what you are starting from when working to improve the retrieval.

If you end up doing so, the question will be how to chunk and rank the retrieved matches to your query in a way that will enable the model to answer the questions correctly.
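As a starting point for the chunking question, here is a minimal fixed-size chunker with overlap. The sizes are assumptions to tune; real retrieval pipelines often split on paragraph or section boundaries instead, and ranking would then happen over embeddings of these chunks.

```python
# Fixed-size character chunking with overlap, so that a requirement
# mentioned near a chunk boundary still appears whole in at least one chunk.

def chunk_text(text: str, chunk_chars: int = 2000, overlap: int = 200) -> list[str]:
    step = chunk_chars - overlap
    return [text[start:start + chunk_chars] for start in range(0, len(text), step)]

chunks = chunk_text("x" * 5000, chunk_chars=2000, overlap=200)
```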
