Context:
I’m using an OpenAI Assistant in an automation workflow to analyze and extract key information from various PDF documents which are chronologically correlated. The assistant’s knowledge compounds over time, with each document representing a new development being added to the vector store. However, it primarily needs to focus on processing and extracting information from the latest uploaded document while incorporating relevant information from previous documents in the latest document’s analysis.
Problem:
When asked to analyze and process the latest uploaded file (using file_search tool), the assistant sometimes retrieves information from a previous document instead of the correct, most recently uploaded file. This disrupts the workflow and leads to inaccurate analyses, as it seemingly ignores the latest document and processes a similar previous one.
Current Workflow:
before automation is deployed, a new thread is created which is linked to my assistant with detailed custom instructions. A vector store is also created which gets attached to the thread.
-
PDF file is uploaded
-
File is then added to the thread’s vector store with a specific
file_id
. -
New message is created - the file_id is attached to the user message and the message specifies the name of the latest file to retrieve, as seen below:
Mode: Automation. Please carry out your roles and duties for the latest document: "007. Final XXX letter 3.20.24.pdf". Ensure that you reference and incorporate any relevant information from previous documents in your analysis, if applicable.
-
create and poll a run, while also requiring file_search tool.
-
The response is then received in JSON (as per assistant instructions) and then used in subsequent steps of my automation.
Requirement:
The assistant needs to ensure it retrieves and processes the correct file as specified in the user message.
Questions:
My first guess is that it’s not retrieving the correct file due the the fileSearch tool:
“Rewrites user queries to optimize them for search.” (docs)
and some how the filename is getting confused with another file perhaps?
-
Can I specify the
file_id
in the message (instead of filename) for improved file retrieval precision instead? would that work? -
What is the best way to ensure accurate file retrieval for a workflow such as this?
-
Is there a better workflow/method to handle such a use case as this?
Any advice or solutions to ensure accurate file retrieval or to improve this workflow would be greatly appreciated.
thank you for your time!