I am trying to do what seems to be a straight forward AI application. I am creating a .NET website that will upload a .pdf file (Purchase Order), search the relevant data from the PO and create or update data in the database as needed. Please provide feedback. It is greatly appreciated.
I am struggling with conflicting messaging in the documentation, or how I understand it.
I am uploading a file and trying to execute the threads/run API.
If I include a tools element of file_search in the API call, I get:
{
“error”: {
“message”: “Invalid tools: all tools must be of type function when response_format is of type json_schema.”,
“type”: “invalid_request_error”,
“param”: “response_format”,
“code”: null
}
}
When I remove the tools element, I get this:
{
“error”: {
“message”: “Missing required parameter: ‘thread.messages[0].attachments[0].tools’.”,
“type”: “invalid_request_error”,
“param”: “thread.messages[0].attachments[0].tools”,
“code”: “missing_required_parameter”
}
}
My current (reduced length) post body looks like this:
{
“assistant_id”: “asst_d3BTGbl3pdXCIqdvvxxxxxxx”,
“thread”: {
“messages”: [
{
“role”: “user”,
“content”: “Please provide the response in a JSON string. …”,
“attachments”: [
{
“file_id”: “file-5mMsBcXU54tjFwHSUUUUUU”
}
]
}
]
},
“model”: “gpt-4o-2024-08-06”,
“instructions”: “Please provide the response in a JSON string. …”,
“tools”: ,
“tool_resources”: null,
“temperature”: 0,
“tool_choice”: {
“type”: “file_search”
},
“response_format”: {
“type”: “json_schema”,
“json_schema”: {
“name”: “ZZZYYYPO”,
“schema”: {
“type”: “object”,
“properties”: {
“PO”: {
“type”: “string”
},
“PO_Date”: {
“type”: “string”
},
“PO_Lines”: {
“type”: “array”,
“items”: {
“type”: “object”,
“properties”: {
“PO_Line_Item”: {
“type”: “string”
},
“Part_Number”: {
“type”: “string”
}
},
“required”: [
“PO_Line_Item”,
“Part_Number”
],
“additionalProperties”: false
}
}
},
“required”: [
“PO”,
“PO_Date”,
“PO_Lines”
],
“additionalProperties”: false
},
“strict”: true
}
}
}
What you need to remove is the response schema and response_format parameter. Then, there is no utility in using “tool_choice” as that cannot refer to tools that are out of your control, only functions. Remove those tool parameters also.
If you are using the internal tool file_search to be able to search chunks of documents in a vector store, that will be something done automatically by the required instruction language that you’ll have to write telling the AI the purpose of calling file search and what it will find:
my_assistant = client.beta.assistants.create(
instructions=“You are a database assistant. Purchase order documents have been uploaded to file search, and you must write a search query with the highest quality semantic search match of expected PO PDF document chunks.”,
name=“DB Helper”,
tools=[{“type”: “file_search”}],
tool_resources={“file_search”: {“vector_store_ids”: [“vs_123”]}},
model=“gpt-4o”
)
Assistants document upload being a “file search” method and not a guarantee of document placement and not operating automatically, Assistants is likely the wrong way to provide guaranteed information about a single page document to AI. You should extract text yourself, review the quality, use chat completions…
What happens when the LLM hallucinates or makes stuff up and that ends up in your database? Are you manually verifying everything that is extracted? Seems like a lot of risk if you just think this can be automated using an LLM without any human oversight. I don’t think LLM is not the correct tool for this job at all.
When I upload the .pdf manually to gpt-4o, the model does very well at pulling out the data. I was hoping to see if the API could do a similar job. Yes, we plan on validating all output. I will try the suggestions above.
What still seems odd to me and contrary to the documentation is if I use the https://api.openai.com/v1/threads/runs endpoint the only way to attach a file is through a thread/message/attachment/file_id. If I attach the file, a tool is required and the tool must be a function.
I will try to create the thread separately and create a run to process it. Then, I can query what is in the file.
That is the same way that ChatGPT Plus works - you “upload” a document, but what is actually happening is that text extraction is performed and it is added to a search tool there also, based on a similarity search.
The API just exposes the mechanism, and it is your programming that needs to be in control if you want to have a “whole document” converted into text by software and then that text placed in a message so the AI can understand it without any need to write further searches - chat completions, and your own PDF extraction.
When you attach the file in assistants, the “tool” is not a function you provide yourself (where function is it’s own tool that is a container for external code methods you provide, a sub-type of the full “tool” hierarchy to which you do not have full access), but it is a file_search tool where the AI writes queries that are similar to what the user input needs. It is incompatible there with an AI that can only write JSON output.