After seeing Video 9 of the 12 Days of Shipmas, I am trying to find documentation on how to get the model to reference an uploaded file. I have tried this using Assistants and a JSON Schema with GPT-4o, without much success.
For o1, will we still use Assistants, or Chat? Is there new documentation available showing how to use the API to do what was done in the video?
You will have to build your own document text extraction or semantic search pipeline and place the input or results in a user message, indicating that the enclosed plain text is documentation to be referenced or examined for knowledge.

There is no file-based method on Chat Completions.
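As a minimal sketch of that pattern, assuming Python with the openai and pypdf packages (the file name and the question are placeholders, not anything from the video):

```python
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Extract plain text from the document yourself.
reader = PdfReader("manual.pdf")  # hypothetical file
doc_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# 2. Place it in the user message, labeled as reference documentation.
response = client.chat.completions.create(
    model="o1",
    messages=[
        {
            "role": "user",
            "content": (
                "Answer using only the documentation below.\n\n"
                "=== DOCUMENTATION START ===\n"
                f"{doc_text}\n"
                "=== DOCUMENTATION END ===\n\n"
                "Question: How do I reset the device to factory settings?"
            ),
        }
    ],
)
print(response.choices[0].message.content)
```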
I hear you, but this week OpenAI announced shipping API access to o1. See the 3-minute mark of https://www.youtube.com/watch?v=14leJ1fg4Pw. That demo is in the Chat section, using o1.
I understand, but this is how the Chat Completions endpoint, where the model operates, works: it allows (and requires) developers to take more direct control of the messages and context given to the AI.
o1 models also accept only a subset of the API parameters that would otherwise exert fine control over operation.
Chat Completions has no additional layers of internal tools (like file search on the Assistants API) apart from function calling, which you implement yourself. The model is not currently available on the Assistants endpoint, which could otherwise offer internal semantic search on documents.
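If you want the model to decide when to consult your documents, function calling is the one hook Chat Completions gives you. A hedged sketch: search_my_docs and its stub body are hypothetical stand-ins for your own retrieval code, and the query topic is invented.

```python
import json
from openai import OpenAI

client = OpenAI()

def search_my_docs(query: str) -> str:
    # Hypothetical stand-in for your own extraction / semantic-search backend.
    return "Refunds are accepted within 30 days of purchase with a receipt."

tools = [{
    "type": "function",
    "function": {
        "name": "search_my_docs",
        "description": "Search the internal document store for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "What does our warranty policy say about refunds?"}]
first = client.chat.completions.create(model="o1", messages=messages, tools=tools)

# Assume the model chose to call the tool (a robust version would check for None).
call = first.choices[0].message.tool_calls[0]
results = search_my_docs(json.loads(call.function.arguments)["query"])

# Return the results in a tool message so the model can answer from them.
messages.append(first.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": results})
final = client.chat.completions.create(model="o1", messages=messages, tools=tools)
print(final.choices[0].message.content)
```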
There are two model types available today:

- o1: a reasoning model designed to solve hard problems across domains
- o1-mini: a fast and affordable reasoning model for specialized tasks
The latest o1 model supports both text and image inputs, and produces text outputs (including Structured Outputs). o1-mini currently only supports text inputs and outputs.
Text input means you are the one producing that text, such as knowledge extracted from documents.
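As one illustration of the Structured Outputs support mentioned above, the Python SDK's parse() helper accepts a Pydantic model as the response format; the schema fields and the toy document here are invented for the example.

```python
from openai import OpenAI
from pydantic import BaseModel

# Invented response schema, purely for illustration.
class PolicyAnswer(BaseModel):
    answer: str
    supporting_quote: str

doc_text = "Warranty: refunds are accepted within 30 days with a receipt."  # toy example

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="o1",
    messages=[{
        "role": "user",
        "content": f"Using only this documentation:\n{doc_text}\n\nWhat is the refund window?",
    }],
    response_format=PolicyAnswer,
)
print(completion.choices[0].message.parsed)  # a PolicyAnswer instance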
For those who may be struggling with this scenario, as I did for far too long: I got it working nearly flawlessly. I was trying to get one of the models to read a PDF and analyze its contents. After trying too many things, I ended up calling an API to convert the PDF to a list of JPG images (I only needed the first few pages in my case) and then passed the JPG images to the model in the user input.

There was not much reasoning that needed to be done, so I used the gpt-4o model. I passed in a json_schema and it extracted everything I asked for: PO number, date, and multiple PO lines with part #, serial #, description, quantity, price, etc. It works perfectly. Of course, we visually check every PO coming in, but it is a whole lot better and more accurate than our current data entry.

This is on a VB.NET web application; a rough Python sketch of the same pipeline is below. Reach out if you have any questions.
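For reference, here is that sketch (our production version is VB.NET, which I won't reproduce here). It assumes the pdf2image package, which needs the poppler utilities installed, and the file name and schema fields are just examples.

```python
import base64
import io
from openai import OpenAI
from pdf2image import convert_from_path

client = OpenAI()

# 1. Convert the first few PDF pages to JPG images.
pages = convert_from_path("purchase_order.pdf", first_page=1, last_page=3)
image_parts = []
for page in pages:
    buf = io.BytesIO()
    page.save(buf, format="JPEG")
    b64 = base64.b64encode(buf.getvalue()).decode()
    image_parts.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    })

# 2. Ask gpt-4o to extract the PO fields against a strict JSON schema.
schema = {
    "name": "purchase_order",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "po_number": {"type": "string"},
            "date": {"type": "string"},
            "lines": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "part_number": {"type": "string"},
                        "serial_number": {"type": "string"},
                        "description": {"type": "string"},
                        "quantity": {"type": "number"},
                        "price": {"type": "number"},
                    },
                    "required": ["part_number", "serial_number", "description",
                                 "quantity", "price"],
                    "additionalProperties": False,
                },
            },
        },
        "required": ["po_number", "date", "lines"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Extract the purchase order fields."},
                    *image_parts],
    }],
    response_format={"type": "json_schema", "json_schema": schema},
)
print(response.choices[0].message.content)  # JSON string matching the schema
```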