Hi everyone,
i’m a bit uncertain, if i use assistants API correct for my use case. I achieved pretty good results just using regular ChatGPT with the gpt-4o, then uploading pdf invoices from construction projects and asking to return positions and prices from this document. Then I thought i’ll just build a simple web interface with the completions API, upload this PDF, sending a prompt with it and let GPT return JSON with the invoice items/positions. Turns out copmpletions API only accepts images, so i started using the assistants API.
I followed some tutorials and the official documentation i i’ve put somthing working up. But: As i understand file_search and vectorstores are made to build a knowledge base for my assistant. In my use case there isn’t really a knowledge base and i don’t want the assistant to get mixed up with former invoices that have been uploaded. That’s why i’m deleting the vector store every time and recreate it, when someone wants to upload a new pdf with invoice data. This feels like i’m using it not as it’s meant to be used.
So i was wondering: What exactly happens if I upload a PDF to standard ChatGPT? Is it something different? Whats the right way to make an assistant acces a single file that is only needed once and then can be forgotten about? Or is it the right way i’m doing it?
Thanks a lot in advance!
Best regards,
Marius
1 Like
Hi @vierminus and welcome to the community!
I am not sure I understood this statement:
Turns out completions API only accepts images
Could you elaborate?
You can send both text and images to chat completions API.
You are correct - Assistants API is meant to retain and retrieve knowledge over time - so in your case, it would be useful if you want to create an app where a user can query previous invoices.
Since you are essentially just doing a stateless service here of just extracting specific data given an invoice PDF, it would be better to simply use ChatCompletions directly.
This is highly dependent on the format and layout of your PDFs, but from my experience, treating PDFs as images (one image per page) for invoices and sending those as base64 encoded strings, together with textual content (see here) gives me best results, but it is more pricey.
Hey @platypus thank you for your answer!
Yeah, as you mentioned, it’s either an image or text, not a file attachment. In ChatGPT i can add the pdf file as it is as an attachment by clicking the paperclip icon and send it to the model and interact with it. I’m not sure what ChatGPT is doing behind the scenes with it
When i want to use the Chat Completions i have to convert it to text or image. Since these kind of invoices i’m working with have pretty wild and big table layouts i might lose to much context when i extract the text because the positioning is important. Extracting images can also be painful, since these pdfs sometimes have many pages. That’s why i thought this might neither be the right way. Sorry i can’t attach one, since it’s company / customer data.
That’s why i came up with my assistant-solution. And it’s working pretty ok, and isn’t that expensive, one run is ~ 5 cents with 4o and even less with 4o-mini. I just thought there might be a better way. Also it takes a while checking if there is already an assistant and an vector store and delete and rectreate them if this is the case, then upload the file and wait for a response.