Hello,
First time writing, hopefully I’m in the right place.
I’ve been using ChatGPT for some time now and I want to take it a step further and leap into the API.
I’ll be building a backend with Node.js. The people using my ERP will upload images, in various formats, of things such as bills, tickets, and invoices, and those images will be sent to the server in base64 format.
What I’m trying to achieve is to call the API once my backend receives the image and get back just a JSON response with the total of the bill, ticket, or invoice, and that’s pretty much it.
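To make the question concrete, here is a minimal sketch of the call I have in mind, not tested code. It assumes the Chat Completions endpoint with JSON mode (`response_format`), the image passed as a base64 data URL, and Node 18+ for the built-in `fetch`. The helper names `buildTotalRequest` and `requestTotal`, the prompt wording, and the `max_tokens` value are my own placeholders.

```javascript
// Sketch only: helper names, prompt wording and max_tokens are assumptions.
const PROMPT =
  'Reply with a JSON object with a single key "total": the final amount paid ' +
  "on this bill, ticket or invoice, as a number. Use null if it cannot be read.";

// Builds the request body for one image. Pure function, no network access.
function buildTotalRequest(base64Image, mimeType = "image/jpeg") {
  return {
    model: "gpt-4o-mini",
    // JSON mode: the prompt itself must mention JSON for this to be accepted.
    response_format: { type: "json_object" },
    // The expected reply is tiny, so cap the output tokens to limit cost.
    max_tokens: 50,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: PROMPT },
          {
            type: "image_url",
            image_url: { url: `data:${mimeType};base64,${base64Image}` },
          },
        ],
      },
    ],
  };
}

// Sends the request and returns the raw text of the model's reply.
async function requestTotal(base64Image, mimeType, apiKey) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildTotalRequest(base64Image, mimeType)),
  });
  if (!res.ok) throw new Error(`OpenAI API returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The idea would be to call `requestTotal(image, "image/png", process.env.OPENAI_API_KEY)` from the upload handler, with the key kept on the server only.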
So before diving in, I’d like to hear from more experienced people about which model I should use (I was thinking gpt-4o-mini) and what bottlenecks I might run into. Keep in mind that there won’t be many requests, as I only have a few hundred users. However, I’d like to follow best practices and not end up with a surprising bill myself.
I can already do this easily with the following prompt, and it works 95% of the time:
“I’ll send you pictures with images of bills or tickets. For each image I want you to answer with a JSON, with a KEY called ‘total’. For each ticket or bill I want you to just answer with that JSON and key telling me the final amount that was paid. Just answer with that, nothing more.”
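Since that leaves some replies that are not clean JSON, I assume the backend will have to validate whatever comes back. Below is a minimal sketch of how I imagine doing that; `parseTotal` is a hypothetical helper of mine, and it simply returns `null` whenever it cannot find a usable numeric `total`.

```javascript
// Sketch only: extracts a numeric "total" from the model's reply text.
// Returns null instead of throwing when the reply is not usable.
function parseTotal(replyText) {
  // Tolerate extra prose or code fences around the JSON object.
  const start = replyText.indexOf("{");
  const end = replyText.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed;
  try {
    parsed = JSON.parse(replyText.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }
  if (parsed === null || typeof parsed !== "object") return null;

  // Accept a number, or a non-empty numeric string such as "12.50".
  let total = parsed.total;
  if (typeof total === "string" && total.trim() !== "") total = Number(total);
  return typeof total === "number" && Number.isFinite(total) ? total : null;
}
```

A `null` result could then be flagged for manual entry instead of being stored as a wrong amount.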
Thanks in advance,