Which API Model to get answers from an Image?

Dear guys, please dont kill me for this question but i can’t realy found enough to get be sure, that this will solve my problem.

I want to Upload an Image and needs to get the answers from the image for example on the image is a question like “what is the difference between Germany and America” and chatgpt needs to answer the question. I know i can use gpt4-vision but this costs pretty much for the usecase. When i just upload an image in chatgpt+ it will give me the answers but i dont know which model it is or do i have to enable add code intepreter? Is there a way to use gpt 3.5 to even minimisze the costs?

I am happy to get some help guys, I am super new and try my best everyday :sunny:

Hi and welcome to the Developer Forum!

The Vision model is the same in the API and ChatGPT. ChatGPT just calls it behind the scenes. There is no GPT3.5 vision model, there is a low detail vision call that sues less tokens, but the cost is still from 1 to 10 cents per image with an typical 500 token prompt and a 512x512 image with a few hundred tokens of output.


Thank you so much for the fast reply!

Well okay, that means it will be pretty expensive everytime I upload an Image. Mostly people gonna Upload 5 Images to get the answers from GPT. Sadly this will not work for my usecase. :frowning:

Well, one of my applications makes around 30 image API calls per “client” and they have typically 500-1000 token prompts with a few hundred in response and the block of 30 calls costs be about 9 cents.

What? Thats crazy! Do you have any tutorial to understand more of this? When i made an API request of an simple image and tell him, to answer the questions i get like 50 cents per reqeust. I am doing somenthing wrong and dont know what -.-