I’m currently working on a Python program that uses OpenAI’s vision capability to analyze individual pages of a script. However, I’ve noticed that the results from GPT-4o Mini, which I’m using in the program, lack the detail and sharpness of the results I get when manually uploading screenshots to the GPT-4o web interface.
I’d like to achieve the same quality as GPT-4o in my program, but I’ve run into the issue of token limits in OpenAI’s API, which seem to get exceeded during image analysis.
Is there a way to use GPT-4o with Vision in the API, or is it only available as the Mini version?
Do I need to reach Tier 2 to access this functionality?
Are there any workarounds to achieve the same level of detail as the web interface within my program?
I’d really appreciate hearing about your experiences and any advice you have. Thanks in advance!
Yes, you can use the larger gpt-4o for vision as well.
I actually haven’t used mini for vision, but I’ve used gpt-4o on very complex tasks like parsing complicated graphs and infographics, and it performed admirably.
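Here’s a minimal sketch of the kind of call I mean, assuming the official `openai` Python package (v1.x) and an `OPENAI_API_KEY` in the environment; the file name and prompt are just placeholders:

```python
# Minimal sketch: sending a local image to gpt-4o via the Chat Completions API.
import base64
from openai import OpenAI

client = OpenAI()

def analyze_page(image_path: str) -> str:
    # Encode the page image as a base64 data URL for the image_url content part.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # the full model, not gpt-4o-mini
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this script page in detail."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{b64}",
                            "detail": "high",  # request high-detail image processing
                        },
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

print(analyze_page("page_001.png"))
```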
I’ve managed to get my old program working to analyze an image using GPT-4o, but the analysis still feels significantly better and more detailed in the GPT-4o web interface than in the programmatic results.
Have any of you experienced something similar?
Could it be that the GPT-4o web interface uses a different model or configuration compared to the API?
Is there a specific setting or parameter I might be missing in the API to achieve the same level of detail?
Does anyone know if the web interface includes some extra post-processing or context that isn’t available in the API?
I’d really appreciate hearing about your thoughts or experiences with this. Thanks!
One thing to keep in mind is that 4o !== 4o. There are multiple minor versions, and some of them are better than others. In my experience, the newer the version, the generally worse it is - but I stopped evaluating after the August '24 version, I think.
That is very likely the case. The API probably doesn’t have access to the specific version that ChatGPT uses. ChatGPT also has a different injected prompt than the API. And depending on the task, OpenAI might be actively working against you (Consuming more tokens than expected for image - Vision - gpt-4o - #4 by _j)
OpenAI says it does, though. A model called “chatgpt-4o-latest”.
However, tools are disabled for it and its usage rate limit is about 1% of the regular API models’, so it really only gives you something for personal experimentation.
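If you want to try it anyway, it should just be a drop-in model-name swap in the same call; a minimal sketch (the prompt is a placeholder, and the limits above still apply):

```python
# Sketch only: "chatgpt-4o-latest" tracks the ChatGPT model, but tools are
# disabled and its rate limits are far lower than the regular API models.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```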
In ChatGPT, OpenAI can be fulfilling different users with different test models or “which is better” models to gather feedback, and you wouldn’t be given any indication.
The most important thing:
Input images with gpt-4o-mini cost twice as much as with gpt-4o. Unless you are generating a whole lot of output from the input, the cost will be higher and the quality lower with the mini model.
Tip: resize images yourself. See how they look at 900 px on the longest dimension, so there can be some overlap between the detail:high tiles.
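For the resize, something like this works (a sketch assuming Pillow is installed; the 900 px target is just the value suggested above, not an official recommendation):

```python
# Sketch: downscale an image so its longest side is at most 900 px.
from PIL import Image

def resize_longest_side(path: str, out_path: str, target: int = 900) -> None:
    img = Image.open(path)
    scale = target / max(img.size)
    if scale < 1:  # only downscale, never upscale
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save(out_path)

resize_longest_side("page_001.png", "page_001_900px.png")
```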
Tip: slice images up yourself for a coherent presentation at the 512 px maximum dimension used by detail:low.
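And for the slicing, a rough sketch along the same lines (again assuming Pillow; the tile size and file names are placeholders):

```python
# Sketch: cut an image into tiles no larger than 512 x 512 px, so each tile
# fits within the size used by detail:low.
from PIL import Image

def slice_into_tiles(path: str, tile: int = 512) -> list:
    img = Image.open(path)
    tiles = []
    for top in range(0, img.height, tile):
        for left in range(0, img.width, tile):
            box = (left, top, min(left + tile, img.width), min(top + tile, img.height))
            tiles.append(img.crop(box))
    return tiles

for i, t in enumerate(slice_into_tiles("page_001.png")):
    t.save(f"tile_{i:02d}.png")
```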