I’m currently working on a Python program that uses OpenAI’s vision capability to analyze individual pages of a script. However, I’ve noticed that the results from GPT-4o Mini, which I’m using in the program, lack the detail and sharpness of the results I get when manually uploading screenshots to the GPT-4o web interface.
I’d like to achieve the same quality as GPT-4o in my program, but I’ve run into the issue of token limits in OpenAI’s API, which seem to get exceeded during image analysis.
Is there a way to use GPT-4o with Vision in the API, or is it only available as the Mini version?
Do I need to reach Tier 2 to access this functionality?
Are there any workarounds to achieve the same level of detail as the web interface within my program?
I’d really appreciate hearing about your experiences and any advice you have. Thanks in advance!
Yes, you can use the larger gpt-4o for vision as well.
I actually haven’t used mini for vision, but I’ve used gpt-4o on very complex tasks like parsing complicated graphs and infographics, and it performed admirably.
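Here’s a minimal sketch of the kind of call I mean, assuming the official `openai` Python package (v1.x) and an `OPENAI_API_KEY` in the environment; the file name and prompt are just placeholders:

```python
# Minimal sketch: sending a local image to gpt-4o via the Chat Completions API.
import base64
from openai import OpenAI

client = OpenAI()

def analyze_page(image_path: str) -> str:
    # Encode the page image as a base64 data URL for the image_url content part.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # the full model, not gpt-4o-mini
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this script page in detail."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{b64}",
                            "detail": "high",  # request high-detail image processing
                        },
                    },
                ],
            }
        ],
    )
    return response.choices[0].message.content

print(analyze_page("page_001.png"))
```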
I’ve managed to get my old program working to analyze an image using GPT-4o, but the analysis still feels significantly better and more detailed in the GPT-4o web interface than in the programmatic results.
Have any of you experienced something similar?
Could it be that the GPT-4o web interface uses a different model or configuration compared to the API?
Is there a specific setting or parameter I might be missing in the API to achieve the same level of detail?
Does anyone know if the web interface includes some extra post-processing or context that isn’t available in the API?
I’d really appreciate hearing about your thoughts or experiences with this. Thanks!
One thing to keep in mind is that 4o !== 4o. There are multiple minor versions, and some of them are better than others. In my experience, the newer the version, the generally worse it is - but I stopped evaluating after the August '24 version, I think.
That is very likely the case. The API probably doesn’t have access to the specific version that ChatGPT uses. ChatGPT also has a different injected prompt than the API. And depending on the task, OpenAI might be actively working against you (Consuming more tokens than expected for image - Vision - gpt-4o - #4 by _j)
OpenAI says it does, though. A model called “chatgpt-4o-latest”.
However, tools are disabled for it and its usage rate limit is about 1% of the regular API models’, so it really only gives you something for personal experimentation.
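If you want to try it anyway, it should just be a drop-in model-name swap in the same call; a minimal sketch (the prompt is a placeholder, and the limits above still apply):

```python
# Sketch only: "chatgpt-4o-latest" tracks the ChatGPT model, but tools are
# disabled and its rate limits are far lower than the regular API models.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```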
In ChatGPT, OpenAI can be fulfilling different users with different test models or “which is better” models to gather feedback, and you wouldn’t be given any indication.
The most important thing:
Input images with gpt-4o-mini cost twice as much as with gpt-4o. Unless you are generating a whole lot of output from the input, the cost will be higher and the quality lower with the mini model.
Tip: resize images yourself. See how they look at 900 px on the longest dimension, so there can be some overlap between the detail:high tiles.
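For the resize, something like this works (a sketch assuming Pillow is installed; the 900 px target is just the value suggested above, not an official recommendation):

```python
# Sketch: downscale an image so its longest side is at most 900 px.
from PIL import Image

def resize_longest_side(path: str, out_path: str, target: int = 900) -> None:
    img = Image.open(path)
    scale = target / max(img.size)
    if scale < 1:  # only downscale, never upscale
        new_size = (round(img.width * scale), round(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save(out_path)

resize_longest_side("page_001.png", "page_001_900px.png")
```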
Tip: slice images up yourself for a coherent presentation at the 512 px maximum dimension used by detail:low.
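And for the slicing, a rough sketch along the same lines (again assuming Pillow; the tile size and file names are placeholders):

```python
# Sketch: cut an image into tiles no larger than 512 x 512 px, so each tile
# fits within the size used by detail:low.
from PIL import Image

def slice_into_tiles(path: str, tile: int = 512) -> list:
    img = Image.open(path)
    tiles = []
    for top in range(0, img.height, tile):
        for left in range(0, img.width, tile):
            box = (left, top, min(left + tile, img.width), min(top + tile, img.height))
            tiles.append(img.crop(box))
    return tiles

for i, t in enumerate(slice_into_tiles("page_001.png")):
    t.save(f"tile_{i:02d}.png")
```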