GPT API can not do image coordinates right

vojta.siler · October 22, 2025, 11:59am

Hey, so I am trying to work on a project where I extract a part of an image. I am using the GPT models (4.1, 5) to achieve this. I send my base64 encoded image through the API, and ask it to extract a certain part of the image (a photo) and send it to me as coordinates (x1,y1,x2,y2).

When I try this in the web chat environment, it always returns perfect coordinates that have exactly what I want.
When I try this through the API, the coordinates are always off. The sub image always contains some extra text, or just empty space on some sides, it doesn’t seem to be able to do it well.

Why is this? Why is there such a quality difference between the web chat environment, and the API? Is the API somehow modifying the base64 image, that it then just messes up the quality of the output?

georgezip · October 23, 2025, 2:08am

Try pasting your question into ChatGPT - it gave quite a good answer. Here’s the best one, I think:

Ask for normalized coordinates and rescale yourself

In your prompt, explicitly define the coordinate space and format:

“Return bounding box as normalized floats in [0,1]: {x_min, y_min, x_max, y_max} relative to the image width/height (origin top-left). No other text.”

vojta.siler · October 23, 2025, 7:36am

Sadly, this does not work very well either. Still clipping the images. I suspect the ChatGPT web service has some behind the scenes vision processing, that is simply not available through the API, which is a massive shame.

Mael · November 30, 2025, 9:53pm

Hi @vojta.siler

I’m actually facing the same issue. I ran several tests with different screenshots, and the web version is consistently much more accurate and reproducible for coordinates, while the API keeps giving incorrect or inconsistent results.

Have you by any chance found a workaround or anything that improved the accuracy on the API side?

Topic		Replies	Views
Getting GPT Vision To Return Coordinates Prompting gpt-4 , gpt-4-vision	10	10134	July 30, 2025
Gpt-4-vision-preview vs web chat results and configuration API	5	1492	October 31, 2024
Getting data from other peoples images on vision API Bugs gpt-4	1	108	August 17, 2024
GPT4 V Object detection bounding box value incorrect Prompting gpt-4 , gpt-4-vision	1	2710	June 29, 2024
Is GPT4-o dumber in Assistans API than in normal chat? API gpt-4o	4	908	August 21, 2025

GPT API can not do image coordinates right

Related topics