Hey, I'm working on a project where I extract part of an image. I'm using the GPT models (4.1, 5) for this: I send my base64-encoded image through the API and ask the model to locate a certain part of the image (a photo) and return it to me as coordinates (x1, y1, x2, y2).
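To make the setup concrete, here is a minimal sketch of the two local steps around the API call: encoding the image and reading the coordinates back. It is an illustration, not my exact code. The helper names `to_data_url` and `parse_box` are hypothetical, and it assumes the model replies with four plain comma-separated integers such as `(120, 45, 610, 480)`. The API request itself is left out.

```python
import base64
import re


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# Assumption: the reply contains four comma-separated integers, e.g. "(120, 45, 610, 480)".
BOX_PATTERN = re.compile(r"(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)")


def parse_box(reply: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Extract (x1, y1, x2, y2) from the reply and clamp it to the image bounds."""
    match = BOX_PATTERN.search(reply)
    if match is None:
        raise ValueError(f"no x1,y1,x2,y2 coordinates found in: {reply!r}")
    x1, y1, x2, y2 = (int(g) for g in match.groups())
    # Order the corners so that (x1, y1) is top-left and (x2, y2) is bottom-right.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))

    def clamp(value: int, upper: int) -> int:
        return max(0, min(value, upper))

    return clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height)
```

The resulting tuple is what I then use to crop the sub-image, so any offset in the returned numbers shows up directly as extra or missing area in the crop.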
When I try this in the web chat environment, it always returns coordinates that frame exactly the region I want.
When I try this through the API, the coordinates are always off. The cropped sub-image always contains some extra text or empty space on some sides; the model doesn’t seem able to do this well.
Why is this? Why is there such a quality difference between the web chat environment and the API? Is the API somehow modifying the base64 image in a way that degrades the quality of the output?
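One way to test the "image is being modified" idea: if the model were working on a resized copy and reporting coordinates in that copy's pixel space, the box would need to be scaled back to the original size. This is purely an assumption, not something I have confirmed. In the sketch below, `rescale_box` is a hypothetical helper and `seen_size` is a guess at the dimensions the model actually sees.

```python
def rescale_box(box, seen_size, original_size):
    """Map a box from an assumed model-side image size back to the original size.

    box:           (x1, y1, x2, y2) as returned by the model
    seen_size:     (width, height) the model is ASSUMED to have seen (a guess)
    original_size: (width, height) of the image that was actually sent
    """
    scale_x = original_size[0] / seen_size[0]
    scale_y = original_size[1] / seen_size[1]
    x1, y1, x2, y2 = box
    return (
        round(x1 * scale_x),
        round(y1 * scale_y),
        round(x2 * scale_x),
        round(y2 * scale_y),
    )
```

If rescaling with some plausible `seen_size` makes the crops line up, that would point to a resize step; if not, the offset probably comes from somewhere else.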