Information Vision Assistant API


I’m building an Assistant that allows users to chat with images. Sometimes, these images contain text with names and addresses. I have noticed that ChatGPT-4o often replaces street names and personal names in its responses with fabricated alternatives. For example:

Original name: Johnson Wayne
ChatGPT-4o name: Winston Kirk

The same issue occurs with street names. I tested this by sending the same text directly to ChatGPT-4o in the OpenAI chat console, and the problem didn’t occur there. I understand that the API might behave differently. I also lowered the temperature to 0.6, but it had no effect.

Additionally, I checked the image quality, which appears to be good. It seems that this replacement is done deliberately for privacy reasons. Previously, I used OCR to extract text and provided the extracted text as input, and this problem did not occur.

Is there any better practice approach?

Best regards,

How many images are you sending per API call? Are you using the assistants API or chat completion?

I’d try to lower the temperature to zero since this is quite a deterministic task.

I may also try chain of thought prompting to get the model to first list out the text it sees in the image, before returning the structured output. This can be the first key of your json if you’re using JSON model. This will, of course, incur more cost.

1 Like

Hi there @cyzgab, thanks for replying.

For now (testing phase) I’m just sending 1 image per call. I’m using the assistant API for creating messages and adding it to the assistant thread. I’m adding the image_url to the message content, couldn’t find any other way to send images to ChatGPT-4o.

I also tried it with temperature 0 and with different letters, still getting the same result.

Implementing chain of thought is indeed a good idea, I’m not working with JSON mode yet but maybe a good idea to start with it.

Thanks for the advice👍🏽