Information Vision Assistant API

destangokalp · May 22, 2024, 7:42am

Hi,

I’m building an Assistant that allows users to chat with images. Sometimes, these images contain text with names and addresses. I have noticed that ChatGPT-4o often replaces street names and personal names in its responses with fabricated alternatives. For example:

Original name: Johnson Wayne
ChatGPT-4o name: Winston Kirk

The same issue occurs with street names. I tested this by sending the same text directly to ChatGPT-4o in the OpenAI chat console, and the problem didn’t occur there. I understand that the API might behave differently. I also lowered the temperature to 0.6, but it had no effect.

Additionally, I checked the image quality, which appears to be good. It seems that this replacement is done deliberately for privacy reasons. Previously, I used OCR to extract text and provided the extracted text as input, and this problem did not occur.

Is there any better practice approach?

Best regards,

cyzgab · May 22, 2024, 10:17am

How many images are you sending per API call? Are you using the assistants API or chat completion?

I’d try to lower the temperature to zero since this is quite a deterministic task.

I may also try chain of thought prompting to get the model to first list out the text it sees in the image, before returning the structured output. This can be the first key of your json if you’re using JSON model. This will, of course, incur more cost.

destangokalp · May 22, 2024, 11:03am

Hi there @cyzgab, thanks for replying.

For now (testing phase) I’m just sending 1 image per call. I’m using the assistant API for creating messages and adding it to the assistant thread. I’m adding the image_url to the message content, couldn’t find any other way to send images to ChatGPT-4o.

I also tried it with temperature 0 and with different letters, still getting the same result.

Implementing chain of thought is indeed a good idea, I’m not working with JSON mode yet but maybe a good idea to start with it.

Thanks for the advice👍🏽

Topic		Replies	Views
Assistant struggling reading image, whilst chat completion is right 99% of the time API gpt-4 , chat-completion , assistants-api	2	269	May 30, 2024
GPT-4 omni text recognition via API works worse than on chatgpt.com API gpt-4 , api	4	1137	August 13, 2024
Using GPT-4o via assistants API vs ChatGPT Issue API	1	435	July 1, 2024
How to extract text from images using API? API gpt-4	2	354	January 31, 2025
Integrating Custom Image Generation with ChatGPT API	1	1753	February 28, 2024

Information Vision Assistant API

Related topics