GPT-4o forgets image data and sometimes gives answers that have nothing to do with the image

Although my application sends the same chat history to the gpt-4o API, when I ask a new question about an image I previously sent via URL, the API sometimes says it cannot see the image, and other times gives a correct or an incorrect answer.
It can also sometimes give answers that have nothing to do with the image at all.

Welcome to the community!

Can you give us an example of how you’re sending off your first and second requests? (just the messages list) :thinking:

Now I am sharing a real chat example with you. Don’t be misled by the assistant using the 3.5 model in parts of this transcript; the same thing happens when the 4o model answers.

content: "How many jars are there in the image?"
createdAt: June 14, 2024 at 3:22:58 PM UTC+3
images: ["https…storage.googleapis.com/…0VUpaaki4zQsKA"]
role: "user"

content: "There are four jars in the image."
createdAt: June 14, 2024 at 3:23:08 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What fruits are in the jars in the image?"
createdAt: June 14, 2024 at 3:23:30 PM UTC+3
role: "user"

content: "The fruits in the jars are strawberries, blueberries, raspberries, and blackberries. :strawberry::blueberries::grapes::olive:"
createdAt: June 14, 2024 at 3:23:31 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"

content: "Wrong answer, can you double check the picture I sent?"
createdAt: June 14, 2024 at 3:24:34 PM UTC+3
role: "user"

content: "I apologize, but as an AI, I am unable to view or analyze images. If you could describe the fruits in the jars to me, I can help identify them for you."
createdAt: June 14, 2024 at 3:24:35 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"

If you have an image anywhere in your context that you want the model to use, you need a vision-capable model. gpt-3.5 doesn’t have vision.

The model doesn’t “remember” what it saw “earlier” in the conversation. The concept of time doesn’t really exist here; the model needs to look at everything again on every run.

just to confirm:

  • you’re using the openai assistants framework
  • you’re changing models mid conversation

Here’s why I’m asking:

Some people send one message after another against the completions endpoint and wonder why the model keeps “forgetting”/hallucinating the past conversation. With the completions endpoint, you need to send the whole conversation history to the API with every new message. If you sent images, you’ll need to send them again with every call.
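To make that concrete, here’s a minimal sketch of what the second call has to look like: the image part is repeated in the follow-up turn. `buildFollowUp` and the example URL are illustrative, not part of the OpenAI SDK.

```javascript
// Every chat.completions call is stateless, so the image must be re-sent
// alongside the follow-up question.
function buildFollowUp(imageUrl, firstQuestion, firstAnswer, followUpQuestion) {
  const imagePart = { type: 'image_url', image_url: { url: imageUrl } };
  return [
    // Turn 1: the original question, with the image attached.
    { role: 'user', content: [{ type: 'text', text: firstQuestion }, imagePart] },
    // The assistant's earlier answer, replayed as plain text.
    { role: 'assistant', content: firstAnswer },
    // Turn 2: the follow-up must carry the image again, or the model
    // has nothing to look at.
    { role: 'user', content: [{ type: 'text', text: followUpQuestion }, imagePart] },
  ];
}

const messages = buildFollowUp(
  'https://example.com/jars.png',
  'How many jars are there in the image?',
  'There are four jars in the image.',
  'What fruits are in the jars?'
);
// Pass `messages` to openai.chat.completions.create({ model: 'gpt-4o', messages })
```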

If you’re using the Assistants API, it should manage that for you.

But a non-vision model absolutely cannot interpret images for you. I’m surprised (but not that surprised) that you don’t get an error.

Can you give an example where you call it correctly and it still doesn’t work?

Let me show you a new example, fully on the gpt-4o model:

content: "Hello, how many objects are in the image?"
createdAt: June 14, 2024 at 4:18:24 PM UTC+3
images: ["https…storage.googleapis.com…G%2FA%3D%3D"]
role: "user"

content: "The image contains four objects."
createdAt: June 14, 2024 at 4:18:26 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "It looks like there might have been a misunderstanding – I cannot view images. However, if you describe the red object to me, I can try to help identify it! :blush::rocket:"
createdAt: June 14, 2024 at 4:19:14 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"

I am not using the Assistants API that OpenAI provides. My server-side code looks like this; let me show some parts of it:

const chatHandler = (): { [k: string]: RequestDefault } => ({
  async interpret(req, res): ResponsePromise {
    try {
      const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body
....

        let _messageHistory = handleHistoryLimit(messageHistory);
        _messageHistory = handleMaxAllowedChars(_messageHistory);

        messages = getMessages(_messageHistory, config);
....

        const completion = await openai.chat.completions.create({
          messages: messages,
          model: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
        });

        const answer = completion.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
          role: 'assistant',
        };
      } else if (userMessage && imageUrls) {
        messages = [
          {
            role: 'user',
            content: [
              { type: 'text', text: userMessage },
              ...imageUrls.map((imageUrl) => ({
                type: 'image_url',
                image_url: { url: imageUrl },
              })),
            ],
          },
        ];

        const response = await openai.chat.completions.create({
          model: 'gpt-4o',
          messages: messages,
        });

        const answer = response.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: response.model,
          role: 'assistant',
        };

Can you JSON-dump this messages object, and turn off the model-switch mechanic?
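On the model switch: `model: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125'` means a non-vision model can end up answering a turn whose context contains an image. A minimal guard might look like this; `pickModel` is an illustrative helper, not part of the SDK, and the tier fallback is an assumption.

```javascript
// Pick a vision-capable model whenever the request carries images,
// regardless of membership tier.
function pickModel(isMember, imageUrls) {
  const hasImages = Array.isArray(imageUrls) && imageUrls.length > 0;
  if (hasImages) return 'gpt-4o'; // only vision-capable models can read images
  return isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125';
}
```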

I tried it as you suggested, but the result is still the same :confused:

content: "How many objects are in the image?"
createdAt: June 14, 2024 at 4:47:29 PM UTC+3
images: "https…storage.googleapis.com/…2B2CSMosQ%3D%3D"
role: "user"

content: "There are four objects in the image."
createdAt: June 14, 2024 at 4:47:39 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What kind of object is the red object in the image I sent?"
createdAt: June 14, 2024 at 4:48:22 PM UTC+3
role: "user"

content: "I’m sorry, but I can’t view images. However, I can help you with descriptions or answer questions!"
createdAt: June 14, 2024 at 4:48:24 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"

import json
print(json.dumps(messages))

I’d like to investigate what you’re actually sending.

const chatHandler = (): { [k: string]: RequestDefault } => ({
  async interpret(req, res): ResponsePromise {
    try {
      const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body;
      const newMessage = messageHistory ? messageHistory[messageHistory.length - 1] : null;

      if (!userId) {
        return res.status(422).json(toRes(null, 'Missing userId.'));
      }
      if (!sessionId) {
        return res.status(422).json(toRes(null, 'Missing sessionId.'));
      }

      const openai = new OpenAI({
        organization: process.env.OPENAI_ORG_ID,
        apiKey: process.env.OPENAI_API_KEY,
      });

      let messages;
      let answerContent;

      if (messageHistory && config) {
        if (!newMessage || !newMessage.content) {
          return res.status(422).json(toRes(null, 'Missing information inside data object.'));
        }

        let _messageHistory = handleHistoryLimit(messageHistory);
        _messageHistory = handleMaxAllowedChars(_messageHistory);

        if (_messageHistory.length <= 0) {
          return res.status(413).json(toRes(null, 'Object is too big.'));
        }

        messages = getMessages(_messageHistory, config);

        // JSON dump the messages object
        console.log('Messages-1:', JSON.stringify(messages, null, 2));

        const completion = await openai.chat.completions.create({
          messages: messages,
          model: 'gpt-4o',
        });

        const answer = completion.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: 'gpt-4o',
          role: 'assistant',
        };
      } else if (userMessage && imageUrls) {
        messages = [
          {
            role: 'user',
            content: [
              { type: 'text', text: userMessage },
              ...imageUrls.map((imageUrl) => ({
                type: 'image_url',
                image_url: { url: imageUrl },
              })),
            ],
          },
        ];

        // JSON dump the messages object
        console.log('Messages-2:', JSON.stringify(messages, null, 2));

        const response = await openai.chat.completions.create({
          model: 'gpt-4o',
          messages: messages,
        });

        const answer = response.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: response.model,
          role: 'assistant',
        };

        printLog(response.choices[0].message.content, 'yellow');
      } else {
        return res.status(422).json(toRes(null, 'Missing information inside data object.'));
      }

      await admin
        .firestore()
        .collection('users')
        .doc(userId)
        .collection('chats')
        .doc(sessionId)
        .collection('messages')
        .add(answerContent);

      return res.status(201).json(toRes({ answer: answerContent.content }, 'Request interpreted successfully!', false));
    } catch (error) {
      printLog(error, 'red');
      return res.status(500).json(toRes(error, `Internal Server Error: ${String(error.message)}`));
    }
  },
});

Messages-2: [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "How many objects are in the image and what are their types?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https…storage.googleapis.com/…606lA%3D%3D"
        }
      }
    ]
  }
]

printLog(response.choices[0].message.content, 'yellow');
====>
The image contains four objects:

  1. A pink rectangular prism.
  2. A red cylinder.
  3. A beige cube.
  4. A dark brown rectangular prism.

Messages-1: [
  {
    "role": "system",
    "content": "Your name is ***… "
  },
  {
    "role": "assistant",
    "content": "Hi,There!"
  },
  {
    "role": "user",
    "content": "How many objects are in the image and what are their types?"
  },
  {
    "role": "assistant",
    "content": "The image contains four objects:\n1. A pink rectangular prism.\n2. A red cylinder.\n3. A beige cube.\n4. A dark brown rectangular prism."
  },
  {
    "role": "user",
    "content": "Which is the longest object in the image I sent?"
  }
]
And the assistant’s last response: “Sorry, I can’t view images. Describe the objects, please?”

I think the problem is that the image is not saved back into the history.
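That matches the Messages-1 dump above: the user turn that originally carried the image is replayed as plain text, with no `image_url` part. One way to fix it is to store the image URLs alongside each message (as the Firestore documents already do) and rebuild multimodal content parts when constructing `messages`. This is a sketch; `toApiMessage` is a hypothetical replacement for whatever flattening `getMessages` does today.

```javascript
// Rebuild an API message from a stored history entry, reattaching any images
// so the model can "see" them again on every call.
function toApiMessage(entry) {
  // entry: { role, content, images? } as stored in Firestore
  if (entry.role === 'user' && Array.isArray(entry.images) && entry.images.length > 0) {
    return {
      role: 'user',
      content: [
        { type: 'text', text: entry.content },
        ...entry.images.map((url) => ({ type: 'image_url', image_url: { url } })),
      ],
    };
  }
  return { role: entry.role, content: entry.content };
}

const history = [
  { role: 'user', content: 'How many objects are in the image?', images: ['https://example.com/objects.png'] },
  { role: 'assistant', content: 'The image contains four objects.' },
  { role: 'user', content: 'Which is the longest object in the image I sent?' },
];
const messages = history.map(toApiMessage);
// messages[0] now carries both the text part and the image_url part again.
```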

4o is trash compared to 4, to be frank: a total downgrade.
I keep all my chats, and the quality downgrade is immediately apparent.

It got so bad that I configured my GPTs to use 4 exclusively.

Yeah, you need to send it again. You can’t just strip the image out on your subsequent call…

It’s obvious that it can’t see something you’re not sending it :confused:
