GPT-4o forgets image data and sometimes gives answers that have nothing to do with the image

Although my application sends the same chat history to the gpt-4o API, when I ask a new question about an image I previously sent via URL, the API sometimes says it cannot see the image, and other times gives a correct or an incorrect answer.
It can also sometimes give answers that have nothing to do with the image at all.

Welcome to the community!

Can you give us an example of how you’re sending off your first and second requests? (just the messages list) :thinking:

Now I am sharing a real chat example with you. Don’t be misled by the assistant using the 3.5 model in parts of this transcript; the same thing happens when the 4o model answers.

content: "How many jars are there in the image?"
createdAt: June 14, 2024 at 3:22:58 PM UTC+3
images: ["https…storage.googleapis.com/…0VUpaaki4zQsKA"]
role: "user"

content: "There are four jars in the image."
createdAt: June 14, 2024 at 3:23:08 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What fruits are in the jars in the image?"
createdAt: June 14, 2024 at 3:23:30 PM UTC+3
role: "user"

content: "The fruits in the jars are strawberries, blueberries, raspberries, and blackberries. :strawberry::blueberries::grapes::olive:"
createdAt: June 14, 2024 at 3:23:31 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"

content: "Wrong answer, can you double check the picture I sent?"
createdAt: June 14, 2024 at 3:24:34 PM UTC+3
role: "user"

content: "I apologize, but as an AI, I am unable to view or analyze images. If you could describe the fruits in the jars to me, I can help identify them for you."
createdAt: June 14, 2024 at 3:24:35 PM UTC+3
gptVersion: "gpt-3.5-turbo-0125"
role: "assistant"

If you have an image anywhere in your context that you want the model to use, you need a vision-capable model. gpt-3.5 doesn’t have vision.

The model doesn’t “remember” what it saw “earlier” in the conversation. The concept of time doesn’t really exist here; the model needs to look at everything again on every run.

just to confirm:

  • you’re using the openai assistants framework
  • you’re changing models mid conversation

Here’s why I’m asking:

Some people send one message after another against the completions endpoint and wonder why the model keeps “forgetting”/hallucinating the past conversation. With the completions endpoint, you need to send the whole conversation history to the API with every new message. If you sent images, you’ll need to send them again with every call.
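To make that concrete, here’s a minimal sketch of what the second call has to look like: the image part is repeated in the follow-up turn. `buildFollowUp` and the example URL are illustrative, not part of the OpenAI SDK.

```javascript
// Every chat.completions call is stateless, so the image must be re-sent
// alongside the follow-up question.
function buildFollowUp(imageUrl, firstQuestion, firstAnswer, followUpQuestion) {
  const imagePart = { type: 'image_url', image_url: { url: imageUrl } };
  return [
    // Turn 1: the original question, with the image attached.
    { role: 'user', content: [{ type: 'text', text: firstQuestion }, imagePart] },
    // The assistant's earlier answer, replayed as plain text.
    { role: 'assistant', content: firstAnswer },
    // Turn 2: the follow-up must carry the image again, or the model
    // has nothing to look at.
    { role: 'user', content: [{ type: 'text', text: followUpQuestion }, imagePart] },
  ];
}

const messages = buildFollowUp(
  'https://example.com/jars.png',
  'How many jars are there in the image?',
  'There are four jars in the image.',
  'What fruits are in the jars?'
);
// Pass `messages` to openai.chat.completions.create({ model: 'gpt-4o', messages })
```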

If you’re using the Assistants API, it should manage that for you.

But a non-vision model absolutely cannot interpret images for you. I’m surprised (but not that surprised) that you don’t get an error.

Can you give an example where you call it correctly and it still doesn’t work?

Let me show you a new example, fully on the gpt-4o model:

content: "Hello, how many objects are in the image?"
createdAt: June 14, 2024 at 4:18:24 PM UTC+3
images: ["https…storage.googleapis.com…G%2FA%3D%3D"]
role: "user"

content: "The image contains four objects."
createdAt: June 14, 2024 at 4:18:26 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "It looks like there might have been a misunderstanding – I cannot view images. However, if you describe the red object to me, I can try to help identify it! :blush::rocket:"
createdAt: June 14, 2024 at 4:19:14 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"

I am not using the Assistants API that OpenAI provides. My server-side code looks like this; let me show some parts of it:

const chatHandler = (): { [k: string]: RequestDefault } => ({
  async interpret(req, res): ResponsePromise {
    try {
      const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body
....

        let _messageHistory = handleHistoryLimit(messageHistory);
        _messageHistory = handleMaxAllowedChars(_messageHistory);

        messages = getMessages(_messageHistory, config);
....

        const completion = await openai.chat.completions.create({
          messages: messages,
          model: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
        });

        const answer = completion.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125',
          role: 'assistant',
        };
      } else if (userMessage && imageUrls) {
        messages = [
          {
            role: 'user',
            content: [
              { type: 'text', text: userMessage },
              ...imageUrls.map((imageUrl) => ({
                type: 'image_url',
                image_url: { url: imageUrl },
              })),
            ],
          },
        ];

        const response = await openai.chat.completions.create({
          model: 'gpt-4o',
          messages: messages,
        });

        const answer = response.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: response.model,
          role: 'assistant',
        };

Can you JSON-dump this messages object, and turn off the model-switch mechanic?
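On the model switch: `model: isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125'` means a non-vision model can end up answering a turn whose context contains an image. A minimal guard might look like this; `pickModel` is an illustrative helper, not part of the SDK, and the tier fallback is an assumption.

```javascript
// Pick a vision-capable model whenever the request carries images,
// regardless of membership tier.
function pickModel(isMember, imageUrls) {
  const hasImages = Array.isArray(imageUrls) && imageUrls.length > 0;
  if (hasImages) return 'gpt-4o'; // only vision-capable models can read images
  return isMember ? 'gpt-4o' : 'gpt-3.5-turbo-0125';
}
```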

I tried it as you suggested, but the result is still the same :confused:

content: "How many objects are in the image?"
createdAt: June 14, 2024 at 4:47:29 PM UTC+3
images: "https…storage.googleapis.com/…2B2CSMosQ%3D%3D"
role: "user"

content: "There are four objects in the image."
createdAt: June 14, 2024 at 4:47:39 PM UTC+3
gptVersion: "gpt-4o-2024-05-13"
role: "assistant"

content: "What kind of object is the red object in the image I sent?"
createdAt: June 14, 2024 at 4:48:22 PM UTC+3
role: "user"

content: "I’m sorry, but I can’t view images. However, I can help you with descriptions or answer questions!"
createdAt: June 14, 2024 at 4:48:24 PM UTC+3
gptVersion: "gpt-4o"
role: "assistant"

import json
print(json.dumps(messages))

I’d like to investigate what you’re actually sending.

const chatHandler = (): { [k: string]: RequestDefault } => ({
  async interpret(req, res): ResponsePromise {
    try {
      const { messageHistory, config, userId, sessionId, userMessage, imageUrls } = req.body;
      const newMessage = messageHistory ? messageHistory[messageHistory.length - 1] : null;

      if (!userId) {
        return res.status(422).json(toRes(null, 'Missing userId.'));
      }
      if (!sessionId) {
        return res.status(422).json(toRes(null, 'Missing sessionId.'));
      }

      const openai = new OpenAI({
        organization: process.env.OPENAI_ORG_ID,
        apiKey: process.env.OPENAI_API_KEY,
      });

      let messages;
      let answerContent;

      if (messageHistory && config) {
        if (!newMessage || !newMessage.content) {
          return res.status(422).json(toRes(null, 'Missing information inside data object.'));
        }

        let _messageHistory = handleHistoryLimit(messageHistory);
        _messageHistory = handleMaxAllowedChars(_messageHistory);

        if (_messageHistory.length <= 0) {
          return res.status(413).json(toRes(null, 'Object is too big.'));
        }

        messages = getMessages(_messageHistory, config);

        // JSON dump the messages object
        console.log('Messages-1:', JSON.stringify(messages, null, 2));

        const completion = await openai.chat.completions.create({
          messages: messages,
          model: 'gpt-4o',
        });

        const answer = completion.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: 'gpt-4o',
          role: 'assistant',
        };
      } else if (userMessage && imageUrls) {
        messages = [
          {
            role: 'user',
            content: [
              { type: 'text', text: userMessage },
              ...imageUrls.map((imageUrl) => ({
                type: 'image_url',
                image_url: { url: imageUrl },
              })),
            ],
          },
        ];

        // JSON dump the messages object
        console.log('Messages-2:', JSON.stringify(messages, null, 2));

        const response = await openai.chat.completions.create({
          model: 'gpt-4o',
          messages: messages,
        });

        const answer = response.choices[0].message.content.trim();

        answerContent = {
          content: answer,
          createdAt: admin.firestore.FieldValue.serverTimestamp(),
          gptVersion: response.model,
          role: 'assistant',
        };

        printLog(response.choices[0].message.content, 'yellow');
      } else {
        return res.status(422).json(toRes(null, 'Missing information inside data object.'));
      }

      await admin
        .firestore()
        .collection('users')
        .doc(userId)
        .collection('chats')
        .doc(sessionId)
        .collection('messages')
        .add(answerContent);

      return res.status(201).json(toRes({ answer: answerContent.content }, 'Request interpreted successfully!', false));
    } catch (error) {
      printLog(error, 'red');
      return res.status(500).json(toRes(error, `Internal Server Error: ${String(error.message)}`));
    }
  },
});

Messages-2: [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "How many objects are in the image and what are their types?"
      },
      {
        "type": "image_url",
        "image_url": {
          "url": "https…storage.googleapis.com/…606lA%3D%3D"
        }
      }
    ]
  }
]

printLog(response.choices[0].message.content, 'yellow');
====>
The image contains four objects:

  1. A pink rectangular prism.
  2. A red cylinder.
  3. A beige cube.
  4. A dark brown rectangular prism.

Messages-1: [
  {
    "role": "system",
    "content": "Your name is ***… "
  },
  {
    "role": "assistant",
    "content": "Hi,There!"
  },
  {
    "role": "user",
    "content": "How many objects are in the image and what are their types?"
  },
  {
    "role": "assistant",
    "content": "The image contains four objects:\n1. A pink rectangular prism.\n2. A red cylinder.\n3. A beige cube.\n4. A dark brown rectangular prism."
  },
  {
    "role": "user",
    "content": "Which is the longest object in the image I sent?"
  }
]
And the assistant’s last response: “Sorry, I can’t view images. Describe the objects, please?”

I think the problem is that the image is not saved back into the history.
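That matches the Messages-1 dump above: the user turn that originally carried the image is replayed as plain text, with no `image_url` part. One way to fix it is to store the image URLs alongside each message (as the Firestore documents already do) and rebuild multimodal content parts when constructing `messages`. This is a sketch; `toApiMessage` is a hypothetical replacement for whatever flattening `getMessages` does today.

```javascript
// Rebuild an API message from a stored history entry, reattaching any images
// so the model can "see" them again on every call.
function toApiMessage(entry) {
  // entry: { role, content, images? } as stored in Firestore
  if (entry.role === 'user' && Array.isArray(entry.images) && entry.images.length > 0) {
    return {
      role: 'user',
      content: [
        { type: 'text', text: entry.content },
        ...entry.images.map((url) => ({ type: 'image_url', image_url: { url } })),
      ],
    };
  }
  return { role: entry.role, content: entry.content };
}

const history = [
  { role: 'user', content: 'How many objects are in the image?', images: ['https://example.com/objects.png'] },
  { role: 'assistant', content: 'The image contains four objects.' },
  { role: 'user', content: 'Which is the longest object in the image I sent?' },
];
const messages = history.map(toApiMessage);
// messages[0] now carries both the text part and the image_url part again.
```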

4o is trash compared to 4, to be frank: a total downgrade.
I keep all my chats, and the quality downgrade is immediately apparent.

It got so bad that I configured my GPTs to use 4 exclusively.

Yeah, you need to send it again. You can’t just strip the image out on your subsequent call…

It’s obvious that it can’t see something you’re not sending it :confused:
