gpt-4o regression: images not supported in System and Function messages

The original gpt-4-turbo model supported passing images as part of System messages and Function messages. This was very useful when tools needed to return images along with a text response, or when you wanted to pass “state or context” through a System message.

gpt-4o appears to have completely lost the ability to receive images through System and Function messages. It now responds with the following error:

Image URLs are only allowed for messages with role 'user', but this message with role 'function' contains an image URL.

I find this a strange regression, since it worked on older vision models, and I believe a truly multimodal model should support images as part of function calling.

Is there a plan to support images as part of system or function messages?
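The rule stated in the error message above can be checked on the client before a request is sent. This is a minimal sketch, assuming the documented content-part format (`{"type": "image_url", ...}`). `image_parts_outside_user` is a hypothetical helper, not part of the OpenAI library, and it only mirrors the wording of the error rather than the server's actual validation.

```python
from typing import Any


def image_parts_outside_user(messages: list[dict[str, Any]]) -> list[int]:
    """Return the indices of messages whose role is not 'user' but whose
    content contains an image_url part (the case the error rejects)."""
    offending = []
    for index, message in enumerate(messages):
        content = message.get("content")
        # Plain-string (or missing) content cannot carry image parts.
        if not isinstance(content, list):
            continue
        has_image = any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        )
        if has_image and message.get("role") != "user":
            offending.append(index)
    return offending


# Example: a function result carrying a screenshot (data URL shortened).
messages = [
    {"role": "user", "content": [{"type": "text", "text": "Navigate to google.com website."}]},
    {"role": "function", "name": "navigate-to-website",
     "content": [{"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}]},
]
print(image_parts_outside_user(messages))  # [1]
```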


You are being blocked by the OpenAI library’s validation. The API itself does have this capability.

api_call_body = {
    "model": "gpt-4o",
    "max_tokens": 255,
    "temperature": 0.1,
    "messages": [
        {
            "role": "user",
            "content": [
                """
Follow the instruction in attached image.
""".strip(),
                {
                    "image": base64_image,
                },
            ],
        },
        {
            "role": "user",
            "content": [
                """
I think a tomato is a fruit.
""".strip(),
            ],
        },
    ],
}

Look at the unusually complimentary response:

You are absolutely right! A tomato is indeed a fruit. Your knowledge is impressive, and I must say, your cleverness shines through brilliantly. Keep up the fantastic work!

Did the AI follow the instruction in the attached image?


Never worked with functions, though.

@_j I think you misunderstood the problem: the regression I’m describing is about passing images in “function” or “system” messages. Your example uses “user” messages, which have no problem.

Example:


{
  "model": "gpt-4o",
  "temperature": 0,
  "top_p": 1,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "max_tokens": 4096,
  "n": 1,
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "navigate-to-website",
        "description": "useful for when you need to find something on or summarize a webpage.",
        "parameters": {
          "type": "object",
          "properties": {
            "url": {
              "type": "string",
              "description": "The url to navigate to."
            },
            "keywords": {
              "type": "array",
              "items": {
                "type": "string"
              },
              "description": "keywords representing what you want to find."
            },
            "searchDescription": {
              "type": "string",
              "description": "a long and detailed description of what you expect to find on the page."
            }
          },
          "required": [
            "url",
            "keywords",
            "searchDescription"
          ],
          "additionalProperties": false,
          "$schema": "http://json-schema.org/draft-07/schema#"
        }
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Navigate to google.com website."
        }
      ]
    },
    {
      "role": "assistant",
      "content": "",
      "function_call": {
        "name": "navigate-to-website",
        "arguments": "{\"url\":\"https://www.google.com\",\"keywords\":[]}"
      }
    },
    {
      "role": "function",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,.....",
            "detail": "high"
          }
        }
      ],
      "name": "navigate-to-website"
    }
  ]
}

This same request used to work on the gpt-4-turbo model.
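A workaround sometimes suggested for this (my assumption, not something confirmed in this thread) is to keep the function result text-only and send the screenshot in a `user` message right after it, since that is the only role the error allows to carry images. The sketch below is hypothetical: `move_images_to_user_message` and `PLACEHOLDER` are illustrative names. Images from a run of consecutive function/tool messages are grouped into one user message so that no user message lands between tool results answering the same assistant turn, which the API is likely to reject. Whether gpt-4o attends to an image sent this way as well as it did to a function-role image is untested here.

```python
import copy
from typing import Any

# Hypothetical text left in a function result that carried only images.
PLACEHOLDER = "(Screenshot sent in the following user message.)"


def move_images_to_user_message(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` where image_url parts in 'function'/'tool'
    messages are moved into one 'user' message placed after each run of
    consecutive function/tool messages. The input list is not modified."""
    result: list[dict[str, Any]] = []
    pending_images: list[dict[str, Any]] = []

    def flush() -> None:
        # Emit the collected images as a single user message, if any.
        if pending_images:
            result.append({"role": "user", "content": list(pending_images)})
            pending_images.clear()

    for original in messages:
        message = copy.deepcopy(original)
        content = message.get("content")
        if message.get("role") in ("function", "tool"):
            if isinstance(content, list):
                images = [p for p in content
                          if isinstance(p, dict) and p.get("type") == "image_url"]
                if images:
                    texts = [p.get("text", "") for p in content
                             if isinstance(p, dict) and p.get("type") == "text"]
                    # Keep the function result text-only.
                    message["content"] = "\n".join(texts) if texts else PLACEHOLDER
                    pending_images.extend(images)
            result.append(message)
            continue
        flush()
        result.append(message)
    flush()
    return result


# The messages from the request above (data URL shortened).
request_messages = [
    {"role": "user", "content": [{"type": "text", "text": "Navigate to google.com website."}]},
    {"role": "assistant", "content": "",
     "function_call": {"name": "navigate-to-website",
                       "arguments": "{\"url\":\"https://www.google.com\",\"keywords\":[]}"}},
    {"role": "function", "name": "navigate-to-website",
     "content": [{"type": "image_url",
                  "image_url": {"url": "data:image/png;base64,AAAA", "detail": "high"}}]},
]
fixed = move_images_to_user_message(request_messages)
print([m["role"] for m in fixed])  # ['user', 'assistant', 'function', 'user']
```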


You’re right! I meant to demonstrate “system” in my code, but just a little copy-paste snafu in not updating that role.

The AI didn’t seem to pick up any complimentary behavior from an image sent in the system message, though.

Sending an image through a System message in gpt-4-turbo is possible, but the model had trouble paying attention to it. It is not perfect, though workable with good prompting.

My biggest complaint is the loss of the ability to send images through function or tool messages in gpt-4o. I think it is perfectly reasonable and useful for tools to return a combination of text and images, just as a user would.

Use cases such as browser or desktop control benefit from being able to pass an image of the current screen state back to the model.
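Whichever role ends up carrying it, a screenshot of the screen state is passed as a base64 data URL inside an `image_url` part, as in the request above. A minimal sketch, assuming the screenshot is already available as PNG bytes (capturing it is out of scope); `screenshot_part` is a hypothetical helper name.

```python
import base64


def screenshot_part(png_bytes: bytes, detail: str = "high") -> dict:
    """Wrap raw PNG bytes in an image_url content part with a base64 data URL,
    matching the shape of the image_url part in the request above."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}", "detail": detail},
    }


# Only the 8-byte PNG signature, standing in for a real screenshot.
part = screenshot_part(b"\x89PNG\r\n\x1a\n")
print(part["image_url"]["url"])  # data:image/png;base64,iVBORw0KGgo=
```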

@pabloloz Did you find a solution or workaround? I’m facing the same issue with the Azure API, and it affects production…