GPT-4o support for image URLs in tool responses

The previous GPT-4 Turbo model supported image URLs as part of a tool response, which was useful for tools that return images to the model as the result of a tool call.

The new GPT-4o models have apparently lost this functionality and can only receive images in messages with the user role.

Is there a way to work around this regression, or is there any plan to fix the issue?

Example request:

[
	{
		"role": "user",
		"content": [
			{
				"type": "text",
				"text": "Navigate to google.com website."
			}
		]
	},
	{
		"role": "assistant",
		"content": null,
		"tool_calls": [
			{
				"id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
				"type": "function",
				"function": {
					"name": "navigate-to-website",
					"arguments": "{\"url\":\"https://www.google.com\",\"keywords\":[\"search\",\"Google\"],\"searchDescription\":\"Access the Google homepage to perform web searches and explore Google's services.\"}"
				}
			}
		]
	},
	{
		"role": "tool",
		"tool_call_id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
		"content": [
			{
				"type": "text",
				"text": "You have navigated to the website."
			},
			{
				"type": "image_url",
				"image_url": {
					"url": "...==",
					"detail": "high"
				}
			}
		]
	}
]

API Response:

{
    "error": {
        "message": "Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.",
        "type": "invalid_request_error",
        "param": "messages[3]",
        "code": "invalid_value"
    }
}

Hi,

That does seem to be deliberate, but I will ask the question. It won’t be until next Friday, though.


Let me know if you got a response from the team.
Thanks!

Hi, yes, I’ve asked the question. Hopefully I’ll get an answer soon.


Hi, any response please?

Thanks


Any updates on this @Foxalabs ?

@ntaraujo Did you find any workaround? I’m facing the same issue using the Azure API, and it affects production :frowning:

Hi,

I’ve passed the message along. I’ll report back if/when I hear anything.


This affects my use case too. Any update? Is there a case number that I can respond to, to avoid duplication?

I noticed, though, that while the gpt-4-turbo model does not fail, its description of the image is not correct.

I guess the workaround would be to add a user message with the image as base64, but it duplicates the image and greatly increases the number of input tokens.
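
A minimal sketch of that workaround, assuming the API accepts a user message directly after the tool messages (not verified here). The helper name move_tool_images_to_user is hypothetical and not part of the OpenAI SDK. It moves the image parts rather than copying them, so each image is sent only once. With parallel tool calls, the user message is placed after the whole run of tool messages, since those have to directly follow the assistant message that requested them.

```python
from typing import Any

Message = dict[str, Any]


def move_tool_images_to_user(messages: list[Message]) -> list[Message]:
    """Return a new message list in which image parts found in tool messages
    are moved into a user message placed after the run of tool messages.

    Hypothetical helper for illustration; the input list is not modified.
    """
    result: list[Message] = []
    pending_images: list[dict[str, Any]] = []

    def flush() -> None:
        # Emit the collected images as one user message, then reset.
        if pending_images:
            note = {"type": "text", "text": "Images returned by the tool call(s) above."}
            result.append({"role": "user", "content": [note, *pending_images]})
            pending_images.clear()

    for message in messages:
        if message.get("role") != "tool":
            flush()  # a run of tool messages (if any) has just ended
            result.append(message)
            continue

        content = message.get("content")
        if not isinstance(content, list):
            result.append(message)  # plain-string tool content needs no change
            continue

        # Keep only the text in the tool message; hold the images back.
        text_parts = [p["text"] for p in content if p.get("type") == "text"]
        pending_images.extend(p for p in content if p.get("type") == "image_url")
        result.append({**message, "content": "\n".join(text_parts)})

    flush()  # handle a conversation that ends with tool messages
    return result
```

Calling move_tool_images_to_user(messages) on the request from the first post would leave the tool message with only its text and append a user message carrying the screenshot.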

I was able to have pretty good success with GPT-4 seeing the image as part of a Tool response, but it was pretty hard to make it pay attention to the image if it was part of a system message.


I tried an approach using two functions:
get_image() => str: a function that returns the URL (path) of an image.
describe_image(url: str) => str: a function that, given the path of an image, returns its description.

Inside the describe_image function there is an OpenAI API call that describes the image using the user role.

The flow is: the user asks the AI for an image; the AI calls get_image, which returns a URL; the AI then calls describe_image, passing that URL; describe_image returns a description of the image, and the AI relays that description to the user.

Here is the code

import base64
import json
import os

from dotenv import load_dotenv
from openai import OpenAI

IMAGE_URL = "/tmp/my_image.png"  # Replace with the path to your own image
# OPENAI_MODEL = "gpt-4o"
OPENAI_MODEL = "gpt-4o-mini"

load_dotenv()


def get_image():
    return f"Here is the url of the image {IMAGE_URL}"


def describe_image(url: str):
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    base64_image = encode_image_to_base64(url)

    response = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe the image"},
                    {
                        "type": "image_url",
                        # The MIME type should match the file; the example path is a PNG
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content


def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


FUNCTION_REPOSITORY = {"get_image": get_image, "describe_image": describe_image}


def process_chat_history(chat_history):
    model = OPENAI_MODEL
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # Define the functions that can be called by the AI
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_image",
                "description": "Gets a random image and returns its URL.",
                "parameters": {"type": "object", "properties": {}},
            },
        },
        {
            "type": "function",
            "function": {
                "name": "describe_image",
                "description": "Describes the content of an image given its URL",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string", "description": "The URL of the image to describe"}
                    },
                    "required": ["url"],
                },
            },
        },
    ]

    try:
        while True:
            print("OPENAI API Call")
            response = client.chat.completions.create(
                model=model, messages=chat_history, tools=tools, tool_choice="auto", max_tokens=300
            )
            print(f"Finish Reason [{response.choices[0].finish_reason}]")

            if response.choices[0].finish_reason != "tool_calls":
                print(f"Assistant's final response: [{response.choices[0].message.content}]")
                chat_history.append(
                    {"role": "assistant", "content": response.choices[0].message.content}
                )
                break

            # Add the assistant's message once, with all of its tool calls
            assistant_message = response.choices[0].message
            chat_history.append(
                {
                    "role": "assistant",
                    "content": assistant_message.content,
                    "tool_calls": [
                        {
                            "id": tool_call.id,
                            "type": "function",
                            "function": {
                                "name": tool_call.function.name,
                                "arguments": tool_call.function.arguments,
                            },
                        }
                        for tool_call in assistant_message.tool_calls
                    ],
                }
            )

            for tool_call in assistant_message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)
                print(f"Running function {function_name} with arguments {function_args}.")

                # Execute the function using dynamic lookup
                if function_name in FUNCTION_REPOSITORY:
                    function = FUNCTION_REPOSITORY[function_name]
                    function_response = function(**function_args) if function_args else function()
                else:
                    function_response = f"Unknown function: {function_name}"

                print(f"Function {function_name}. Response [{function_response}]")

                # Add the function response for this tool call
                chat_history.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": function_name,
                        "content": function_response,
                    }
                )

    except Exception as e:
        return f"An error occurred: {str(e)}"


# Example usage
if __name__ == "__main__":
    messages = [{"role": "user", "content": "Get me an image and describe it."}]

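    try:
        result = process_chat_history(messages)
        # process_chat_history returns an error string on failure and None on success
        if result:
            print(result)
        # print("Final Chat History Dump")
        # print(json.dumps(messages, indent=2))

    except Exception as e:
        print(f"An error occurred: {str(e)}")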

Has there been any update regarding this regression? Tools responding with multimedia seems like a fairly obvious use case.

This worked on previous GPT-4 models for quite a while, but it has stopped working, while other LLM providers (Claude and Gemini) do support it.