GPT-4o support for image URLs as tool responses

The previous GPT-4 Turbo model supported image URLs as part of a tool response, which was useful for tools that returned images to the model as the result of a tool call.

The new GPT-4o models have apparently lost this functionality and can only receive images in messages with the user role.

Is there a way to work around this regression, or is there any plan to fix the issue?

Example request:

[
	{
		"role": "user",
		"content": [
			{
				"type": "text",
				"text": "Navigate to google.com website."
			}
		]
	},
	{
		"role": "assistant",
		"content": null,
		"tool_calls": [
			{
				"id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
				"type": "function",
				"function": {
					"name": "navigate-to-website",
					"arguments": "{\"url\":\"https://www.google.com\",\"keywords\":[\"search\",\"Google\"],\"searchDescription\":\"Access the Google homepage to perform web searches and explore Google's services.\"}"
				}
			}
		]
	},
	{
		"role": "tool",
		"tool_call_id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
		"content": [
			{
				"type": "text",
				"text": "You have navigated to the website."
			},
			{
				"type": "image_url",
				"image_url": {
					"url": "data:image/png;base64,iVBORw0KGgoAA...==",
					"detail": "high"
				}
			}
		]
	}
]

API Response:

{
    "error": {
        "message": "Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.",
        "type": "invalid_request_error",
        "param": "messages[3]",
        "code": "invalid_value"
    }
}
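
For anyone who wants to reproduce this quickly, here is a minimal Python sketch using the openai SDK (the truncated base64 string is a placeholder copied from the request above; substitute a real screenshot):

import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Navigate to google.com website."}]},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
                "type": "function",
                "function": {
                    "name": "navigate-to-website",
                    "arguments": '{"url": "https://www.google.com"}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_MsDL8L4dUgJwxW4UKMW9XK6c",
        "content": [
            {"type": "text", "text": "You have navigated to the website."},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/png;base64,iVBORw0KGgoAA...==", "detail": "high"},
            },
        ],
    },
]

# On gpt-4o this raises the invalid_request_error shown above;
# the same request used to succeed on gpt-4-turbo.
client.chat.completions.create(model="gpt-4o", messages=messages)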

Hi,

That does seem to be deliberate, but I will ask the question; it won't be until next Friday, though.


Let me know if you get a response from the team.
Thanks!

Hi, yes, I’ve asked the question. Hopefully we’ll get an answer soon.


Hi, any response please?

Thanks


Any updates on this, @Foxalabs?

@ntaraujo Did you find any workaround? I’m facing the same issue using the Azure API, and it affects production 🙁

Hi,

I’ve passed the message along. I’ll report back if/when I hear anything.


This affects my use case too. Any update? Is there a case number I can reference to avoid duplication?

I noticed, though, that whilst the gpt-4-turbo model does not fail, the description of the image is not correct.

I guess the workaround would be to add a user message with the image as base64, but it duplicates the image and greatly increases the number of input tokens.
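
A minimal sketch of that workaround, assuming a local PNG screenshot (tool_image_as_user_message is a hypothetical helper, not part of any SDK): the tool message carries only text so the tool_call_id is satisfied, and the image travels in a follow-up user message:

import base64


def tool_image_as_user_message(tool_call_id: str, image_path: str) -> list:
    """Stand in for an image tool response: a text-only tool message that
    satisfies the tool_call_id, plus a user message carrying the image."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    tool_msg = {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": "Screenshot captured; see the image in the next message.",
    }
    user_msg = {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}", "detail": "high"},
            }
        ],
    }
    return [tool_msg, user_msg]


# Usage, inside the tool-handling loop:
# chat_history.extend(tool_image_as_user_message(tool_call.id, "/tmp/screenshot.png"))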

I was able to have pretty good success with GPT-4 seeing the image as part of a tool response, but it was pretty hard to make it pay attention to the image if it was part of a system message.


I tried an approach using two functions:

get_image() -> str: a function that returns the URL (path) of an image.
describe_image(url: str) -> str: a function that, given the path of an image, returns a description of it.

Inside the describe_image function there is an OpenAI call that describes the image using a user-role message.

The flow: the user asks the AI for an image; the AI calls the get_image function, which returns a URL; the AI then calls the describe_image function, passing it the URL from get_image; describe_image returns a description of the image, and the AI relays that description to the user.

Here is the code:

import base64
import json
import os

from dotenv import load_dotenv
from openai import OpenAI

IMAGE_URL = "/tmp/my_image.png"  # Add your own image path Here
# OPENAI_MODEL = "gpt-4o"
OPENAI_MODEL = "gpt-4o-mini"

load_dotenv()


def get_image():
    return f"Here is the url of the image {IMAGE_URL}"


def describe_image(url: str):
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    base64_image = encode_image_to_base64(url)

    response = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe the image"},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content


def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


FUNCTION_REPOSITORY = {"get_image": get_image, "describe_image": describe_image}


def process_chat_history(chat_history):
    model = OPENAI_MODEL
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

    # Define the functions that can be called by the AI
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_image",
                "description": "Gets a random image and returns it url.",
                "parameters": {},
            },
        },
        {
            "type": "function",
            "function": {
                "name": "describe_image",
                "description": "Describes the content of an image given its URL",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "url": {"type": "string", "description": "The URL of the image to describe"}
                    },
                    "required": ["url"],
                },
            },
        },
    ]

    try:
        while True:
            print("OPENAI API Call")
            response = client.chat.completions.create(
                model=model, messages=chat_history, tools=tools, tool_choice="auto", max_tokens=300
            )
            print(f"Finish Reason [{response.choices[0].finish_reason}]")

            if response.choices[0].finish_reason != "tool_calls":
                print(f"Assistants final Response: [{response.choices[0].message.content}]")
                chat_history.append(
                    {"role": "assistant", "content": response.choices[0].message.content}
                )
                break

            # Add the assistant's message once, with all of its tool calls,
            # before appending the corresponding tool responses
            chat_history.append(
                {
                    "role": "assistant",
                    "content": response.choices[0].message.content,
                    "tool_calls": [
                        {
                            "id": tc.id,
                            "type": "function",
                            "function": {
                                "name": tc.function.name,
                                "arguments": tc.function.arguments,
                            },
                        }
                        for tc in response.choices[0].message.tool_calls
                    ],
                }
            )

            for tool_call in response.choices[0].message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)
                print(f"Running function {function_name} with arguments {function_args}.")

                # Execute the function using dynamic lookup
                if function_name in FUNCTION_REPOSITORY:
                    function = FUNCTION_REPOSITORY[function_name]
                    function_response = function(**function_args) if function_args else function()
                else:
                    function_response = f"Unknown function: {function_name}"

                print(f"Function {function_name}. Response [{function_response}]")

                # Add the function response to the history
                chat_history.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "name": function_name,
                        "content": function_response,
                    }
                )

        return chat_history

    except Exception as e:
        return f"An error occurred: {str(e)}"


# Example usage
if __name__ == "__main__":
    messages = [{"role": "user", "content": "Get me an image and describe it."}]

    try:
        result = process_chat_history(messages)
        # print("Final Chat History Dump")
        # print(json.dumps(messages, indent=2))

    except Exception as e:
        print(f"An error occurred: {str(e)}")

Has there been any update regarding this regression? Tools responding with multimedia seems like quite an obvious use case.

This used to work on previous GPT-4 models for quite a while, but it has stopped working, while other LLM providers (Claude and Gemini) do support it.