Allowing Images in Non-User Messages

It looks like only user messages are allowed to include images - not tool or system messages.

But actually it would be very helpful to be able to include images in tool outputs or system messages.

For example, imagine a tool that searches the Unsplash API for images. It would be handy if the model could see those images.

And there are many situations where I’d like to insert images into the system message to help the model visualize things.

Are there any plans to allow images in system and tool messages? This would be super helpful and seems straightforward and safe!

Background on TypeScript Types

(Proof that I did my homework.)

If you look at the TypeScript types for system, assistant, and tool messages (ChatCompletionToolMessageParam, ChatCompletionSystemMessageParam, and ChatCompletionAssistantMessageParam), it looks like they all define content as a simple string instead of using the ChatCompletionContentPart type.
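
To make that concrete, here is a rough sketch of the shapes I mean. These are simplified stand-in types I wrote for this post, not the SDK's actual declarations; every `Sketch*` name and the `hasImagePart` helper are hypothetical and exist only for illustration.

```typescript
// Hypothetical, simplified stand-ins for the message param types discussed above.
// They are NOT copied from the openai package; all names are illustrative only.

type SketchTextPart = { type: "text"; text: string };
type SketchImagePart = { type: "image_url"; image_url: { url: string } };
type SketchContentPart = SketchTextPart | SketchImagePart;

// User messages: a plain string OR an array of content parts (text and images).
interface SketchUserMessage {
  role: "user";
  content: string | SketchContentPart[];
}

// System, assistant, and tool messages: content is a plain string,
// so there is nowhere to put an image part.
interface SketchSystemMessage {
  role: "system";
  content: string;
}

interface SketchAssistantMessage {
  role: "assistant";
  content: string;
}

interface SketchToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type SketchMessage =
  | SketchUserMessage
  | SketchSystemMessage
  | SketchAssistantMessage
  | SketchToolMessage;

// Returns true when a message carries at least one image part.
function hasImagePart(message: SketchMessage): boolean {
  const content = message.content;
  return Array.isArray(content) && content.some((part) => part.type === "image_url");
}

// Type-checks: a user message may mix text and image parts.
const sketchUserWithImage: SketchUserMessage = {
  role: "user",
  content: [
    { type: "text", text: "What is in this picture?" },
    { type: "image_url", image_url: { url: "https://example.com/photo.png" } },
  ],
};

// Would NOT type-check, because an array is not assignable to `string`:
// const bad: SketchSystemMessage = { role: "system", content: [/* image part */] };
```

With types shaped like this, the compiler rejects an image part on anything other than a user message before the request is ever sent.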

Hmm.

Recent developments indicate that things might only get harder from here on out. I could be wrong, but I’m observing the following:

  1. people are complaining that assistants won’t allow you to add assistant messages.
  2. the chat playground no longer lets you change the role of a message.
  3. as you note, assistant and system messages don’t seem to support images as per the docs (https://platform.openai.com/docs/api-reference/chat/create)

While some of this could be nothing, coincidental, or just lazy implementation, if I put my tinfoil hat on, it seems like we’re observing a (very concerning) trend here…

You sure?

My mistake. I didn’t realize that was a button. I just noticed that you can’t change the roles of existing messages anymore.

It was useful to change an assistant message to user and continue with just the edited AI answer as your own input, for quick token savings.

No more.

I need to give my own playground UI from a year ago some practical (rather than exploitative) updates so it can become my new screenshot source. It does have a role toggle.

(Noticed that Google AI Studio has the same per-message button layout idea as the one I wrote in May 2023.)


image

image

These design decisions are contrived, and make sense only if one of the following is true:

  1. explicitly disallowing images for assistant and system messages is a deliberate design decision
  2. OpenAI doesn’t have a single capable software architect
  3. communication at OpenAI is so dysfunctional that the UI dev(s) need to bend over backwards to accommodate these weird API specs (I wouldn’t be surprised if it were a single person, or if there were twelve people working on the UI)

Now this seems a bit better. If it were draggable it would be sensible. (Have we lost drag-and-drop technology during the pandemic? It used to be everywhere.)

the per message rerun feature is a nice touch though.

Disallowing images is a deliberate design decision.

It used to be just “images cannot be sent in the first system message”, if I recall correctly, but now you get blocked outright. Assistants lets assistant messages carry images for its own internal purposes. It’s just another case where you don’t get ChatGPT functionality on chat completions: you can only replay an AI seeing something (as if it had produced it) if you are willing to play in a ChatGPT-like jail.

Just to confirm, this also does return an error when you do it in the API directly:

Invalid ‘messages[0]’. Image URLs are only allowed for messages with role ‘user’, but this message with role ‘system’ contains an image URL.

So it’s not just a Typescript thing.
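
To spell out the rule that error describes, here is a minimal local check that mirrors it: any message whose role is not `user` and whose content includes an `image_url` part gets flagged. This is only my sketch of the behaviour the error message implies, not OpenAI's actual validation; `WireMessage`, `imageRoleViolations`, and the example URLs are all hypothetical.

```typescript
type Role = "system" | "user" | "assistant" | "tool";

type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Loosely typed on purpose: this models what you could put in the JSON body,
// not what the SDK's stricter types would let you write.
interface WireMessage {
  role: Role;
  content: string | Part[];
}

interface Violation {
  index: number; // position in the messages array
  role: Role;    // the non-user role that carried an image
}

// Collects every message that would break the "images only in user messages" rule.
function imageRoleViolations(messages: WireMessage[]): Violation[] {
  const violations: Violation[] = [];
  messages.forEach((message, index) => {
    const content = message.content;
    const hasImage =
      Array.isArray(content) && content.some((part) => part.type === "image_url");
    if (hasImage && message.role !== "user") {
      violations.push({ index, role: message.role });
    }
  });
  return violations;
}

const exampleMessages: WireMessage[] = [
  {
    role: "system",
    content: [{ type: "image_url", image_url: { url: "https://example.com/diagram.png" } }],
  },
  {
    role: "user",
    content: [
      { type: "text", text: "What is in this picture?" },
      { type: "image_url", image_url: { url: "https://example.com/photo.png" } },
    ],
  },
];

// Flags only the system message at index 0; the user message is allowed.
const exampleViolations = imageRoleViolations(exampleMessages);
```

Running a check like this before sending would at least surface the problem locally instead of as a rejected request.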

Facing the same issue. My use case is that I am making the LLM call a tool which takes a screenshot, and then passing the output image back to the LLM. So the role of the message is tool and the content is an image, and I am getting the same error:

Invalid ‘messages[1]’. Image URLs are only allowed for messages with role ‘user’, but this message with role ‘tool’ contains an image URL.

Not happy with this restriction at all!

For example, if you get your assistant to create an image and share it with you, then ask the assistant something about the image, this is an instant fail!

I have situations where I’m automating the sharing of picture media via the same local assistant account, which again causes the completion request to fall over! Very silly!!

Please lift this restriction asap, it doesn’t make sense!

Same trouble here. It would be nice to be able to call the DALL-E model to generate images from all kinds of roles.

Here’s a pretty simple reason why it is blocked now:

If you show a conversation history where a multimodal “assistant” is producing images, then you will activate that underlying image generation skill.

OpenAI doesn’t want unhandled tokens, with no deconvolution network behind them, breaking their API output. Nor do they want you receiving an unreleased image generation feature, or having the AI generate images while only completion tokens are billed.

It still doesn’t make complete sense, despite OpenAI disliking developers using AI models to the fullest. A tool return doesn’t simulate an assistant response, and can pass an understanding of what image was generated by AI (come up with a reason why they want to break iteration), or what image was returned by an API (come up with a reason why they want to break automated image understanding).
