Inputting an image in the Assistant API using the new vision model

Hello,

Two days ago, the new “gpt-4-turbo-2024-04-09” model was released, finally allowing vision capabilities to work alongside function calling.

I originally assumed that this change would allow the Assistants API to accept image inputs - especially because the original disclaimer that the Assistants API doesn’t have vision capabilities seems to be gone. (Unless I’m misremembering?) And also because, well, the Assistants API has access to this model.

But no matter how hard I try, I can’t seem to properly input an image to the assistant, and there is still no documentation on this in the API reference.

Is this a planned feature that should roll out soon, or has it been implemented but not documented yet somehow?


At the top or in the sidebar of this forum:

  • Click “documentation”;
  • In the documentation sidebar, click “Vision”;
  • Read passages such as this:

GPT-4 Turbo with Vision allows the model to take in images and answer questions about them. … Previously, the model has sometimes been referred to as GPT-4V or gpt-4-vision-preview in the API.

  • Proceed to reading about how a user message with an image is placed; a minimal sketch follows below.
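
For reference, here is a minimal sketch of such a request against the Chat Completions endpoint (the model name comes from this thread; the image URL is a placeholder):

from openai import OpenAI

client = OpenAI()

# A user message mixing a text block and an image URL block
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},  # placeholder
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)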

I tried it from another account via the API; yes, it’s very new. I suspect this feature is the reason my ChatGPT account’s GPT-4 access was delisted -_-

Never mind then, it seems that the disclaimer is still there, I was simply not looking in the right place:

Plese note that the Assistants API does not currently support image inputs.

Funny typo too!

Hopefully that gets added in soon.

EDIT: It got added in!


When did they add vision support for Assistants? Was this in the April announcement, or when GPT-4o was released?

There was no big announcement or even a changelog entry. Vision as part of an Assistants API message was committed to the OpenAPI spec on May 9.

I am able to add images to the assistant using the Playground, but I am unable to do so with the API.

This is not true, it’s absolutely doable - I’ve done it myself in my own personal API integration.

IIRC, you need to do an API call to upload the image as a file with “vision” as its purpose, then bring it into the conversation by referencing the uploaded file’s id in a message.
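
Roughly, that flow looks like the sketch below (the file path is a placeholder; as I understand the v2 Assistants API, the uploaded file id is then referenced in the message content as an image_file block):

from openai import OpenAI

client = OpenAI()

# 1. Upload the image with purpose "vision"
image_file = client.files.create(
    file=open("my_image.png", "rb"),  # placeholder path
    purpose="vision",
)

# 2. Reference the file id in a thread message content block
message = client.beta.threads.messages.create(
    thread_id=thread_id,  # an existing thread is assumed
    role="user",
    content=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_file", "image_file": {"file_id": image_file.id}},
    ],
)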

You are right, it does! @Adam_Zacharia_Anil, check this link out.

Also, do any of you know how to make the API interactive? After it gives a result, I want to ask follow-up questions based on that. How do I do that?

The current code I use to input images and messages is:

from openai import OpenAI

client = OpenAI()

# Message content mixing a text block and an image URL block
message_content = [
    {
        "type": "text",
        "text": "What is that",
    },
    {
        "type": "image_url",
        "image_url": {
            "url": "",  # image URL left blank in the original post
            "detail": "high",
        },
    },
]

message = client.beta.threads.messages.create(
    thread_id=thread_id,  # an existing thread id is assumed
    role="user",
    content=message_content,
)

and then use the code they give to stream it, which is:

from typing_extensions import override
from openai import AssistantEventHandler

class MyEventHandler(AssistantEventHandler):
    def __init__(self):
        super().__init__()
        self.outputs = []

    @override
    def on_text_created(self, text) -> None:
        # Print the prefix once when the assistant starts responding;
        # the actual text arrives through on_text_delta below
        print("\nassistant > ", end="", flush=True)

    @override
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)
        self.outputs.append(delta.value)

    def on_tool_call_created(self, tool_call):
        print(f"\nassistant > {tool_call.type}\n", flush=True)

    def on_tool_call_delta(self, delta, snapshot):
        if delta.type == 'code_interpreter':
            if delta.code_interpreter.input:
                print(delta.code_interpreter.input, end="", flush=True)
                self.outputs.append(delta.code_interpreter.input)
            if delta.code_interpreter.outputs:
                print("\n\noutput >", flush=True)
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"\n{output.logs}", flush=True)
                        self.outputs.append(output.logs)

Instantiate the handler

handler = MyEventHandler()

Run the stream with the handler

with client.beta.threads.runs.stream(
    thread_id=thread_id,
    assistant_id=assistant_id,
    event_handler=handler,
) as stream:
    stream.until_done()

Now, how do I ask follow-up questions based on the result?
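
In case it helps: threads persist the conversation, so a follow-up is just another user message on the same thread followed by a new run. A minimal sketch reusing the handler above (the follow-up text is hypothetical):

# Post a follow-up question on the same thread; the assistant sees
# the earlier messages and the previous result automatically.
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="Can you elaborate on that result?",  # hypothetical follow-up
)

# Start a new run with a fresh handler to stream the answer
with client.beta.threads.runs.stream(
    thread_id=thread_id,
    assistant_id=assistant_id,
    event_handler=MyEventHandler(),
) as stream:
    stream.until_done()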