Inputting an image in the Assistant API using the new vision model

Hello,

Two days ago, the new “gpt-4-turbo-2024-04-09” model was released, finally allowing vision capabilities to work alongside function calling.

I originally assumed that this change would allow the Assistants API to accept image inputs - especially because the original disclaimer that the Assistants API doesn’t have vision capabilities seems to be gone. (Unless I’m misremembering?) And also because, well, the Assistants API has access to this model.

But no matter how hard I try, I can’t seem to properly input an image to the assistant, and there is still no documentation on this in the API reference.

Is this a planned feature that should roll out soon, or has it been implemented but not documented yet somehow?


At the top or in the sidebar of this forum:

  • Click “documentation”;
  • In the documentation sidebar, click “Vision”;
  • Read passages such as this:

GPT-4 Turbo with Vision allows the model to take in images and answer questions about them. … Previously, the model has sometimes been referred to as GPT-4V or gpt-4-vision-preview in the API.

  • Proceed to reading about how a user message with an image is placed; a minimal sketch follows below.
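
For reference, here is a minimal sketch of such a request against the Chat Completions endpoint (the model name comes from this thread; the image URL is a placeholder):

from openai import OpenAI

client = OpenAI()

# A user message mixing a text block and an image URL block
response = client.chat.completions.create(
    model="gpt-4-turbo-2024-04-09",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/image.jpg"},  # placeholder
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)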

I tried it from another account via the API; yes, it’s very new. I suspect this feature is the reason my ChatGPT account’s GPT-4 access was delisted -_-

Never mind then, it seems that the disclaimer is still there, I was simply not looking in the right place:

Plese note that the Assistants API does not currently support image inputs.

Funny typo too!

Hopefully that gets added in soon.

EDIT: It got added in!


When did they add vision support for Assistants? Was this in the April announcement, or when GPT-4o was released?

There was no big announcement or even a changelog entry. Vision as part of an Assistants API message was committed to the OpenAPI spec on May 9.

I am able to add images to the assistant using the Playground, but I am unable to do so with the API.

This is not true, it’s absolutely doable - I’ve done it myself in my own personal API integration.

IIRC, you need to do an API call to upload the image as a file with “vision” as its purpose, then bring it into the conversation by referencing the uploaded file’s id in a message.
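
Roughly, that flow looks like the sketch below (the file path is a placeholder; as I understand the v2 Assistants API, the uploaded file id is then referenced in the message content as an image_file block):

from openai import OpenAI

client = OpenAI()

# 1. Upload the image with purpose "vision"
image_file = client.files.create(
    file=open("my_image.png", "rb"),  # placeholder path
    purpose="vision",
)

# 2. Reference the file id in a thread message content block
message = client.beta.threads.messages.create(
    thread_id=thread_id,  # an existing thread is assumed
    role="user",
    content=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_file", "image_file": {"file_id": image_file.id}},
    ],
)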

You are right, it does! @Adam_Zacharia_Anil, check this link out.

Also, do any of you know how to make the API interactive? After it gives a result, I want to ask follow-up questions based on that. How do I do that?

The current code I use to input images and messages is:

from openai import OpenAI

client = OpenAI()

# Message content mixing a text block and an image URL block
message_content = [
    {
        "type": "text",
        "text": "What is that",
    },
    {
        "type": "image_url",
        "image_url": {
            "url": "",  # image URL left blank in the original post
            "detail": "high",
        },
    },
]

message = client.beta.threads.messages.create(
    thread_id=thread_id,  # an existing thread id is assumed
    role="user",
    content=message_content,
)

and then use the code they give to stream it, which is:

from typing_extensions import override
from openai import AssistantEventHandler

class MyEventHandler(AssistantEventHandler):
    def __init__(self):
        super().__init__()
        self.outputs = []

    @override
    def on_text_created(self, text) -> None:
        # Print the prefix once when the assistant starts responding;
        # the actual text arrives through on_text_delta below
        print("\nassistant > ", end="", flush=True)

    @override
    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)
        self.outputs.append(delta.value)

    def on_tool_call_created(self, tool_call):
        print(f"\nassistant > {tool_call.type}\n", flush=True)

    def on_tool_call_delta(self, delta, snapshot):
        if delta.type == 'code_interpreter':
            if delta.code_interpreter.input:
                print(delta.code_interpreter.input, end="", flush=True)
                self.outputs.append(delta.code_interpreter.input)
            if delta.code_interpreter.outputs:
                print("\n\noutput >", flush=True)
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"\n{output.logs}", flush=True)
                        self.outputs.append(output.logs)

Instantiate the handler

handler = MyEventHandler()

Run the stream with the handler

with client.beta.threads.runs.stream(
    thread_id=thread_id,
    assistant_id=assistant_id,
    event_handler=handler,
) as stream:
    stream.until_done()

Now, how do I ask follow-up questions based on the result?
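
In case it helps: threads persist the conversation, so a follow-up is just another user message on the same thread followed by a new run. A minimal sketch reusing the handler above (the follow-up text is hypothetical):

# Post a follow-up question on the same thread; the assistant sees
# the earlier messages and the previous result automatically.
client.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="Can you elaborate on that result?",  # hypothetical follow-up
)

# Start a new run with a fresh handler to stream the answer
with client.beta.threads.runs.stream(
    thread_id=thread_id,
    assistant_id=assistant_id,
    event_handler=MyEventHandler(),
) as stream:
    stream.until_done()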