Can Assistants API understand image files uploaded?

I tried to send a screenshot to the agent and the agent responded with “It appears that there has been an issue with accessing the uploaded screenshot”.

Is this by design or am I missing something?

The Vision docs at https://platform.openai.com/docs/guides/vision say:

Note that the Assistants API does not currently support image inputs.

Elsewhere there was a note that Retrieval could be used for it, but none of the image formats appear in the list of file types supported for Retrieval. I tried it anyway yesterday and nothing worked.

So I’m now back to just using the vision model without assistants, keeping the thread in my backend and re-sending the messages for continued discussion, which I guess is what the Assistants API does internally anyway.

I’m having the same issue trying to build an assistant that accepts images as input. Very disappointing that the Assistants API can’t currently handle them.

You can build a function that makes image recognition (from any service) useful.

Let’s say you have MiniGPT running on your server to label areas of an image with contents.

Then you just need to give your normal Chat Completions chatbot a function specification to call it. Obviously you can’t send an actual image, but a user could supply a URL via chat, or through a dialog on your webpage that interfaces directly with the function.
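As a sketch of what that function specification might look like (the function name `label_image` and its parameters are my own invention, not from any real service):

```python
# Hypothetical function specification for the Chat Completions API.
# The model can "call" this function by returning its name and arguments;
# your backend then runs the actual image-recognition service.
label_image_spec = {
    "name": "label_image",
    "description": "Label the contents of an image supplied by the user as a URL.",
    "parameters": {
        "type": "object",
        "properties": {
            "image_url": {
                "type": "string",
                "description": "Publicly reachable URL of the image to analyze",
            }
        },
        "required": ["image_url"],
    },
}
```

You would pass this in the `functions` (or newer `tools`) parameter of the chat completions request, then run the real recognition service whenever the model asks for it and return the labels as the function result.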

Yeah, I ended up using Vision API through Chat completions API for my use case.

So the user uploads the image on the frontend, the frontend sends it to the backend, and I store it on my server. I then pass the URL to the Assistant, which in turn passes the user message and the URL to the Chat API through function calling. The Chat API describes the image using Vision and returns the response to the Assistant, which relays it back to the user.
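A minimal sketch of that hand-off in Python (the model name reflects the era of this thread, and the helper only builds the request payload; the actual call, commented out below, requires the `openai` package and an API key):

```python
# The Assistant's function call supplies an image URL; the backend forwards
# it to the Chat Completions API with the vision model.
def build_vision_request(image_url: str, question: str = "Describe this image."):
    """Build a Chat Completions payload asking the vision model about an image."""
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

# The actual call would look something like:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**build_vision_request(url))
# description = response.choices[0].message.content
```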

If anyone is interested I wrote a short blog post on how that can be done in Laravel but the general logic is applicable in any language:

Using Laravel to interact with Assistants API and Vision

Please share the code. I have been trying to do the same using chat completions, but it doesn’t support a thread ID, I guess.

In chat completions, you can include an image directly in the user input, but only with the gpt-4-vision-preview model (which you can switch to just for the turns where there is an image to analyze).

You manage conversation history yourself, by giving the AI some of the past chat exchanges before the latest user input.
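A minimal sketch of both points, assuming the model names from this thread’s era; the function prepends your self-managed history and switches to the vision model only when the latest user turn includes an image:

```python
# Build a chat completions request from self-managed history plus the new
# user turn. Model names are assumptions from the time of this discussion.
def build_request(history, user_text, image_url=None):
    content = user_text
    model = "gpt-3.5-turbo"
    if image_url is not None:
        # Images require the vision model and the list-style content format.
        model = "gpt-4-vision-preview"
        content = [
            {"type": "text", "text": user_text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    messages = history + [{"role": "user", "content": content}]
    return {"model": model, "messages": messages}
```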

You can read more about vision use at https://platform.openai.com/docs/guides/vision (already posted above), or by expanding the user message section of the Chat Completions API reference.

Why do I have to provide past chat content? Can’t we just continue the discussion under the same ID?

Suppose I sent an image input for description and got an ID in the response. Can I use that ID to continue the discussion, so I don’t need to resend past chat content for it to understand the previous history, just like ChatGPT does?

The chat completions endpoint does not maintain threads or conversations; the id it returns is only used internally by OpenAI. The endpoint is stateless and memoryless: you get direct access to the AI model, which generates each answer solely from the input text you send.
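In other words, “continuing” a conversation means re-sending the transcript yourself on every turn. A minimal sketch (the reply is stubbed here; in practice it would come from the API response):

```python
# Self-managed conversation loop for a stateless endpoint: each turn appends
# the user message, obtains a reply, and records it for the next request.
history = []

def chat_turn(history, user_text, get_reply):
    """Append the user turn, fetch a reply for the full transcript, record it."""
    history.append({"role": "user", "content": user_text})
    reply = get_reply(history)  # e.g. a call to client.chat.completions.create
    history.append({"role": "assistant", "content": reply})
    return reply

# Stubbed example; a real get_reply would call the API with `history`:
reply = chat_turn(history, "What did I just show you?", lambda msgs: "ok")
```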
