Can Assistants API understand image files uploaded?

I tried to send a screenshot to the agent and the agent responded with “It appears that there has been an issue with accessing the uploaded screenshot”.

Is this by design or am I missing something?

3 Likes

The Vision docs say, at https://platform.openai.com/docs/guides/vision

Note that the Assistants API does not currently support image inputs.

Elsewhere there was a note that Retrieval could be used for this, but none of the image formats appear in the list of supported file types for Retrieval. I tried it anyway yesterday and nothing worked.

So I’m now back to using the vision model without Assistants, keeping the thread in my backend and re-sending the messages for each continued turn, which I guess is what the Assistants API does internally anyway.
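That re-sending pattern can be sketched like this (the helper names are mine, not from any SDK; the message shape follows the Chat Completions vision content format):

```python
# Minimal sketch of backend-managed conversation history for the
# Chat Completions endpoint: every request must carry the prior turns,
# since the endpoint itself keeps no state between calls.

def build_messages(history, user_text, image_url=None):
    """Assemble the full message list to send on each request."""
    content = [{"type": "text", "text": user_text}]
    if image_url:
        # Image parts use the Chat Completions vision content format.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return history + [{"role": "user", "content": content}]

def record_turn(history, user_text, assistant_text):
    """Store a completed exchange so it can be re-sent next time."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history
```

Each request then sends `build_messages(...)` as the `messages` payload, and the backend calls `record_turn` once the model has replied.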

2 Likes

I’m having the same issue trying to build an assistant that handles images as inputs. Very disappointing that assistants currently can’t do this.

1 Like

You can build a function that makes image recognition (from any service) useful.

Let’s say you have MiniGPT running on your server to label areas of an image with contents.

Then you just need to give the normal Chat Completions chatbot you programmed a function specification for calling that service. Obviously you can’t send an actual image through chat, but a user could supply a URL via chat, or through a dialog on your webpage that interfaces with the function directly.
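A function specification for such a labeling service might look like this (the function name `describe_image` and its parameters are hypothetical; the backend decides which recognition service actually handles the URL):

```python
# Hypothetical tool specification the chatbot can call when the user
# supplies an image URL. The backend resolves the call by forwarding
# the URL to whatever recognition service is running (MiniGPT, the
# vision model, etc.) and returns the label text as the tool result.
describe_image_tool = {
    "type": "function",
    "function": {
        "name": "describe_image",
        "description": "Return a text description of the image at a URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "Publicly reachable URL of the image.",
                },
            },
            "required": ["url"],
        },
    },
}
```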

2 Likes

Yeah, I ended up using Vision API through Chat completions API for my use case.

So the flow is basically:

1. The user uploads the image on the frontend.
2. The frontend sends the image to the backend, which stores it on my server.
3. The backend passes the resulting URL to the Assistant.
4. The Assistant passes the user message and the URL to the Chat Completions API through function calling.
5. The Chat API describes the image using Vision and returns the response to the Assistant, which returns it to the user.
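The step where the Assistant’s function call resolves into a Vision request might look roughly like this in Python (the function name and prompt are illustrative; my actual implementation is the Laravel one linked in the blog post):

```python
def describe_image(client, image_url, model="gpt-4-vision-preview"):
    """Resolve an assistant's function call by asking the vision-capable
    Chat Completions endpoint to describe the image at image_url."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    # The description comes back as ordinary assistant text.
    return response.choices[0].message.content
```

The returned string is then submitted back to the Assistants run as the tool output, so the Assistant can phrase the final answer to the user.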

If anyone is interested I wrote a short blog post on how that can be done in Laravel but the general logic is applicable in any language:

Using Laravel to interact with Assistants API and Vision

2 Likes

Please share the code. I have been trying to do the same using Chat Completions, but it doesn’t support a thread ID, I guess.

1 Like

In chat completions, you can include an image directly as part of the user input, but only with the model gpt-4-vision-preview (which you can switch to just for the requests where an image to analyze has been furnished).

Conversation history you manage yourself, by sending the AI some of the past chat exchanges before the latest user input.

You can read more about vision use at https://platform.openai.com/docs/guides/vision, already posted above, or by expanding the user-message section of the API Reference for chat completion messages.

1 Like

Why do I have to provide previous chat content? Can’t we have a continuous discussion on the same ID?

2 Likes

Suppose I sent an image as input for description and got an ID in the response. Can I use that ID to continue the discussion, so that I don’t need to provide past chat content for it to understand the previous history, just like we do in ChatGPT?

1 Like

The chat completions endpoint does not maintain threads or conversations; the id it returns is only used internally by OpenAI. The endpoint is stateless and memoryless: you get direct access to the AI model, and it is loaded with all the input text you supply each time it generates an answer.

2 Likes

Sadly, a year later, I still have the same issue. The image uploads fine, but it isn’t being analyzed.

1 Like

Hi,

First of all, these posts helped me a lot as I was also struggling to get Image analyzed with Assistant APIs.

This is what worked for me (Python code):

#function to upload a file for image analysis
def upload_file_to_openai(client, file_path, purpose="vision"):
    try:
        with open(file_path, "rb") as file:
            response = client.files.create(
                file=file,
                purpose=purpose  # "vision" for image inputs on Assistants threads
            )
        return response
    except Exception as e:
        print(f"An error occurred while uploading the file: {e}")
        return None

#create a message, via Chat Completions or on an Assistants thread
def add_message_to_thread(client, api_option, model, thread_id, content):
    try:
        if api_option == "completions":
            return client.chat.completions.create(
                model=model,  # e.g. gpt-4o-mini
                messages=[{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": content},
                        {"type": "image_url",
                         "image_url": {"url": "https://", "detail": "high"}},
                    ],
                }])

        if api_option == "assistant":
            return client.beta.threads.messages.create(
                thread_id=thread_id,
                role="user",
                content=[
                    {"type": "text", "text": content},
                    {"type": "image_file",
                     "image_file": {"file_id": "file-mnnb6RFsABCo0vBLae7c2rtQ",
                                    "detail": "low"}},
                ])

        # manage wrong api option
        raise ValueError(f"Unknown api_option: {api_option}")
    except Exception as e:
        print(f"Error adding message to thread: {e}")
        return None

Hope this helps!

Thanks

1 Like