Using images to discuss with an assistant

I want to add image support to my project, but not image generation. I’d like to be able to upload a picture when talking with an assistant and get comments or feedback on it, or discuss the image.
At first I thought I should send the image as a file to a vector store, with purpose “assistants”.
But I get a 415 (unsupported media type) error when I try to upload an image to the files endpoint (I am sending it as a multipart form).

Then I searched the documentation and saw that the “assistants” API does not support all kinds of documents. There are also threads on the community forum saying that images are not supported.

On the other hand, here:
I read an example of uploading an image in base64-encoded form in a “message” posted to the “completions” API.

Is this the correct way? Can I do the same with the “assistants” API?
If that’s correct, I don’t need to use a vector store, right?

Hi, you get the second link to this example I wrote 8 hours ago.

Vision is supported in Assistants when using a vision-capable assistants-capable model, and is perceived as part of what a user has sent in a message.

That this is a recurring question about an ability added just weeks ago, with no entry in the “change log” or “what’s new”, shows that OpenAI’s documentation could be improved.

@ilkeraktuna as @_j said, you can pass images to an assistant as long as that assistant has a vision-enabled model selected, such as GPT-4o. In the link you sent, the section below the Quickstart states the basic rule of thumb.

“Images are made available to the model in two main ways: by passing a link to the image or by passing the base64 encoded image directly in the request. Images can be passed in the user, system and assistant messages. Currently we don’t support images in the first system message but this may change in the future.”
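
For the Chat Completions case quoted above, here is a minimal Java sketch of how the base64 variant could be put together. The class and method names are made up for illustration; the `image_url` content part with a `data:` URL is my understanding of the request shape from the vision guide, so check it against the current docs before relying on it.

```java
import java.util.Base64;

// Hypothetical helper, not part of any OpenAI library.
final class Base64ImagePartSketch {

    // Encodes raw image bytes as a data URL, e.g. "data:image/png;base64,....".
    static String toDataUrl(String mimeType, byte[] imageBytes) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "data:" + mimeType + ";base64," + encoded;
    }

    // Builds one "image_url" content part for a Chat Completions user message.
    // The standard base64 alphabet contains no characters that need JSON escaping.
    static String toImageUrlPart(String mimeType, byte[] imageBytes) {
        return "{\"type\":\"image_url\",\"image_url\":{\"url\":\""
                + toDataUrl(mimeType, imageBytes) + "\"}}";
    }
}
```

In real use the bytes would come from something like `Files.readAllBytes(path)`, and the resulting part goes into the `content` array of a user message.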

Specifically with Assistants, though, BASE64 attachment is not an option. You have a different method, which is to upload the image to file storage and then use its file ID.

So does the image still get passed at the message level, or could you pass it at the thread level too? I know that in the playground, passing an image in the message does not create a vector store but does create a file ID, which is consistent with it going directly to file storage.

Really?
I had thought the easiest method would be BASE64.
So is it not supported on Assistants?

But then how do I upload it to storage?
I am using Java, so I don’t have the fancy and easy Python libraries to use.
I wrote my own HTTP POST method using HttpURLConnection for posting JSON. For the file attachment I tried the Volley library and got a 415 “unsupported media type” error.

Images are not a document type that can be used with a vector store and its file_search, which is the only message file method with a semi-permanent attachment to a thread (a thread is just a container). Images can be attached to a code interpreter session by message for data processing, but I don’t think they are refreshed after that session expires in an hour, either.

So they are part of the message, and continue to be seen in past messages until the chat grows too long. The tokens of images should be preserved whether they came from a URL or a File ID.

So then to upload an image you just use this

and then pass the file id during the message creation as an attachment?

The link in my second post shows how a user message is constructed for computer vision, using a file uploaded to the file storage endpoint with the purpose set to “vision” (not “assistants”).

Although you can talk about the image like it was an “attachment”, it is not for the API parameter “attachments”, where you must select only {“type”: “code_interpreter”} or {“type”: “file_search”}.
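
Since the original poster is building requests by hand in Java, here is a rough sketch of what such a user message body could look like. The class and method names are hypothetical; the `image_file` content part with a `file_id` is my reading of the Assistants message format, and the file ID is assumed to come from an earlier upload with purpose “vision”.

```java
// Hypothetical helper that builds the JSON body for creating a thread message.
final class VisionMessageSketch {

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // The image is a content part of the message itself, not an entry in "attachments".
    static String buildMessageJson(String text, String fileId) {
        return "{\"role\":\"user\",\"content\":["
                + "{\"type\":\"text\",\"text\":\"" + escape(text) + "\"},"
                + "{\"type\":\"image_file\",\"image_file\":{\"file_id\":\"" + escape(fileId) + "\"}}"
                + "]}";
    }
}
```

This body would be sent with `Content-Type: application/json` when posting a message to the thread. No vector store and no `file_search` entry is involved.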

Yes, this helps. This section of the docs also clears things up on the library side.

Creating image input content

Message content can contain either external image URLs or File IDs uploaded via the File API. Only models with Vision support can accept image input. Supported image content types include png, jpg, gif, and webp. When creating image files, pass purpose="vision" to allow you to later download and display the input content. Currently, there is a 100GB limit per organization and 10GB for user in organization. Please contact us to request a limit increase.

Tools cannot access image content unless specified. To pass image files to Code Interpreter, add the file ID in the message attachments list to allow the tool to read and analyze the input. Image URLs cannot be downloaded in Code Interpreter today.
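
To illustrate the Code Interpreter case from the quoted docs, a small Java sketch of a message body that lists the file under `attachments` might look like the following. The helper name is made up, and the field layout (`file_id` plus a `tools` list) reflects my understanding of the message format rather than anything confirmed in this thread.

```java
// Hypothetical helper; builds a message whose attachment is readable by Code Interpreter.
final class CodeInterpreterAttachmentSketch {

    // Simplified escaping: handles only backslashes and double quotes,
    // so it assumes the inputs contain no control characters.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String buildMessageJson(String prompt, String fileId) {
        return "{\"role\":\"user\","
                + "\"content\":\"" + escape(prompt) + "\","
                + "\"attachments\":[{\"file_id\":\"" + escape(fileId) + "\","
                + "\"tools\":[{\"type\":\"code_interpreter\"}]}]}";
    }
}
```

Here the file is handed to the tool for processing; for the model to actually look at the picture, it would still need to appear as image content in the message.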

OK. So when I am uploading, I have to post to the files endpoint
and set the purpose parameter to “vision”, not “assistants”.
Where should I set “type”: “file_search”? (I had done that for the vector store, but here you say I won’t use a vector store, so where will this be set?)

Also, when posting, should I set “Content-Type” to “application/octet-stream” or to “application/json”?
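
On the Content-Type question: as far as I know, the files endpoint expects neither of those but `multipart/form-data` with a boundary, and a wrong or missing content type is a likely cause of a 415. Below is a minimal sketch using only the JDK's `java.net.http` classes (Java 11+). The class and method names and the boundary value are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for uploading an image with purpose "vision".
final class VisionUploadSketch {

    // Builds a multipart/form-data body with two parts: "purpose" and "file".
    static byte[] buildMultipartBody(String boundary, String purpose,
                                     String fileName, String mimeType, byte[] fileBytes) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"purpose\"\r\n\r\n"
                + purpose + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + mimeType + "\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(fileBytes, 0, fileBytes.length); // raw image bytes, not base64
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    // The boundary in the header must match the one used in the body.
    static HttpRequest buildUploadRequest(String apiKey, String boundary, byte[] body) {
        return HttpRequest.newBuilder(URI.create("https://api.openai.com/v1/files"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the file ID should be in the `id` field of the JSON response.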

I got the solution from ChatGPT. I had to use the OkHttp client library, but it works…