Sending a picture to GPT 4 vision by URL (assistant)

How can I make it so that I can recognize an image and store information about it in the assistant thread?


Hi!

I guess you are asking because the assistant can’t browse the web without the necessary tools. Also, the Assistants API does not currently support the vision model.

You would have to supply your assistant with a function/tool that fetches an image from a server and passes it to the vision model, then returns the vision model’s response back to the assistant as context. So, all of this can be triggered by the assistant, but most of the work happens outside the Assistants API.
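As a rough sketch of that setup, here is what such a function/tool definition could look like when registered with the assistant. The function name `describe_image` and its parameter are illustrative assumptions, not something from the docs:

```python
# Hypothetical tool definition for an assistant that can "look at" images.
# The name and parameter schema below are made up for illustration; you would
# pass this in the `tools` list when creating the assistant.
describe_image_tool = {
    "type": "function",
    "function": {
        "name": "describe_image",
        "description": "Fetch an image from a URL and return a text description of it.",
        "parameters": {
            "type": "object",
            "properties": {
                "image_url": {
                    "type": "string",
                    "description": "Publicly reachable URL of the image to describe.",
                }
            },
            "required": ["image_url"],
        },
    },
}
```

When the assistant decides it needs to see an image, it emits a call to this function with the URL; your own code then does the actual vision request and hands the resulting description back as the tool output.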


Could you be a little more detailed?
I would like to add to my application the ability to understand what is in a photo. Can you point me in the right direction?

Ok.
If you go to the documentation for the vision model, there is sample code for getting a description from an image (1).
One approach is to trigger this code via a function call from the assistant.
The assistant’s function call will then include the link to the image, and the return value will be the image description (2).
You need to set up a server that the assistant can call.

  1. https://platform.openai.com/docs/guides/vision/quick-start

  2. https://platform.openai.com/docs/assistants/tools/function-calling
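The vision-model side of the quick-start linked above boils down to one chat-completions request with the image URL in the message content. A minimal sketch, assuming the model name in use at the time of this thread (`gpt-4-vision-preview`; your account may expose a different name):

```python
def build_vision_request(image_url: str, prompt: str = "What is in this image?") -> dict:
    """Build a chat-completions payload asking the vision model to describe an image.

    Follows the shape shown in the vision quick-start; the model name is an
    assumption and may need updating for your account.
    """
    return {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

# To actually send it (requires an API key in your environment):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     **build_vision_request("https://example.com/photo.jpg")
# )
# description = resp.choices[0].message.content
```

The returned `description` string is what you would submit back to the assistant as the function call’s output.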

Also note that I edited my previous post. It’s possible to simply pass an image URL to the vision model.