GPT-4 API and image input

Hi there,

Is there a documented way to supply GPT-4 API with images?

I couldn’t find anything on OpenAI’s website.


Looks like image inputs will come at a later time. This is what OpenAI’s documentation page says:
"GPT-4 is a large multimodal model (accepting text inputs and emitting text outputs today, with image inputs coming in the future) that can solve difficult problems with greater accuracy than any of our previous models, thanks to its broader general knowledge and advanced reasoning capabilities. Like gpt-3.5-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks."


GPT-4 runs on the same ChatGPT platform as GPT-3.5, where Plus subscribers got access to GPT-4 almost immediately after launch. But what about those who have API access? I just got mine, and I’m thinking of building a Discord bot to try the “same” tests that OpenAI demonstrated with GPT-4. Does anyone know anything about it?


I swear that I used “Describe this image to me” and then pasted the URL of an image, and GPT-4 described the image perfectly to me earlier this morning. Way better than I thought it would. Then I tried for a long time and couldn’t get it to work again. Describing images back to me is the main thing I want to do.


It was inferring the image contents from the URL.


For now you can look at Visual GPT:

It hooks into a third-party image interpreter. It can work for some things, but I assume GPT-4’s image recognition will be far more in-depth.
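The general pattern behind tools like this can be sketched roughly as follows (all the function names here are hypothetical, just to show the data flow): a separate captioning model turns the image into text, and that text is spliced into the prompt sent to the language model.

```python
# Hypothetical sketch: bridging images to a text-only model by
# captioning the image first, then prompting with the caption.
def describe_with_llm(image_path, caption_model, chat_model):
    caption = caption_model(image_path)  # e.g. "a tree in front of a sunset"
    prompt = f"The user uploaded an image described as: {caption}. Describe it."
    return chat_model(prompt)

# Stub models, just to demonstrate the flow:
reply = describe_with_llm(
    "photo.jpg",
    caption_model=lambda p: "a tree in front of a sunset",
    chat_model=lambda q: f"(model answer based on: {q})",
)
```

The language model never sees pixels; it only sees whatever text the interpreter produced, so the quality depends entirely on the captioner.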


That is partly true. On that initial test, I uploaded a random image that I found on Google, and it happened to be a tree in front of a sunset. When I asked GPT to describe the image, it described a sunset and a tree, and I assumed it actually worked.

Then when I tried again later, I got a mixture of three responses. One, it would tell me that it can’t look at images. Two, it would guess what the image was based on the URL. Three, it would describe a sunset and a tree.

For whatever reason, no matter what image I linked to, it would describe it as a sunset and a tree. It was a random coincidence that I uploaded a picture of a sunset and a tree, which appeared to work.


You should consider the possibility that this is such a great desire for you that you daydreamed it :wink:

have you seen the movie Contact? :smiley: a beach with a tree… :smiley: maybe AI only sees a sunset and a tree in any dataset.


I don’t understand. Where did you upload it? Is there an online platform?

This person from four months ago likely just provided the AI a web link, and then, as he describes, either the AI recognized a well-known link from its knowledge base, or it hallucinated the contents of /cute/kitten_playing.gif from the title.

In fact, one can discover this by reading:

You could give the AI a fake URL from a news site, but replace the URL’s description with your own words, and it would “summarize” a whole story just based on the contents of the link.
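To illustrate the test being described (the URL and prompt below are made up): the only information a text-only model can actually use is the words in the link itself.

```python
# Made-up URL: the path words are the only "content" the model sees.
fake_url = "https://news.example.com/2023/05/mayor-opens-robot-bakery"
prompt = f"Summarize this article: {fake_url}"
# A text-only model can't fetch the page; it will often invent a
# plausible story about a mayor opening a robot bakery from the slug.
```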

Computer vision is still not released.

If you have access to the ChatGPT code interpreter, you can now upload files to its Python environment, to be manipulated with Python commands.

I found AIHub by Instabase, which takes documents and allows ChatGPT to read them.


It was just made available yesterday. Yippee!


Is it not generally available for API users, or at least not documented?


Has anyone figured out how to supply an image to an API call and ask questions about it? ChatGPT is able to do it now.


I’ve been trying to figure out how to supply an image to the API but I haven’t got it working yet. Even the following does not work:

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": f"what is this image {img_url}"}])

output: “I’m sorry for the inconvenience. As a text-based AI, I don’t have the ability to view or interpret images. You may want to use an image search engine or AI that specializes in image recognition for assistance.”

Hopefully someone can figure it out.


You can stop wasting your time asking and looking. There is no announced date for API availability.

Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.

When the API starts taking lists instead of strings as “content”, then one might conclude something is going on. When you get the response “Sorry, I can’t help with that”, then you know it’s working as designed.
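For what it’s worth, here is a purely speculative sketch of what list-style "content" might look like if and when it arrives. Every field name below (`type`, `image_url`) is a guess on my part, not documented API:

```python
# Speculative sketch only: field names are guesses, not documented API.
message = {
    "role": "user",
    "content": [  # a list of parts instead of a plain string
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}
```

The point is simply that a string can only carry text, while a list can mix typed parts, which is why a change in the accepted type of "content" would be the tell.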


Amazing! But I was wondering what the syntax is for uploading PNG files as context. Is there documentation outlining it anywhere?

This is not possible right now. You have to wait, then the documentation will reflect how to do it.
