GPT-4 API multimodal access (images)

I can’t find much about the multimodal capabilities of GPT-4. I have access to the “gpt-4” model via the API, but I don’t think it can ingest images. Is the multimodal model different, and if so when might it be available? Or is “gpt-4” multimodal and I just can’t find any documentation on that aspect.


Considering the recent update: ChatGPT can now see, hear, and speak, where saying: “Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.”
Have anyone figured out how to use this multi-modal capability from API? I haven’t see any update related to this in the API documentations.


Follow this thread as well. Normally API should be released to dev community before official service release.


I’m also curious if it is possible to use the API to provide an image and get a summary. I want to use this for a project.

Same here. It works amazingly through the browser, we need API access :frowning:

Any news on this? It has been announced on the DevDay but I don’t see anything related to it in the API docs. Am I missing something?

GPT4-Vision is available only through the API, for now. With API I mean code. It’s not in the playground yet.

Also, if you don’t have access to it even through API code, top up your account with at least $0.50 and it should unlock GPT-4 and probably Vision, too.