GPT-4 API multimodal access (images)

I can’t find much about the multimodal capabilities of GPT-4. I have access to the “gpt-4” model via the API, but I don’t think it can ingest images. Is the multimodal model different, and if so, when might it be available? Or is “gpt-4” multimodal and I just can’t find any documentation on that aspect?


Considering the recent update, “ChatGPT can now see, hear, and speak”, which says: “Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.”
Has anyone figured out how to use this multimodal capability from the API? I haven’t seen any update related to this in the API documentation.


Following this thread as well. Normally the API should be released to the dev community before the official service launch.


I’m also curious if it is possible to use the API to provide an image and get a summary. I want to use this for a project.

Same here. It works amazingly well through the browser; we need API access :frowning:

Any news on this? It was announced at DevDay, but I don’t see anything related to it in the API docs. Am I missing something?

GPT-4 Vision is currently available only through the API, meaning via code. It’s not in the Playground yet.

Also, if you don’t have access to it even through the API, top up your account with at least $0.50 and it should unlock GPT-4, and probably Vision too.
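
For anyone who does have access, here’s a minimal sketch of what a vision call looks like with the Python SDK. The model name (“gpt-4-vision-preview”) and the image URL are assumptions/placeholders; check the API reference for what your account actually exposes.

```python
# Minimal sketch, assuming the openai Python package (v1.x) and that your account
# has access to a vision-capable model. The model name and image URL below are
# placeholders; adjust them to whatever the API reference lists for your account.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize what is shown in this image."},
                # Images can be passed as a public URL or as a base64 data URL.
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=300,  # the vision preview model cuts answers very short without this
)

print(response.choices[0].message.content)
```

If the image isn’t publicly reachable, you can instead base64-encode the file and pass it as a `data:image/jpeg;base64,...` URL in the same `image_url` field.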