GPT-4 API multimodal access (images)

I can’t find much about the multimodal capabilities of GPT-4. I have access to the “gpt-4” model via the API, but I don’t think it can ingest images. Is the multimodal model different, and if so, when might it be available? Or is “gpt-4” multimodal and I just can’t find any documentation on that aspect?

Considering the recent announcement, “ChatGPT can now see, hear, and speak,” which says: “Voice is coming on iOS and Android (opt-in in your settings) and images will be available on all platforms.”
Has anyone figured out how to use this multimodal capability from the API? I haven’t seen any update related to this in the API documentation.
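Not that I’ve seen either, but for reference, here’s a minimal sketch of what an image-input request might look like if/when it lands in the chat completions endpoint. The model name “gpt-4-vision-preview” and the content-array message format are assumptions here, not something the API docs currently confirm for your account:

```python
# Sketch only: assumes the OpenAI Python SDK (v1+) and that a vision-capable
# model is exposed through the chat completions endpoint under the assumed
# name "gpt-4-vision-preview".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[
        {
            "role": "user",
            # For image input, "content" would be a list mixing text parts
            # and image parts rather than a plain string.
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```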


Following this thread as well. Normally the API is released to the dev community before the official service launch.