GPT-4 API multimodal access (images)

I can’t find much about the multimodal capabilities of GPT-4. I have access to the “gpt-4” model via the API, but I don’t think it can ingest images. Is the multimodal model a separate model, and if so, when might it be available? Or is “gpt-4” itself multimodal, and I just can’t find any documentation on that aspect?