When will the API support image/audio as input and output?

Question as title.

The official ChatGPT web app has supported this for a while already. I want to know when it will be available via the API. Is there a schedule for opening it up?

There is currently no timeline for image input via the API. Look out for announcements on the OpenAI blog and on social media: https://x.com/OpenAI?s=20

Audio input already exists in the API. It is called Whisper.
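
For anyone who wants to try the audio input side, here is a minimal sketch of a transcription call. It assumes the official `openai` Python package (v1 or later), the `whisper-1` model name, and an `OPENAI_API_KEY` set in the environment. The helper name `transcribe` and the file name below are just placeholders, so adjust them to your setup.

```python
from pathlib import Path


def transcribe(client, audio_path, model="whisper-1"):
    """Send an audio file to the transcription endpoint and return the text.

    `client` is assumed to be an `openai.OpenAI` instance (openai>=1.0).
    `transcribe` is a hypothetical helper name, not part of the library.
    """
    # The endpoint expects the audio as a binary file object.
    with Path(audio_path).open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    # With the default response format, the transcript is on `.text`.
    return result.text
```

To use it, something like `from openai import OpenAI` followed by `print(transcribe(OpenAI(), "speech.mp3"))` should work, where `speech.mp3` is a placeholder for your own recording. Note this only covers audio as input (speech to text), not image input or audio output.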