Does GPT-4o API Natively Support Video Input like Gemini 1.5?

I checked the cookbook example and doc. It seems that vision-wise it only supports input which is an array of frames, this means no support for the audio.

Any idea how long before we get access to it?