Hello guys, I want to create a platform to convert video to text. Which APIs do I need to use?
Thank you in advance for your answers.
Welcome to the OpenAI Dev Community!
OpenAI actually has a cookbook entry (a well-written guide) on how to process videos and create a voiceover. You can find it here.
Some more good documentation for your desired use-case can be found in the API documentation, specifically for GPT-4V and Whisper.
The vision model doesn't take a video file directly, but it can describe frames (still images) sampled from the video, and you can use Whisper to transcribe the audio track.
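To make that a bit more concrete, here is a rough sketch of the audio half in Python. It assumes you have ffmpeg installed and the official `openai` package (v1 or later) with `OPENAI_API_KEY` set in your environment. The function names (`ffmpeg_extract_audio_cmd`, `sample_frame_indices`, `transcribe_video`) are just ones I made up for illustration, and the ffmpeg flags are one reasonable choice, not the only one.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono 16 kHz audio."""
    return [
        "ffmpeg", "-y",      # overwrite the output file if it exists
        "-i", video_path,    # input video
        "-vn",               # ignore the video stream
        "-ac", "1",          # downmix to mono
        "-ar", "16000",      # resample to 16 kHz to keep the upload small
        audio_path,
    ]


def sample_frame_indices(total_frames: int, every_n: int) -> list[int]:
    """Pick every n-th frame index; these frames could be sent to the vision model as images."""
    return list(range(0, total_frames, every_n))


def transcribe_video(video_path: str) -> str:
    """Extract the audio track with ffmpeg, then transcribe it with Whisper."""
    # Imported here so the helpers above work without the package installed.
    from openai import OpenAI

    audio_path = str(Path(video_path).with_suffix(".mp3"))
    subprocess.run(ffmpeg_extract_audio_cmd(video_path, audio_path), check=True)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return result.text
```

You would call it as `transcribe_video("talk.mp4")` (a placeholder filename). Keep in mind the transcription endpoint has a file size limit, so longer videos probably need the audio split into chunks first. If you also want descriptions of what is on screen, you would pull out frames at the sampled indices and send them to the vision model as images, which is what the cookbook entry walks through.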