Real-time voice conversations with GPT-4o photo/video support

Hi there! I’ve been working on a small project that shows how you can build your own web app that combines real-time voice conversations with GPT-4o for photo/video questions and answers.

During the OpenAI Spring Update there was a similar demo, but afaik this feature has not been rolled out to the app yet, so I tried to build a similar app myself.

While testing, I became really bullish on this technology’s potential for blind users. Great times ahead for them!

Demo: https://youtu.be/Bh5tORytR90
Code: GitHub - basvandorst/realtime-gpt4o-videochat: Real-time GPT-4o video/photo/voice chat

Open to feedback, but please treat it as a demo/PoC project to play with.


Welcome to the community! Thanks for sharing your project with us.

I’ve added the project tag for you. We ask that you keep updates to this single thread to make it easier for everyone to keep up to date.

Any big issues you ran into while coding it that you can share with the community? Again, welcome!


It was nice to play with. Some feedback:

  • Realtime API: probably high on the roadmap already, but frontend-only authentication is not that secure :wink:
  • Very limited tokens per day in the lowest tier (my guess: a conversation of 5 minutes at most)
  • It would be great if the Realtime API client gave you feedback when you reach the limits (afaik not standard?)
  • Nice to have the React app as a reference, but I was also a bit overwhelmed by all the WAV libraries and components. It was a bit hard to find out what was really needed to get the basics up and running
  • Again, I really see the combination (real-time voice + photo/video) as a useful tool for blind people :ok_hand:
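
On the point about not getting feedback when you hit the limits: until the client reports this itself, one workaround is to keep a rough count in your own app. Below is a minimal sketch of that idea in TypeScript. The `TokenBudget` class, the `dailyLimit` value and the `warnRatio` threshold are all hypothetical names and numbers for illustration, not part of the Realtime API or of the project's code. It assumes you can read a token count for each response from whatever usage data you receive.

```typescript
// Hypothetical helper: tracks tokens used against a self-chosen daily limit.
// The limit is an assumption you configure yourself, not a documented quota.
type BudgetStatus = "ok" | "warning" | "exhausted";

class TokenBudget {
  private used = 0;
  private readonly dailyLimit: number;
  private readonly warnRatio: number;

  constructor(dailyLimit: number, warnRatio: number = 0.8) {
    if (dailyLimit <= 0) {
      throw new Error("dailyLimit must be positive");
    }
    this.dailyLimit = dailyLimit;
    this.warnRatio = warnRatio;
  }

  // Add the tokens consumed by one response and return the new status.
  record(tokens: number): BudgetStatus {
    this.used += Math.max(0, tokens); // ignore negative input
    return this.status();
  }

  // Tokens left before the configured limit is reached (never negative).
  remaining(): number {
    return Math.max(0, this.dailyLimit - this.used);
  }

  status(): BudgetStatus {
    if (this.used >= this.dailyLimit) return "exhausted";
    if (this.used >= this.dailyLimit * this.warnRatio) return "warning";
    return "ok";
  }
}
```

In the UI you could call `record(...)` after every response and show a banner once the status turns to "warning", so the conversation does not just stop without explanation.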