Real-time voice conversations with GPT-4o photo/video support

Basvd · November 11, 2024, 7:59pm

Hi there! I’ve been working on a small project that shows how you can build your own web app that combines real-time voice conversations with GPT-4o photo/video questions and answers.

During the OpenAI Spring Update there was a similar demo, but afaik this feature has not been rolled out to the app yet, so I tried to build something similar myself.

While testing, I became really bullish on this technology’s potential for blind users. Great times ahead for them!

Demo: https://youtu.be/Bh5tORytR90
Code: GitHub - basvandorst/realtime-gpt4o-videochat: Real-time GPT-4o video/photo/voice chat

Open for feedback, but see it more as a demo/PoC project to play with.

PaulBellow · November 11, 2024, 8:02pm

Welcome to the community! Thanks for sharing your project with us.

I’ve added the project tag for you. We ask that you keep updates to this single thread to make it easier for everyone to keep up to date.

Any big issues you ran into while coding it that you can share with the community? Again, welcome!

Basvd · November 11, 2024, 8:57pm

It was nice to play with. Some feedback:

Realtime API: probably high on the roadmap already, but frontend-only authentication is not that secure
Very limited p/d tokens in the lowest tier (my guess: a conversation of 5 minutes at most)
Would be great if the Realtime API client gave you feedback when you reach the limits (afaik that's not standard?)
Nice to have the React app as a reference, but I was also a bit overwhelmed by all the WAV libraries and components. It was a bit hard to find out what was really needed to get the basics up and running
Again, I really see the combination (real-time voice + photo/video) as a useful tool for blind people
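
On the point about getting feedback when the limits are reached: one possible workaround is to watch the server events yourself. The sketch below is not taken from the linked repo. It assumes the Realtime API sends a `rate_limits.updated` server event whose `rate_limits` entries carry `name`, `limit`, `remaining` and `reset_seconds` (check the current API reference before relying on this). The helper names `rateLimitWarnings` and `handleServerMessage` and the 10% threshold are made up for illustration.

```typescript
// Assumed shape of one entry in a `rate_limits.updated` server event.
interface RateLimit {
  name: string;          // e.g. "requests" or "tokens"
  limit: number;         // maximum allowed in the current window
  remaining: number;     // what is left in the current window
  reset_seconds: number; // seconds until the window resets
}

interface RateLimitsUpdatedEvent {
  type: "rate_limits.updated";
  rate_limits: RateLimit[];
}

// Hypothetical helper: one warning string per limit that is nearly used up.
// `threshold` is the fraction of the limit below which we start warning.
function rateLimitWarnings(
  event: RateLimitsUpdatedEvent,
  threshold = 0.1
): string[] {
  return event.rate_limits
    .filter((rl) => rl.limit > 0 && rl.remaining / rl.limit <= threshold)
    .map((rl) => {
      const pct = Math.round((rl.remaining / rl.limit) * 100);
      const wait = Math.ceil(rl.reset_seconds);
      return `${rl.name}: ${rl.remaining}/${rl.limit} left (${pct}%), resets in ${wait}s`;
    });
}

// Hypothetical wiring: parse a raw server message and ignore other event types.
function handleServerMessage(raw: string): string[] {
  const event = JSON.parse(raw);
  if (event?.type !== "rate_limits.updated") return [];
  return rateLimitWarnings(event as RateLimitsUpdatedEvent);
}
```

You would call `handleServerMessage` from wherever your client surfaces raw server events (for example a WebSocket `onmessage` handler) and show the returned strings in the UI, so the user gets a heads-up before the conversation is cut off.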
