What is the correct process to replicate the camera streaming shown at the keynote

sbalani · May 29, 2024, 5:24am

I couldn’t find an API endpoint that can accept a video stream. There’s either one for images or video files.

As such if I wanted to create an app that allows the user to interact with ChatGPT multimodally as they showed off how would I do it? Would I open a stream to gpt4o and pass in a frame every x frames ?

supershaneski · May 29, 2024, 7:54am

if i am going to implement that demo with the current limitations, the easiest and simplest way is to trigger image capture when the user either starts talking or after, then send both to the backend, audio being transcribed by whisper and then composed together for vision request format with the captured image. i might also prepare a tool/function that will trigger image capture since user might not all the time wants to talk about the image and only do so when referred to in the conversation.

Topic		Replies	Views
ChatGPT API TTS streaming API api	2	3400	June 1, 2024
Are there Gateway APIs for video? API	2	691	July 29, 2021
Voice to voice via API possible? API gpt-4 , api	1	500	May 27, 2024
Video conversation gpt-4 conversion API API gpt-4	1	275	August 1, 2024
Use Open AI API for video analysys API gpt-4 , api , chatgpt-plugin	3	9017	November 15, 2023

What is the correct process to replicate the camera streaming shown at the keynote

Related topics