We are using the OpenAI Realtime API for voice interaction, as described at https://platform.openai.com/docs/guides/realtime?text-generation-quickstart-example=stream, and we get the transcription back along with the audio. Is there any way to also get the transcription start and end date-times in the response? Right now it does not return these details. Thanks.
Why not set up your own clock and timestamp the transcription deltas as they are received by your client?
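Here is a rough sketch of what I mean, in Python. It assumes your client already parses each server event into a dict, that transcript events carry a `type` ending in `transcript.delta` / `transcript.done` (or `transcription.delta` / `transcription.completed` for input audio), and that they include an `item_id`, `delta`, and `transcript` field. Check those names against the events you actually receive; `TranscriptTimer` and `TranscriptSpan` are names I made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Optional


def utc_now() -> datetime:
    """Default clock: the client's current time in UTC."""
    return datetime.now(timezone.utc)


@dataclass
class TranscriptSpan:
    started_at: datetime                 # arrival time of the first delta
    ended_at: Optional[datetime] = None  # arrival time of the final event
    text: str = ""


@dataclass
class TranscriptTimer:
    """Stamps transcript events with the client's clock as they arrive."""
    clock: Callable[[], datetime] = utc_now
    spans: Dict[str, TranscriptSpan] = field(default_factory=dict)

    def handle(self, event: dict) -> None:
        # Event type suffixes and field names below are assumptions.
        event_type = event.get("type", "")
        key = event.get("item_id") or event.get("response_id") or "unknown"
        now = self.clock()

        if event_type.endswith(("transcript.delta", "transcription.delta")):
            # The first delta for this item fixes the start time.
            span = self.spans.setdefault(key, TranscriptSpan(started_at=now))
            span.text += event.get("delta", "")
        elif event_type.endswith(("transcript.done", "transcription.completed")):
            span = self.spans.setdefault(key, TranscriptSpan(started_at=now))
            span.ended_at = now
            # Prefer the full transcript if the final event carries one.
            span.text = event.get("transcript", span.text)
```

Call `timer.handle(event)` for every event your client receives, then read `timer.spans[item_id]`. Keep in mind these are arrival times on your client, not positions in the audio, so network latency and buffering are included in them.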
There can be pauses and a few other scenarios, so maintaining a clock on the server side would not work so well. Whisper adds such nice timestamps to its transcripts, and I'm not sure why they are missing from the Realtime API output.
My personal opinion is that it is still possible, even if you have to take some edge cases and other factors into account. Sad as it is, I definitely would not rely on OpenAI adding this feature.