I'm developing a chatbot with real-time voice-to-voice using the new WebRTC Realtime API.
When using voice-to-voice, which parameter returns the transcription of what you said once your turn ends? The docs say the transcription model defaults to whisper-1, but when I inspect conversation.item.created or conversation.item.input_audio_transcription.completed, the transcript values come back null.
https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create
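For context, here is roughly how I set up the connection (a minimal sketch, not my exact code; EPHEMERAL_KEY and the model string are placeholders for my real values):

// Minimal sketch of the WebRTC setup, following the Realtime API docs.
const pc = new RTCPeerConnection();

// Send microphone audio to the model.
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
pc.addTrack(mic.getTracks()[0], mic);

// Play the model's spoken responses.
const audioEl = document.createElement("audio");
audioEl.autoplay = true;
pc.ontrack = (e) => { audioEl.srcObject = e.streams[0]; };

// Realtime events arrive as JSON on the "oai-events" data channel.
const dc = pc.createDataChannel("oai-events");
dc.onmessage = (e) => console.log(JSON.parse(e.data));

// Exchange SDP with the Realtime endpoint using an ephemeral key.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
const resp = await fetch(
  "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  {
    method: "POST",
    body: offer.sdp,
    headers: {
      Authorization: `Bearer ${EPHEMERAL_KEY}`,
      "Content-Type": "application/sdp",
    },
  }
);
await pc.setRemoteDescription({ type: "answer", sdp: await resp.text() });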
In my logs you can see the transcript parameter is null in the conversation.item.created event:
…
{
  "type": "conversation.item.created",
  "event_id": "event_AgD4grkMnnQAX5meG0uSX",
  "previous_item_id": null,
  "item": {
    "id": "item_AgD4gCkEkQZuilVoJrVs2",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
But in the docs, the same event looks like this:
"event_id": "event_1920",
"type": "conversation.item.created",
"previous_item_id": "msg_002",
"item": {
"id": "msg_003",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "user",
"content": [
{
"type": "input_audio",
"transcript": "hello how are you",
"audio": "base64encodedaudio=="
}
]
}
}
Here, the transcript parameter is filled.
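For completeness, I also tried explicitly enabling input transcription via session.update once the data channel opens, and listening for the completed event, along these lines (again a sketch based on my reading of the docs, which suggest the user transcript arrives asynchronously on this event rather than inside conversation.item.created):

// Sketch: explicitly enable input transcription, then listen for the async transcript.
dc.onopen = () => {
  dc.send(JSON.stringify({
    type: "session.update",
    session: {
      input_audio_transcription: { model: "whisper-1" },
    },
  }));
};

dc.onmessage = (e) => {
  const event = JSON.parse(e.data);
  // The user transcript should arrive here, after the item was created with transcript: null.
  if (event.type === "conversation.item.input_audio_transcription.completed") {
    console.log("user transcript:", event.transcript);
  }
};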
Does anyone have any idea why this value is coming back null?