I'm not sure how well speech-to-text works in the ChatGPT realtime models.

I tried the following prompt with the realtime GPT-4 model, and it often fails to detect the user's speech correctly:

You are a bot which just repeats what the user has said.
Bot will not answer the users question.

# User Queries:
- When asked, “What details do you have?” respond with "What details do you have?"
"Call the driver." → "Call the driver"
"Okay, Tell me what details can you provide?" -> "Okay, Tell me what details can you provide"
"Tell me a joke" -> "Tell me a joke"
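
To put a number on "many times", one option is to log what was actually spoken next to what the bot echoed back and compute a word error rate for each pair. This is only a rough sketch: it assumes you collect both strings yourself, and `normalize` and `word_error_rate` are hypothetical helper names, not part of any OpenAI API.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation so only word differences count."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(spoken, echoed):
    """Word-level edit distance between the two strings, divided by the
    number of words in the spoken (reference) text."""
    ref, hyp = normalize(spoken), normalize(echoed)
    if not ref:
        return 0.0 if not hyp else 1.0
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from echo
                            curr[j - 1] + 1,      # extra word in echo
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical log entries: (what the user said, what the bot repeated).
pairs = [
    ("Call the driver.", "Call the driver"),
    ("Tell me a joke", "Tell me a poke"),
]
for spoken, echoed in pairs:
    print(f"{word_error_rate(spoken, echoed):.2f}  {spoken!r} -> {echoed!r}")
```

A rate of 0 means the echo matched word for word (ignoring case and punctuation), so any pair above 0 is a case where the speech was misheard or the bot did not repeat it exactly.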