I spent almost a whole day debugging this same issue. The fix is simple: use the model “gpt-4o-realtime-preview-2024-12-17”. I was using “gpt-4o-realtime-preview-2024-10-01”, which gave me this error message: “Semantic VAD is not supported for model gpt-4o-realtime-preview-2024-10-01”.
Switch to “gpt-4o-realtime-preview-2024-12-17” and transcription should work for you.
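Here is a rough sketch of where the model name goes. It assumes you connect to the Realtime WebSocket endpoint with the model passed as the `model` query parameter, and that you turn on semantic VAD and transcription with a `session.update` event. The endpoint URL, the event shape, and the `whisper-1` transcription model are my assumptions from the Realtime API docs. They are not taken from the error message above, so check the current API reference. The helper names are hypothetical. The code only builds the URL and the JSON payload and does not open a connection.

```python
import json

# Assumption: the Realtime WebSocket endpoint selects the model via ?model=.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
# Use the 2024-12-17 snapshot; 2024-10-01 rejects semantic VAD.
MODEL = "gpt-4o-realtime-preview-2024-12-17"


def realtime_url(model: str = MODEL) -> str:
    """Build the WebSocket URL for the given model (hypothetical helper)."""
    return f"{REALTIME_URL}?model={model}"


def session_update_event() -> str:
    """Build a session.update payload enabling semantic VAD and transcription.

    Field names are assumptions based on the Realtime API docs.
    """
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": "semantic_vad"},
            "input_audio_transcription": {"model": "whisper-1"},
        },
    }
    return json.dumps(event)


if __name__ == "__main__":
    print(realtime_url())
    print(session_update_event())
```

Send the `session.update` JSON as the first message after the socket opens. The only change that mattered in my case was the model name in the URL.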