Translation: How to output simultaneously with input?

Hello. It has always demotivated me that my spoken English is poor while I am building a startup in England! So my idea is real-time voice translation with a delay of about 2 seconds, without waiting for the speaker to finish talking. A mobile app with this feature would be amazing for handling phone calls! But it seems the OpenAI Realtime API is not very convenient for this use case. Please give me some suggestions as to why it can't be done, so that I can hopefully figure out how to do it!
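
To make the idea concrete, here is a minimal sketch of the chunking I have in mind. It only regroups incoming audio into fixed 2-second windows and hands each one to a translator as soon as it is full. It does not call the OpenAI API: `translate_chunk` is a hypothetical placeholder, and the 16 kHz, 16-bit mono format is my assumption.

```python
from typing import Callable, Iterable, Iterator

SAMPLE_RATE = 16_000    # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2    # assumption: 16-bit PCM
WINDOW_SECONDS = 2      # the 2-second delay I am aiming for


def fixed_windows(frames: Iterable[bytes], window_bytes: int) -> Iterator[bytes]:
    """Regroup audio frames of any size into fixed-size windows."""
    buffer = bytearray()
    for frame in frames:
        buffer.extend(frame)
        # Emit every full window as soon as enough audio has arrived.
        while len(buffer) >= window_bytes:
            yield bytes(buffer[:window_bytes])
            del buffer[:window_bytes]
    if buffer:
        # Flush whatever is left when the speaker stops.
        yield bytes(buffer)


def translate_stream(
    frames: Iterable[bytes],
    translate_chunk: Callable[[bytes], str],  # hypothetical translator callback
) -> Iterator[str]:
    """Translate window by window instead of waiting for the end of speech."""
    window_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * WINDOW_SECONDS
    for window in fixed_windows(frames, window_bytes):
        yield translate_chunk(window)
```

One problem I can already see: a fixed window can cut a word or a sentence in the middle, so each chunk on its own may not carry enough context to translate well.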