Struggling with robotic AI responses from the OpenAI API

I’ve been using GPT-4o via the API. My app has custom instructions, autogen memory, and vision capabilities. Yet the responses still sound super robotic and basic. It keeps repeating phrases like “Do you need something specific?” or “I’m here to help”, all the time. It doesn’t seem to matter what I tweak; I can’t get rid of these robotic responses. The way it thinks also feels much more limited than when I’m chatting with OpenAI’s new Advanced Voice; it’s like night and day. There, the responses feel so much more natural, almost like it’s actually thinking and not just echoing my own thoughts back at me. Has anyone else experienced this? Is there a way to make the API voice sound less robotic?

Same thing with the new Realtime API they just released. The way it responds is so basic. Sure, it’s fast, but it’s nowhere near the realism you get when talking to Advanced Voice.
I’m losing the will to develop anything further because of these limitations. I was really hoping I would get an amazing AI, since the vision, web search, and memory I added should give it enough context to be smarter, but I’m getting the opposite. None of that matters when it can’t process all this information as intelligently as Advanced Voice or even the normal GPT version on the web. It feels like the API is hard-coded to sound robotic and basic, with little to no thinking and a single instruction: repeat the user’s thoughts back in a way that sounds better but says the same thing. It’s so sad, actually.