Has anyone found a way to stop GPT-4o from interrupting the user mid-sentence?

I’ve been using GPT-4’s interactive mode on Android for solo brainstorming sessions.

Recently, the app has become quite problematic: it lags, the responses print out extremely slowly, and then it starts highlighting words one by one as if pronouncing them.

I assume these are temporary issues, either due to high load or a bug, and they will be fixed.

However, what has been particularly frustrating over the last week or ten days are the constant interruptions. It’s become almost unbearable, and I get triggered by an LLM because it keeps cutting me off mid-sentence. I know I can dictate and use the model that way, but it’s inconvenient when I’m jogging or my hands are busy.

Any solutions/tips/advice are highly appreciated!

P.S.

This question primarily concerns using the interactive mode with GPT-4 on Android/iOS. (I generally avoid using iOS unless it’s jailbroken, as it feels too restrictive for me.)

I’m also about to start testing GPT-4o for potential use via API in hotel management software. This software aims to facilitate easy communication (especially translation) between guests and staff across various accommodations, from one-bedroom guest houses to hotels, mostly in Montenegro. The company is considering adding an option for guests to voice chat with AI. I haven’t started testing this yet, but the current issues are relevant since the owner stated that he wants to use the best available model.
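To make the idea concrete, the translation piece would look roughly like this via the API. This is a minimal sketch using the official openai Python SDK; the prompt wording and the language pair are placeholders I made up, not anything from the real product:

```python
# Minimal sketch of the guest<->staff translation flow via the Chat Completions API.
# Assumes the official `openai` Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
# The prompt wording and language pair are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

def translate(text: str, source: str, target: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Translate the user's message from {source} to {target}. "
                           "Reply with the translation only.",
            },
            {"role": "user", "content": text},
        ],
        temperature=0,  # keep translations deterministic
    )
    return response.choices[0].message.content

# e.g. a guest message relayed to the host:
print(translate("Možemo li dobiti kasni check-out?", "Montenegrin", "English"))
```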

Thanks in advance

By interactive mode do you mean the (voice) conversational mode?

The ChatGPT app is completely unusable if you aren’t in an area with high-quality data coverage. Canada, which can be considered a third-world country when it comes to telecom, does NOT work with ChatGPT. The model is reduced to mumbling and cutting off mid-sentence.

You may want to wait for this new fancy GPT-4o app that is coming out… Maybe… Sometime… Ish… If it even works. One thing to note is that for their demonstration they needed to be plugged into a probably very powerful network through a NETWORK cable (they couldn’t even trust the WiFi). It was a demo though, so that’s not completely unfair. BUT, with their current tech it’s brutally obvious that it will slip and stumble on any less-than-ideal connection and voice quality.

This is worth mentioning because, in my experience, especially in tourist areas, voice and connection quality can vary greatly. The last thing you want is a voice agent that works for <70% of clients, providing a frustrating, unusable experience for the rest.

There hasn’t been any mention of this new model being available through the API either. GPT-4o is available, but not the lightning-fast capabilities they demonstrated. So you may not even be able to match it.
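If you want to sanity-check what your key actually exposes, here’s a quick sketch with the official Python SDK (what you see depends on your account):

```python
# Quick check of which models your API key can actually see.
# Uses the official `openai` Python SDK (>= 1.0); assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
for model in sorted(m.id for m in client.models.list()):
    print(model)  # "gpt-4o" shows up, but no realtime/voice variant (as of writing)
```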

Just remember the following: OpenAI themselves don’t even use GPT models for their own automated customer support. That should indicate their confidence.

I would really consider starting with a text-based service before immediately jumping into voice. There are a lot of people building some powerful voice-based systems.

You do not want to be locked in with OpenAI. They are notorious for hyping themselves up, releasing some amazing technology, and then completely leaving it in the dust to chase the next exciting thing.


Yes, I meant voice/conversational/interactive mode, but it’s the same even at home, where I’m on 1 Gbit fiber-optic internet… Yes, the connection was over Wi-Fi, but it still pulls about 250 Mbit upload/download. On mobile data that drops significantly, but I didn’t have any issues until some time ago…

Regarding my personal use, interruptions weren’t this bad until about a week or maybe ten days ago (maybe two weeks, I’m really spitballing here). Anyhow, it was pretty much fine and usable, then suddenly went to hell.

Dunno if there was some stealth update or something else broke it, but I stopped using it since it won’t let me finish a sentence.

Regarding the hotel app:

When you mention bandwidth issues, I see how it could turn real ugly real fast. Thanks for the heads up!

While most of Montenegro has decent-ish internet coverage, and even remote areas usually have somewhat reliable mobile data (4G) in the worst case, the high bandwidth requirements would definitely be an issue, especially for accommodations housing multiple families.

Many of these places have a single internet connection shared between owners and guests, which is pretty much a standard offering for smaller hosts.

So once again, thanks for the tip, I hadn’t considered that it would require so much bandwidth.

The software already includes text-based guest-to-host communication/orders/offers through a chat/ticket system, supporting 11 languages. It includes templates for specific offers and standard guest/accommodation management features. They aim to develop it further, and as you can imagine, these days everyone and their mother is looking for a way to get involved with AI in some way… (especially since there is currently an active funding project for innovative startups. I’d better shut up :stuck_out_tongue: Thanks again)

Huh. Strange. I haven’t noticed this. I only get interruptions when I’m driving and figure it’s just a network delay.

This is worth looking at though. I found it on Hacker News and IMO it’s pretty amazing.


Thanks,

Whoa, Pipecat is pretty impressive. I have more and more trust in Llama as time goes by. Unfortunately, I can’t test the latest Claude without a US number (I could probably figure something out, but I have neither the will nor the desire unless someone pays me to do it, even though what little I saw was pretty impressive).

BTW, which demo were you referring to? The last one I saw is by now an at-least-month-old video on YT, titled “Live demo of GPT-4o vision capabilities”.

Were you referring to something else? I just saw there’s a new “learn a language” demo.

BTW, I just rewatched “Live demo of GPT-4o vision capabilities” and noticed how at some points the voice sounds robotic. Either they had bandwidth issues, or they did it intentionally and “nerfed” the voice to sound more robotic/AI-like. I’m kind of leaning toward network issues…

(Though the last thing I read was @sama saying they actually have a much more impressive voice model which is Coming soon™. Now I keep wondering what kind of connection that will require.)

I worked in the music industry for a long time, and during my apprenticeship and early days I did lots of voice recording. Even though my hearing has degraded with age, I’m fairly certain I’m not imagining this. The voice in the month-old demo sounds fishy, whether due to bandwidth problems or some other reason, and it makes me wonder why they would release a demo with a voice model producing artifacts… artifacts I’ve so far pretty much only heard a couple of times, during blackouts when I messed with GPT-4o interactive mode and the network was congested.

Thanks again for the tips, for the Pipecat demo, and for the reminder to move my Hacker News bookmark so it’s visible again among all the rubbish I keep bookmarking… (I have terrible bookmarks/open-tabs hygiene; rn I have, lemme see… almost 3000 tabs open… :S )


It is! I am in the middle of attempting to make something similar myself.

Same! There are also some other models worth checking out. HF released a new leaderboard after they found certain people were cheating and using the eval data as training data. Although this new leaderboard seems to break a lot:

I’ve been messing around with a bunch of them on Colab. No bueno trying to use an external model if you want a fast response. The fastest I could get was about ~500 ms from OpenAI using gpt-4o (faster than gpt-3.5). BUT these open-source models can easily hit less than 100 ms TTFT (time to first token). I use GPT-2 locally just for testing.
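For reference, this is roughly how I time TTFT against the API. A rough sketch, not a rigorous benchmark; network jitter dominates at this scale, so run it a few times:

```python
# Rough TTFT (time to first token) measurement via the streaming Chat Completions API.
# Assumes the official `openai` Python SDK (>= 1.0) and OPENAI_API_KEY in the environment.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    # skip role-only/empty chunks; stop at the first actual token
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {(time.perf_counter() - start) * 1000:.0f} ms")
        break
```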

BUT, I need the instruct model, aka no enforced schema like ChatML. I’m hoping I can take advantage of the free-form input to have the model output a “diff” result, adding and/or subtracting text live, along the lines of the toy sketch below.
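Something like this toy patch format is what I have in mind. Entirely hypothetical; the +/- syntax is just my own experiment, not anything standard:

```python
# Hypothetical "diff" patch format for live edits -- not a standard, just an experiment.
# Each patch line is either "+ <text to append>" or "- <exact text to delete>".
def apply_patch(current: str, patch: str) -> str:
    for line in patch.splitlines():
        if line.startswith("+ "):
            current += line[2:]
        elif line.startswith("- "):
            current = current.replace(line[2:], "", 1)  # drop first occurrence only
    return current

draft = "The hotel has a pool."
draft = apply_patch(draft, "- a pool.\n+ an outdoor pool, open 8am-8pm.")
print(draft)  # The hotel has an outdoor pool, open 8am-8pm.
```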

Brutal considering that they were hooked up through a network cable :rofl:.
There are a lot of edge cases involved in handling audio that OpenAI did not show, which I think is the main reason why no single company has blown up with a live virtual assistant.

Oh… my lord…

A critical issue for me is that they never solved the existing issues with simple vocal input. The AI model is much more impressive, but the infrastructure and platform they host it on isn’t. No code is perfect on the first try, but they just throw crap on top of it instead of fixing it (technical debt).

Considering that it’s a PoC I would imagine that they had bootstrapped some components together.

I have been messing around with a live vocal assistant for a couple of days now, and one consensus is to just rapid-fire tokens for each modality: running an STT rapid-fire and patching its output; rapid-firing an LLM to understand the text as it comes in and build upon and/or rebuild it; then finally streaming out the audio, which may have to be cut, changed, and re-arranged rapidly as the LLM updates the text. It’s a BRUTAL battle, to which OpenAI says “hold my beer” and throws in video as well.
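In sketch form, the plumbing looks something like this. The STT/LLM/TTS calls are stubbed out; the brutal part is the patching/cutting logic, which this deliberately leaves out:

```python
# Sketch of the rapid-fire pipeline shape: three stages connected by queues,
# each re-emitting partial results as they arrive. Stage internals are stubs;
# only the plumbing is real.
import asyncio

async def stt_stage(audio_in: asyncio.Queue, text_out: asyncio.Queue):
    while True:
        chunk = await audio_in.get()          # raw audio frames
        partial = f"<transcript of {chunk}>"  # stub: call your STT here
        await text_out.put(partial)           # emit (and later patch) partials

async def llm_stage(text_in: asyncio.Queue, reply_out: asyncio.Queue):
    context = ""
    while True:
        partial = await text_in.get()
        context += partial                    # build on / rebuild the transcript
        reply = f"<reply so far for: {context}>"  # stub: rapid-fire LLM call
        await reply_out.put(reply)            # may supersede the previous reply

async def tts_stage(reply_in: asyncio.Queue):
    while True:
        reply = await reply_in.get()          # stub: synthesize + possibly cut audio
        print("speak:", reply)

async def main():
    audio_q, text_q, reply_q = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    asyncio.create_task(stt_stage(audio_q, text_q))
    asyncio.create_task(llm_stage(text_q, reply_q))
    asyncio.create_task(tts_stage(reply_q))
    for frame in ["frame1", "frame2"]:        # pretend mic input
        await audio_q.put(frame)
    await asyncio.sleep(0.1)                  # let the stages drain

asyncio.run(main())
```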

On a side note: the app is definitely interrupting me way more often now.