gpt-4o-mini-audio-preview, temp: 0.7.
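For context, this is roughly how I'm making the request; the voice and the message content below are placeholders, not my actual prompt:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "pcm16"},  # pcm16 is the streaming audio format
    temperature=0.7,
    stream=True,
    messages=[{"role": "user", "content": "..."}],  # placeholder prompt
)
```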
98/100 times the audio generation works fine. Occasionally, it'll begin generating the text transcript + audio and stream it back fine, however it'll then hang
and generate an endless amount of silent audio chunks.
My logs:
(the percentage is the non-silent portion of each chunk; the value in brackets is a truncated printout of the base64-encoded audio)
<90.75%> Audio chunk [1]: [IgAiACEAIAAhACMAIwAkACEAIQAhACIAIgAiACQAHwAiAB4AHw]
<93.42%> Audio chunk [2]: [qwqbC0oLowq5CuYKYgoxCegKGgpACIgJAggkBjwGwAeBBpYFqg]
<95.23%> Audio chunk [3]: [IgJ/AukBMQG5AiIAMABgAkoBtQB4ApsC/ABlAW0C2wDJAY4CLQ]
<96.08%> Audio chunk [4]: [zPkb+zT8FgBRA/IC7AP+AqUBmwRuBr0GOAmzCq4Lsg4wEegRGB]
<90.37%> Audio chunk [5]: [bgIZA74C7wLBA6UDJwSnBH8EXQQtBnkGpQWhBx8HZQQjBSUG1Q]
<90.93%> Audio chunk [6]: [4wBfAHEAdAA3ABEAHQDX/9P/GQDY/53/lP9+/1D/HP/J/n3+n/]
<87.78%> Audio chunk [7]: [///9//7//P/5//r/+f/9//7//v/7//z/+//5//n/+v/7//n/9/]
<17.69%> Audio chunk [8]: [5gmWCD4IYAciB4sFlARABP4CcwGwAWYAzf5q/pT8DfuK+eT4Vv]
<16.82%> Audio chunk [9]: [/v8AAP7//f/7//z//f/+//3/+v/8//7//P/8//7//P/+//z//f]
<0.00%> Audio chunk [10]: [//8AAP7/AAD+//3//v////7//v8AAP///v/+/////v///wAAAA]
<0.00%> Audio chunk [11]: [AAAAAP//AAAAAP//AAAAAP7///8AAAAAAAD///7/AAABAAAAAQ]
<0.00%> Audio chunk [12]: [AQABAAAA//////7///////3///8AAAAAAAD///7/AAABAAAAAA]
<0.00%> Audio chunk [13]: [AAAAAAAAAAAAAP////////7/AAAAAAAAAAD+//7///8BAAEAAA]
<0.00%> Audio chunk [14]: [AQAAAAAA//////7//v/+//3///8AAAAAAAD+//7/AAABAAAAAQ]
<0.00%> Audio chunk [15]: [AQABAAAA//////3//v////7/AAAAAAAA///+//3///8AAAEAAQ]
...
<0.00%> Audio chunk [101]: [AQABAAAA//////3//v////7/AAAAAAAA///+//3///8AAAEAAQ]
At which point my failsafe interrupts the stream.
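For reference, the non-silent percentage in the logs above is computed roughly like this. The silence threshold is something I tuned by ear, and I'm assuming the chunks are 16-bit little-endian PCM (the pcm16 streaming format):

```python
import base64

import numpy as np

SILENCE_THRESHOLD = 500  # amplitude cutoff for 16-bit PCM; tuned by ear


def non_silent_pct(b64_chunk: str) -> float:
    """Percentage of samples in a base64-encoded PCM16 chunk above the silence threshold."""
    pcm = np.frombuffer(base64.b64decode(b64_chunk), dtype="<i2")  # little-endian int16 samples
    if pcm.size == 0:
        return 0.0
    loud = np.count_nonzero(np.abs(pcm.astype(np.int32)) > SILENCE_THRESHOLD)
    return 100.0 * loud / pcm.size
```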
However, this is concerning. Why does the model perform so well at the beginning and then transition into generating endless silent audio?
I will experiment with higher temperatures, as I've heard the [0.8, 1.2] range is recommended.
I want to add that the issue is not with filtering out the silent audio chunks; I am capable of doing that. It's that the request remains long-running and the LLM essentially gets stuck in a loop.
The equivalent request (when working properly) generates no more than 20-30 audio chunks. For it to reach the 100+ chunk mark shows that something is seriously wrong with it.
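For completeness, the failsafe mentioned above is just a counter of consecutive near-silent chunks; a minimal sketch, using the non_silent_pct helper above (how you extract each base64 audio delta depends on your streaming setup, and MAX_SILENT_CHUNKS is an arbitrary cutoff):

```python
MAX_SILENT_CHUNKS = 15  # consecutive near-silent chunks before giving up (arbitrary cutoff)


def collect_audio(b64_chunks):
    """Collect base64 PCM16 deltas, bailing out once the model is stuck emitting silence."""
    kept, silent_run = [], 0
    for b64_chunk in b64_chunks:  # each item: one base64-encoded audio delta from the stream
        if non_silent_pct(b64_chunk) < 1.0:
            silent_run += 1
            if silent_run >= MAX_SILENT_CHUNKS:
                break  # abort rather than let the request run forever
        else:
            silent_run = 0
            kept.append(b64_chunk)
    return kept
```

After breaking out I also close the underlying stream so the HTTP request doesn't stay open, but the point stands that the model shouldn't be producing 100+ silent chunks in the first place.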