Hi everyone,
I’m using the TTS API, specifically the gpt-4o-mini-tts model, to generate high-quality audio. I’ve noticed a significant quality difference between the audio generated via the API and what I get from web services using OpenAI (like openai.fm) or the Playground itself.
The audio from the Playground is crystal-clear and full. The audio I generate via the API, even when requesting pcm format and saving it as a .wav file, has a slight but noticeable distortion/artifact, almost a metallic “crackle,” especially on sibilants and high tones.
My workflow is as follows:

- I make a request to the API with response_format="pcm" to get the highest-quality raw data.
- I receive the raw PCM data stream.
- I save this data into a .wav file using the correct parameters (24000 Hz sample rate, 16-bit, mono), which I confirmed by analyzing the files from the Playground.
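For reference, this is roughly how I checked those parameters on the Playground files; a minimal sketch using only the standard-library wave module (inspect_wav and the filename are just illustrative):

```python
import wave

def inspect_wav(path):
    """Return the header parameters of a WAV file (illustrative helper)."""
    with wave.open(path, "rb") as w:
        return {
            "sample_rate": w.getframerate(),         # expecting 24000
            "sample_width_bytes": w.getsampwidth(),  # 2 bytes -> 16-bit
            "channels": w.getnchannels(),            # 1 -> mono
            "frames": w.getnframes(),
        }
```

Running inspect_wav("playground_output.wav") on a Playground download is what confirmed 24000 Hz / 16-bit / mono for me.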
This is the core Python snippet I use for generation and saving:
```python
from openai import OpenAI
import numpy as np
import soundfile as sf

# Client initialization
client = OpenAI(api_key="YOUR_API_KEY")

# API call: raw 16-bit PCM, 24 kHz, mono
response_pcm = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="onyx",
    input="This is a high-quality audio test.",
    response_format="pcm",
)

# Read the raw bytes and save them as a .wav file
pcm_data = response_pcm.read()
audio_array = np.frombuffer(pcm_data, dtype=np.int16)
sf.write(
    "api_output.wav",
    audio_array,
    samplerate=24000,
    subtype="PCM_16",  # explicit, though this is the default for int16 data
)
```
My question is:
Are there any undocumented parameters or specific techniques for the client.audio.speech.create API call that could influence the final audio quality, beyond just the format selection? For example, parameters related to sample rate, bit depth, dithering, or anything else?
My goal is to replicate the same clean, crystal-clear audio quality via the API that is achievable through the Playground.
Any suggestions or insights would be greatly appreciated. Thanks!