Completions of gpt-4o-mini-audio-preview model missing audio in response

t.steinz · March 31, 2025, 1:12pm

Hi,
While testing the “v1/chat/completions” endpoint of the API, I ran into a strange issue.

Sometimes the API response is missing the audio output (the audio object is completely absent from the response), but it always contains the text output.
The complete and incomplete responses below are from the same conversation; each received answer is added to the request’s messages array as text.

Does anybody have any idea what is going on here?
Is my implementation wrong, or is this a bug?

Also, is there a way to get a transcription of the input audio? I cannot find an answer in the API documentation. I would really love that feature, so that I do not need to make another call to the transcription endpoints.

My Code (typescript):

// function that uses the openAI API to input audio and output text + audio
  generateTextAndAudio: async (
    audioBlob: Blob,
    voice:
      | "alloy"
      | "ash"
      | "ballad"
      | "coral"
      | "echo"
      | "sage"
      | "shimmer"
      | "verse"
  ) => {
    try {
      // convert audioBlob to mp3
      const mp3Blob = await convertAudioBlobToMp3(audioBlob);

      // convert mp3Blob to base64 string
      const base64Audio = await audioBlobToBase64(mp3Blob);

      // no audio to send
      if (!base64Audio) {
        throw new Error("No audio to send to server");
      }

      const response = await fetch(
        "https://api.openai.com/v1/chat/completions",
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${openAiApiKey2}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            messages: [
              ...openAiRealtime.chatHistory,
              {
                role: "user",
                content: [
                  {
                    type: "input_audio",
                    input_audio: {
                      data: base64Audio || "",
                      format: "mp3",
                    },
                  },
                ],
              },
            ],
            model: "gpt-4o-mini-audio-preview",
            audio: {
              format: "mp3",
              voice: voice,
            },
            modalities: ["text", "audio"],
            response_format: { type: "text" },
          }),
        }
      );

      // response ok?
      if (!response.ok || !response.body) {
        throw new Error("Failed to generate text and audio");
      }

      // process the response
      const responseObject = await response.json();

      console.log("responseObject:", responseObject);

      let fullText = "";
      let fullAudio = new Blob([failJingle], { type: "audio/mp3" });

      // get the text
      if (responseObject.choices[0]?.message?.content) {
        fullText = responseObject.choices[0]?.message.content;
      }

      // get the transcribed text
      if (responseObject.choices[0]?.message?.audio?.transcript) {
        fullText = responseObject.choices[0]?.message?.audio?.transcript;
      }

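      if (responseObject.choices[0]?.message?.audio?.data) {
        // decode the base64 audio data into a Blob (fullAudio is a Blob, not a string)
        const audioBytes = Uint8Array.from(
          atob(responseObject.choices[0].message.audio.data),
          (char) => char.charCodeAt(0)
        );
        fullAudio = new Blob([audioBytes], { type: "audio/mp3" });
      }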

      // no transcribed text for the input audio?
      // if (responseObject.input.audio.transcript) {
      //   openAiRealtime.chatHistory.push({
      //     role: "user",
      //     content: [{ type: "text", text: responseObject.choices[0]?.message?.audio?.transcript }],
      //   });
      // }

      if (fullText) {
        // add the text to the chat history
        openAiRealtime.chatHistory.push({
          role: "assistant",
          content: [{ type: "text", text: fullText }],
        });
      }

      return {
        transscribed_input: "TODO",
        text: fullText,
        audio: fullAudio,
      };
    } catch (error) {
      console.error(error);
      return {
        text: "Something went wrong. please try again.",
        audio: new Blob([failJingle], { type: "audio/mp3" }),
      };
    }
  }

Response with audio:

{
  "id": "chatcmpl-xxx", // cleared ID
  "object": "chat.completion",
  "created": 0000, // cleared unix timestamp
  "model": "gpt-4o-mini-audio-preview-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "audio": {
          "id": "audio_xxx", // cleared ID
          "data": "valid_base64_audio_string_here__removed_because_of_forum_message_length_limit",
          "expires_at": 000, // cleared unix timestamp
          "transcript": "Well, I sometimes feel like my ideas and contributions aren't really valued or taken seriously. It’s like they overlook what I say, and I end up feeling invisible in meetings or discussions. [sad]"
        },
        "annotations": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2035,
    "completion_tokens": 284,
    "total_tokens": 2319,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 136,
      "text_tokens": 1899,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 226,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "text_tokens": 58
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_xxx" // cleared fingerprint
}

Response without audio:

{
  "id": "chatcmpl-xxx", // cleared id
  "object": "chat.completion",
  "created": 0, // cleared unix timestamp
  "model": "gpt-4o-mini-audio-preview-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure. The memory I'm focusing on is from when I was dealing with severe stress at work and didn't have anyone to turn to for support. I felt isolated and misunderstood by my colleagues. The pressure felt relentless, and I remember getting overwhelmed by every little challenge. My supervisor criticized me harshly, and I felt like I couldn't do anything right. [hurt] It still makes me feel inadequate and unimportant even now. [sad]",
        "refusal": null,
        "annotations": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2118,
    "completion_tokens": 87,
    "total_tokens": 2205,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 44,
      "text_tokens": 2074,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "text_tokens": 87
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_xxx" // cleared ID
}
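
The two responses above differ only in where the reply lives: with audio, `content` is null and the text sits in `audio.transcript`; without audio, the `audio` object is absent and the text sits in `content`. A minimal sketch of a helper that handles both shapes and reports whether audio came back is shown here. The names `AssistantMessage`, `ExtractedReply`, and `extractReply` are hypothetical; only the field names are taken from the responses above.

```typescript
// Shape of choices[0].message as seen in the two responses above.
interface AssistantMessage {
  content: string | null;
  audio?: {
    id: string;
    data: string; // base64-encoded audio
    expires_at: number;
    transcript: string;
  };
}

// Hypothetical result type for this sketch.
interface ExtractedReply {
  text: string;
  audioBase64: string | null;
  audioId: string | null;
  hasAudio: boolean;
}

// Prefer the audio transcript when present, otherwise fall back to content.
function extractReply(message: AssistantMessage): ExtractedReply {
  const audio = message.audio;
  return {
    text: audio?.transcript ?? message.content ?? "",
    audioBase64: audio?.data ?? null,
    audioId: audio?.id ?? null,
    hasAudio: audio !== undefined,
  };
}
```

A caller could check `hasAudio` to decide whether to play the audio or, for example, to retry the request or fall back to a text-to-speech step.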

Sky_Jin · June 7, 2025, 6:14am

Same problem.

Here is my workaround.
When sending the message history, I include the audio id only in the last assistant message and keep only the transcript in the other assistant messages.

With this, the model keeps responding with audio.
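
The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Turn` type and the `buildHistory` function are hypothetical names, and it assumes the `id` from `message.audio` of each audio reply was stored alongside its transcript. The `audio: { id }` form for referencing a previous audio reply in an assistant message should be checked against the current API reference.

```typescript
// One stored conversation turn (hypothetical shape for this sketch).
interface Turn {
  role: "user" | "assistant";
  text: string; // user text, or the assistant transcript
  audioId?: string; // message.audio.id of an assistant audio reply, if any
}

// Messages as they would be placed in the request's messages array.
type HistoryMessage =
  | { role: "user" | "assistant"; content: string }
  | { role: "assistant"; audio: { id: string } };

function buildHistory(turns: Turn[]): HistoryMessage[] {
  // Find the index of the last assistant turn.
  let lastAssistant = -1;
  for (let i = 0; i < turns.length; i++) {
    if (turns[i].role === "assistant") lastAssistant = i;
  }

  return turns.map((turn, i): HistoryMessage => {
    // Only the last assistant message references its audio by id.
    if (i === lastAssistant && turn.audioId !== undefined) {
      return { role: "assistant", audio: { id: turn.audioId } };
    }
    // Every other message is sent as plain text / transcript.
    return { role: turn.role, content: turn.text };
  });
}
```

The result would be spread into `messages` before the new user audio message. Since the response carries an `expires_at` value next to the audio id, the id is presumably only usable for a limited time, so falling back to the transcript for older or expired replies seems prudent.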
