gpt-4o-mini-audio-preview chat completions sometimes missing audio in the response

Hi,
While testing the “v1/chat/completions” endpoint, I ran into an intermittent issue.

Sometimes the response is missing the audio output (the audio object is completely absent from the message), but it always contains the text output.
The correct and the faulty responses below come from the same conversation; each answer I receive is appended to the request's messages array as text.

Does anybody have any idea what is going on here?
Is my implementation wrong, or is this a bug?
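Since every answer goes back into the history as plain text, one thing that may be worth trying (I have not confirmed that it changes the missing-audio behaviour) is what the audio guide describes for multi-turn conversations, as far as I know: referencing the previous audio response by its `id` in the assistant message, and only falling back to text once `expires_at` has passed. A minimal sketch; `historyEntryFor` and the types are illustrative names I made up, not part of any SDK:

```typescript
// Hypothetical helper: decide how to store an assistant reply in the chat history.
// Assumes the assistant message may carry { audio: { id } } instead of text content.

interface AssistantMessage {
  content: string | null;
  audio?: { id: string; expires_at: number; transcript: string };
}

type HistoryEntry =
  | { role: "assistant"; audio: { id: string } }
  | { role: "assistant"; content: { type: "text"; text: string }[] };

function historyEntryFor(
  message: AssistantMessage,
  nowSeconds: number = Date.now() / 1000 // expires_at is a unix timestamp in seconds
): HistoryEntry | null {
  // While the server still holds the generated audio, reference it by id.
  if (message.audio && message.audio.expires_at > nowSeconds) {
    return { role: "assistant", audio: { id: message.audio.id } };
  }
  // Otherwise fall back to text: the transcript if there was audio, else content.
  const text = message.audio?.transcript ?? message.content;
  if (!text) {
    return null;
  }
  return { role: "assistant", content: [{ type: "text", text }] };
}
```

In `generateTextAndAudio`, the result would be pushed to `openAiRealtime.chatHistory` in place of the text-only entry built from `fullText`.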

Also, is there a way to get a transcription of the input audio? I can't find anything about it in the API documentation. I would really like that feature, so that I don't need to make a separate call to the transcription endpoint.
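For reference, the separate call I am trying to avoid would look roughly like this. It is a sketch assuming the `/v1/audio/transcriptions` endpoint with the `whisper-1` model and a runtime (Node 18+ or a browser) where `fetch`, `FormData` and `Blob` are globals; the helper names are my own:

```typescript
// Hypothetical helpers: transcribe the user's input audio with a second request.

const TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart body: the audio file plus the model name.
function buildTranscriptionForm(audio: Blob, model: string = "whisper-1"): FormData {
  const form = new FormData();
  // Give the file a name with an extension; ".mp3" matches the converted blob.
  form.append("file", audio, "input.mp3");
  form.append("model", model);
  return form;
}

// Send the request and return the transcribed text.
async function transcribeInputAudio(apiKey: string, audio: Blob): Promise<string> {
  const response = await fetch(TRANSCRIPTIONS_URL, {
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  const result = (await response.json()) as { text: string };
  return result.text;
}
```

Because the transcription does not depend on the chat completion, the two requests could at least run in parallel with `Promise.all`, so the extra call would add cost but not much latency.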

My code (TypeScript):

// function that uses the OpenAI API to send audio and receive text + audio
  generateTextAndAudio: async (
    audioBlob: Blob,
    voice:
      | "alloy"
      | "ash"
      | "ballad"
      | "coral"
      | "echo"
      | "sage"
      | "shimmer"
      | "verse"
  ) => {
    try {
      // convert audioBlob to mp3
      const mp3Blob = await convertAudioBlobToMp3(audioBlob);

      // convert mp3Blob to base64 string
      const base64Audio = await audioBlobToBase64(mp3Blob);

      // no audio to send
      if (!base64Audio) {
        throw new Error("No audio to send to server");
      }

      const response = await fetch(
        "https://api.openai.com/v1/chat/completions",
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${openAiApiKey2}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({
            messages: [
              ...openAiRealtime.chatHistory,
              {
                role: "user",
                content: [
                  {
                    type: "input_audio",
                    input_audio: {
                      data: base64Audio || "",
                      format: "mp3",
                    },
                  },
                ],
              },
            ],
            model: "gpt-4o-mini-audio-preview",
            audio: {
              format: "mp3",
              voice: voice,
            },
            modalities: ["text", "audio"],
            response_format: { type: "text" },
          }),
        }
      );

      // response ok?
      if (!response.ok || !response.body) {
        throw new Error("Failed to generate text and audio");
      }

      // process the response
      const responseObject = await response.json();

      console.log("responseObject:", responseObject);

      let fullText = "";
      let fullAudio = new Blob([failJingle], { type: "audio/mp3" });

      // get the text
      if (responseObject.choices[0]?.message?.content) {
        fullText = responseObject.choices[0]?.message.content;
      }

      // get the transcript of the spoken answer (takes precedence over content)
      if (responseObject.choices[0]?.message?.audio?.transcript) {
        fullText = responseObject.choices[0]?.message?.audio?.transcript;
      }

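      const audioData = responseObject.choices[0]?.message?.audio?.data;
      if (audioData) {
        // the audio arrives as a base64 string; decode it into a Blob so
        // fullAudio has the same type as the fail-jingle fallback
        const audioBytes = Uint8Array.from(atob(audioData), (c) => c.charCodeAt(0));
        fullAudio = new Blob([audioBytes], { type: "audio/mp3" });
      }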

      // TODO: the response does not seem to contain a transcript of the input audio
      // (there is no "input" field), so there is nothing to store as the user turn yet.
      // if (inputTranscript) {
      //   openAiRealtime.chatHistory.push({
      //     role: "user",
      //     content: [{ type: "text", text: inputTranscript }],
      //   });
      // }

      if (fullText) {
        // add the text to the chat history
        openAiRealtime.chatHistory.push({
          role: "assistant",
          content: [{ type: "text", text: fullText }],
        });
      }

      return {
        transscribed_input: "TODO",
        text: fullText,
        audio: fullAudio,
      };
    } catch (error) {
      console.error(error);
      return {
        text: "Something went wrong. Please try again.",
        audio: new Blob([failJingle], { type: "audio/mp3" }),
      };
    }
  }

Response with audio:

{
  "id": "chatcmpl-xxx", // cleared ID
  "object": "chat.completion",
  "created": 0000, // cleared unixtime stamp
  "model": "gpt-4o-mini-audio-preview-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "refusal": null,
        "audio": {
          "id": "audio_xxx", // cleared ID
          "data": "valid_base64_audio_string_here__removed_because_of_forum_message_length_limit",
          "expires_at": 0, // cleared unix timestamp
          "transcript": "Well, I sometimes feel like my ideas and contributions aren't really valued or taken seriously. It’s like they overlook what I say, and I end up feeling invisible in meetings or discussions. [sad]"
        },
        "annotations": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2035,
    "completion_tokens": 284,
    "total_tokens": 2319,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 136,
      "text_tokens": 1899,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 226,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "text_tokens": 58
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_xxx" // cleared fingerprint
}

Response without audio:

{
  "id": "chatcmpl-xxx", // cleared id
  "object": "chat.completion",
  "created": 0, // cleared unix timestamp
  "model": "gpt-4o-mini-audio-preview-2024-12-17",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure. The memory I'm focusing on is from when I was dealing with severe stress at work and didn't have anyone to turn to for support. I felt isolated and misunderstood by my colleagues. The pressure felt relentless, and I remember getting overwhelmed by every little challenge. My supervisor criticized me harshly, and I felt like I couldn't do anything right. [hurt] It still makes me feel inadequate and unimportant even now. [sad]",
        "refusal": null,
        "annotations": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 2118,
    "completion_tokens": 87,
    "total_tokens": 2205,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 44,
      "text_tokens": 2074,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0,
      "text_tokens": 87
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_xxx" // cleared ID
}
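Comparing the two: in the faulty response there is no `message.audio` at all, the text is in `content` instead, and `completion_tokens_details.audio_tokens` is 0, so it looks like the model produced no audio rather than the audio being dropped afterwards. Whatever the cause, a minimal sketch that makes the two shapes explicit, so a text-only reply can be logged or retried instead of silently falling back to the fail jingle. The types cover only the fields used here and, like `classifyReply`, are my own names, not an SDK's:

```typescript
// Illustrative types for the fields seen in the responses above (not the full schema).
interface AudioOutput {
  id: string;
  data: string; // base64-encoded audio
  expires_at: number;
  transcript: string;
}

interface CompletionMessage {
  role: string;
  content: string | null;
  audio?: AudioOutput;
}

interface ChatCompletion {
  choices: { message: CompletionMessage; finish_reason: string }[];
  usage?: { completion_tokens_details?: { audio_tokens?: number } };
}

// The two observed shapes: with audio, or text only.
type AssistantReply =
  | { kind: "audio"; text: string; audioBase64: string; audioId: string }
  | { kind: "text_only"; text: string; audioTokens: number };

function classifyReply(completion: ChatCompletion): AssistantReply {
  const message = completion.choices[0]?.message;
  if (!message) {
    throw new Error("Response contained no choices");
  }
  if (message.audio?.data) {
    // Audio present: the spoken text is in the transcript and content is null.
    return {
      kind: "audio",
      text: message.audio.transcript,
      audioBase64: message.audio.data,
      audioId: message.audio.id,
    };
  }
  // No audio object: the text is in content; keep audio_tokens for logging.
  return {
    kind: "text_only",
    text: message.content ?? "",
    audioTokens: completion.usage?.completion_tokens_details?.audio_tokens ?? 0,
  };
}
```

In `generateTextAndAudio`, `classifyReply(responseObject)` could replace the three separate `if` checks, and a `text_only` result could be logged together with the request payload when reporting the issue.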