Need an API That Combines Audio Transcription and Translation

Currently, to implement live translation using OpenAI's APIs, I must first transcribe audio with a transcription model, then send the transcribed text to a chat model for translation. This requires two sequential API calls with two separate outputs, which slows down the overall response time.

If the transcription output could be handled internally and passed directly to the translation step, without being returned separately, it could significantly reduce the delay in real-time translation.

I would greatly appreciate an API that integrates these steps into a single request.
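
For reference, here is a minimal sketch of the two-step pipeline described above, so the latency problem is concrete. The file name speech.wav, the whisper-1 and gpt-4o model choices, and the translation prompt are illustrative assumptions, not part of the original post:

import base64
from openai import OpenAI

client = OpenAI()

# Step 1 (first round trip): transcribe the audio with a dedicated
# transcription model. Model and file name are assumptions.
with open("speech.wav", "rb") as f:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )

# Step 2 (second round trip): send the transcript to a chat model
# for translation. The target language here is an assumption.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's text into English."},
        {"role": "user", "content": transcription.text},
    ],
)

print(completion.choices[0].message.content)

The delay comes from waiting for the full transcript before the second request can even start.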

You can use gpt-4o-audio-preview or gpt-4o-mini-audio-preview (if you have access to them). These models accept both audio and text in the request and can return both in the response.

The response includes both audio and a transcription of that audio if you enable both in modalities.

This example shows how to enable the text and audio modalities:
import base64
from openai import OpenAI

client = OpenAI()

# Request both text and audio output; the audio parameter selects
# the voice and the container format of the returned audio.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

# The audio is returned base64-encoded; decode it and write it to disk.
# A transcript of the spoken answer is also available at
# completion.choices[0].message.audio.transcript.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)
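
For the live-translation use case in the original question, these models also accept audio as input, so transcription and translation can happen inside a single request. Below is a minimal sketch under stated assumptions: the file name speech.wav and the English-translation instruction are illustrative, and modalities=["text"] requests a text-only answer so no output audio is generated:

import base64
from openai import OpenAI

client = OpenAI()

# Read and base64-encode the source audio (file name is an assumption).
with open("speech.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# One request: the model hears the audio and answers with the
# translation directly; no intermediate transcript is returned.
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text"],
    messages=[
        {
            "role": "system",
            "content": "Translate the speech in the audio into English.",
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                }
            ],
        },
    ],
)

print(completion.choices[0].message.content)

This collapses the two round trips from the original pipeline into one, which is exactly the latency saving the question asks for.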

More about audio models:
https://platform.openai.com/docs/guides/audio?api-mode=chat