Input audio format document?

hallowlucas66 · February 10, 2025, 3:37pm

Hi,

I noticed gpt-4o audio is released. I looked through the get start here. I wonder if is there any other way to pass audio to the model. Can someone refer me to the right place to look at?

sps · February 10, 2025, 3:47pm

Welcome to the community @hallowlucas66

Can you elaborate more about your use-case?

hallowlucas66 · February 10, 2025, 7:07pm

Thanks sps,

I am trying to evaluate the audio reasoning capability of the model. And my audio data is saved on huggingface which is stored in audio array format. I wonder if there is a way that the model take in array instead of wav file format.

sps · February 11, 2025, 11:48am

The audio-preview models can take “wav” and “mp3” formats for the audio content parts in user messages.

In order to use the your data from huggingface, you’re going to have to convert it to wav format.

Here's some code to get you started with conversion process

from scipy.io.wavfile import write
import numpy as np

sample_rate = 16000  # Adjust to match your data
audio_array = np.array([...], dtype=np.float32)  # Your audio data
write("output.wav", sample_rate, audio_array)

hallowlucas66 · February 12, 2025, 3:07am

Thanks for the reply! It makes sense to me after thinking about the request process.

In addition, I wonder if there is any duration limit for the passed audio. Can I pass a long audio to it and maybe it will cut them up?

Topic		Replies	Views
Create transcription with gpt-4o-transcribe – max audio file length/size? API audio	0	139	March 24, 2025
Gpt-4o-transcribe audio length limits API	4	1884	May 27, 2025
Gpt-4o-audio-preview unsupported_format error API	4	514	March 18, 2025
Audio file might be corrupted or unsupported Bugs api	1	161	May 20, 2025
Ability to limit Whisper's Duration? API whisper	2	1070	December 18, 2023

Input audio format document?

Related topics