Input audio format document?

Hi,

I noticed gpt-4o audio is released. I looked through the get start here. I wonder if is there any other way to pass audio to the model. Can someone refer me to the right place to look at?

Welcome to the community @hallowlucas66

Can you elaborate more about your use-case?

1 Like

Thanks sps,

I am trying to evaluate the audio reasoning capability of the model. And my audio data is saved on huggingface which is stored in audio array format. I wonder if there is a way that the model take in array instead of wav file format.

The audio-preview models can take “wav” and “mp3” formats for the audio content parts in user messages.

In order to use the your data from huggingface, you’re going to have to convert it to wav format.

Here's some code to get you started with conversion process
from scipy.io.wavfile import write
import numpy as np

sample_rate = 16000  # Adjust to match your data
audio_array = np.array([...], dtype=np.float32)  # Your audio data
write("output.wav", sample_rate, audio_array)
1 Like

Thanks for the reply! It makes sense to me after thinking about the request process.

In addition, I wonder if there is any duration limit for the passed audio. Can I pass a long audio to it and maybe it will cut them up?