The API does indeed need you to send multipart/form-data as the POST body: the audio file goes in as a MIME file attachment part, and the other parameters go alongside it as plain text parts.
The file does need to be in “file” form, not raw audio or a stream. It sounds like you already have an mp3 file, perhaps in temporary storage or a buffer.
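To make the multipart idea concrete, here is a small sketch that builds such a request with requests and inspects it without sending anything. The placeholder bytes and the "audio/mpeg" content type are my own assumptions for illustration, not something the API reference dictates.

```python
import requests

# Placeholder bytes stand in for real MP3 data; nothing is sent over the network.
fake_audio = b"ID3 not really an mp3"

parts = {
    # (filename, content, content type) becomes a file attachment part
    "file": ("joke.mp3", fake_audio, "audio/mpeg"),
    # (None, value) becomes a plain text form field with no filename
    "model": (None, "whisper-1"),
}

# Build the request and encode it, but do not send it
prepared = requests.Request(
    "POST",
    "https://api.openai.com/v1/audio/transcriptions",
    files=parts,
).prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body  # the encoded multipart body, as bytes

print(content_type)
print(body.decode("latin-1"))
```

The printed body shows one part per field, each with its own Content-Disposition header, separated by the boundary string named in the Content-Type header. That is the structure you would otherwise have to assemble by hand.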
I’d say “ask an AI”, but AI is also very poor at understanding the cURL examples in the API reference and at constructing multipart requests from scratch. Using a library is far easier.
Here’s some code I just happen to have sitting around, written as a demo of sending the request with Python’s requests library instead of the OpenAI library, where the files= parameter does the magic for you.
import os
import requests

# Get the API key from the environment variable
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    # Fail early rather than printing the secret key to check it
    raise SystemExit("OPENAI_API_KEY is not set")
headers = {"Authorization": f"Bearer {api_key}"}

url = "https://api.openai.com/v1/audio/transcriptions"
audio_file_name = "joke.mp3"
base_file_name = os.path.splitext(audio_file_name)[0]  # Get the base file name

with open(audio_file_name, "rb") as audio_file:
    # The file part is (filename, file object); text fields are (None, value)
    parameters = {
        "file": (audio_file_name, audio_file),
        "language": (None, "en"),
        "model": (None, "whisper-1"),
        "prompt": (None, "Here is the comedy show."),
        "response_format": (None, "verbose_json"),
        "temperature": (None, "0.1"),
        "timestamp_granularities[]": (None, "word"),
    }
    # Send while the file is still open; requests builds the multipart body
    response = requests.post(url, headers=headers, files=parameters)

if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    # Get the transcribed text and timed words from the response
    result = response.json()
    transcribed_text = result['text']
    words = result['words']
    # Round float timestamps to two decimals for readability
    formatted_words = [
        {k: f"{v:.2f}" if isinstance(v, float) else v for k, v in word.items()}
        for word in words
    ]
    # Save text and words to files
    try:
        with open(f"{base_file_name}_transcription.txt", "w") as file:
            file.write(transcribed_text)
        print(f"Transcribed text successfully saved to '{base_file_name}_transcription.txt'.")
        with open(f"{base_file_name}_timestamped.txt", "w") as file:
            file.write(str(formatted_words))
        print(f"Timestamped words successfully saved to '{base_file_name}_timestamped.txt'.")
    except Exception as e:
        print(f"output file error: {e}")
    print(formatted_words[:20])
You could wrap your data in an io.BytesIO virtual file to move it around if needed, where the OpenAI library would instead barf on that. Then handle the response parsing as you wish from the JSON received.
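A rough sketch of that in-memory route, assuming the mp3 bytes are already in a variable. The helper names build_parts and transcribe_bytes are made up for this example, and the default file name is only a placeholder; as far as I know the name's extension should still match the real audio format.

```python
import io
import requests

URL = "https://api.openai.com/v1/audio/transcriptions"


def build_parts(audio_bytes, file_name="audio.mp3"):
    """Wrap raw bytes in a virtual file and build the multipart fields."""
    buffer = io.BytesIO(audio_bytes)
    return {
        "file": (file_name, buffer),          # file part, given a name explicitly
        "model": (None, "whisper-1"),         # plain text field
        "response_format": (None, "json"),    # plain text field
    }


def transcribe_bytes(audio_bytes, api_key, file_name="audio.mp3"):
    """Send in-memory audio for transcription and return the text."""
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(
        URL,
        headers=headers,
        files=build_parts(audio_bytes, file_name),
        timeout=120,
    )
    response.raise_for_status()  # raise on any HTTP error status
    return response.json()["text"]
```

Calling transcribe_bytes(my_mp3_bytes, api_key) would then return the transcript text, with nothing written to disk along the way.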