Whisper AI calling without file path

Hi. I’m not very experienced with any of this, so I’m sorry if the question is a bit off.

I am building an AI assistant that integrates OutSystems with ChatGPT (model: gpt-4o). I managed to integrate chat completions with tools and images, and am now attempting to do something similar with the Whisper transcriptions POST endpoint.
Since I am calling the API from the OutSystems platform, I can’t really give it a file path, and I also don’t want the user to have to download the file. I am calling the API with a JSON body, and I tried passing the mp3 as base64, as I did with the image for vision.

The request looks roughly like this:

{
    "model" : "gpt-4o",
    "file" : "data:audio/mpeg;base64,**contenthere**"
}

It does not work and returns:

"error": {
    "message": "Could not parse multipart form",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }

Can somebody point me in the right direction?
I am happy to provide any other information needed.
Thanks in advance,
Luis

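The API does indeed need the POST body to be multipart/form-data: the audio goes in as a file part (a MIME attachment with a filename), and the other parameters go in as plain text parts of the same form. A JSON body will not work, which is why you get “Could not parse multipart form”. Note also that model must be a transcription model such as whisper-1, not gpt-4o.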
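Because OutSystems can’t run Python’s requests, it may help to see what a multipart body actually looks like on the wire. Below is a minimal stdlib-only sketch, assuming the field names the transcription endpoint uses (file, model); build_multipart is a hypothetical helper name, and the random uuid boundary is just one way to pick a boundary. Whatever HTTP client you use would need to send bytes shaped like this, with a Content-Type header carrying the same boundary.

```python
import uuid


def build_multipart(fields, file_field, filename, file_bytes, file_type):
    """Build a multipart/form-data body by hand (hypothetical helper).

    fields     -- dict of plain text form fields, e.g. {"model": "whisper-1"}
    file_field -- form field name for the file part ("file" for transcriptions)
    Returns (body_bytes, content_type_header_value).
    """
    # Random boundary, chosen so it is very unlikely to occur inside the data
    boundary = uuid.uuid4().hex
    parts = []

    # One part per text field: headers, a blank line, then the value
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                "\r\n"
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The file part also carries a filename and its own Content-Type
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")

    # The closing boundary ends with two extra dashes
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

You would then POST the returned body to https://api.openai.com/v1/audio/transcriptions with the Authorization header and with Content-Type set to the returned header value, e.g. build_multipart({"model": "whisper-1"}, "file", "speech.mp3", mp3_bytes, "audio/mpeg").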

The audio does need to be sent as a file part, not as a raw audio body, a stream, or a base64 string inside JSON. It sounds like you already have the mp3, perhaps in temporary storage or a buffer.
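Since the audio in the question is already a base64 data URL, the first step is to turn it back into raw bytes for the file part. A minimal Python sketch of that step follows; data_url_to_bytes is a hypothetical name, and on OutSystems you would presumably use whatever base64-to-binary facility the platform offers instead.

```python
import base64


def data_url_to_bytes(data_url):
    """Split 'data:audio/mpeg;base64,...' into (mime_type, raw_bytes).

    Raises ValueError if the string is not a base64 data URL.
    """
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    # Everything between "data:" and ";base64" is the MIME type
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned bytes are what goes into the file part, and the MIME type can be reused as that part’s Content-Type.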

I’d say “ask an AI”, but AI models are also very poor at turning the cURL examples in the API reference into multipart requests built from scratch. Using a library is far easier.

Here’s some code I happen to have lying around: a demo of sending the request with Python’s requests library instead of the OpenAI library, where the files= parameter builds the multipart body for you.

import os
import requests

# Get the API key from an environment variable; never print the key itself
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise SystemExit("Set the OPENAI_API_KEY environment variable first")
headers = {"Authorization": f"Bearer {api_key}"}
url = "https://api.openai.com/v1/audio/transcriptions"

audio_file_name = "joke.mp3"
base_file_name = os.path.splitext(audio_file_name)[0]  # Get the base file name

with open(audio_file_name, "rb") as audio_file:
    parameters = {
        "file": (audio_file_name, audio_file),
        "language": (None, "en"),
        "model": (None, "whisper-1"),
        "prompt": (None, "Here is the comedy show."),
        "response_format": (None, "verbose_json"),
        "temperature": (None, "0.1"),
        "timestamp_granularities[]" : (None, "word"),
    }
    response = requests.post(url, headers=headers, files=parameters)

if response.status_code != 200:
    print(f"HTTP error {response.status_code}: {response.text}")
else:
    # Get the transcribed text and timed words from the response
    transcribed_text = response.json()['text']
    words = response.json()['words']
    formatted_words = [
        {k: f"{v:.2f}" if isinstance(v, float) else v for k, v in word.items()}
        for word in words
    ]
    # Save text or words to a file
    try:
        with open(f"{base_file_name}_transcription.txt", "w") as file:
            file.write(transcribed_text)
        print(f"Transcribed text successfully saved to '{base_file_name}_transcription.txt'.")
        
        with open(f"{base_file_name}_timestamped.txt", "w") as file:
            file.write(str(formatted_words))
        print(f"Timestamped words successfully saved to '{base_file_name}_timestamped.txt'.")
    except Exception as e:
        print(f"output file error: {e}")

    print(formatted_words[:20])

If you don’t want a file on disk, you could hold the audio in an io.BytesIO in-memory file instead; requests accepts that, whereas the OpenAI library might reject it. Then parse the JSON response however you like.
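A sketch of that in-memory approach, starting from a base64 string. Setting a .name on the BytesIO is a commonly used workaround (an assumption here, not documented behavior) so that clients which infer the audio format from the file name can still do so; audio_file_from_base64 and the default "audio.mp3" are hypothetical.

```python
import base64
import io


def audio_file_from_base64(b64_payload, filename="audio.mp3"):
    """Wrap decoded audio bytes in an in-memory file object (no disk path)."""
    audio_file = io.BytesIO(base64.b64decode(b64_payload))
    # Assumed workaround: give the buffer a name whose extension matches the audio
    audio_file.name = filename
    return audio_file


# Hedged usage, not executed here.
# With requests, as in the demo above:
#   files = {"file": ("audio.mp3", audio_file_from_base64(b64_string), "audio/mpeg"),
#            "model": (None, "whisper-1")}
# With the OpenAI Python library (v1.x):
#   client.audio.transcriptions.create(model="whisper-1",
#                                      file=audio_file_from_base64(b64_string))
```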

Thanks for your reply. I didn’t understand it, though. I need a request in JSON format. Is there any way to supply the file other than a path? If so, can you give an example in JSON format?
Thanks!