Whisper API - Requests - Error - Transcribing a Podcast from RSS Feed

I’m trying to first just get transcriptions working, and eventually deploy to a cloud service, but for now, simply test transcriptions. The idea is to grab the first episode from the RSS feed and transcribe it. I’ve tried using GPT-4 to help with the code, and I’ve tried Code Interpreter (or Advanced Data Analysis), and just can’t seem to get anywhere. What could I be doing wrong?

Here is the code I’m using:

import base64
import feedparser
import openai
import os
import requests
import ssl
import urllib.request

# If you're facing issues with SSL certificate verification, use this:
ssl._create_default_https_context = ssl._create_unverified_context

def download_episode(episode_url, local_path):
    """Download the podcast episode from the provided URL and save it to the local path."""
    local_audio_file = f"{local_path}/episode.mp3"

    with requests.get(episode_url, stream=True) as r:
        with open(local_audio_file, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_audio_file

def transcribe_podcast(rss_url, local_path):
    # Parse the RSS feed
    intelligence_feed = feedparser.parse(rss_url, request_headers={"User-Agent": "Mozilla/5.0"})
    if not intelligence_feed.entries:
        print("No entries found in the RSS feed.")
        return

    # Get the first episode's audio URL
    episode_url = intelligence_feed.entries[0].links[0].href

    # Download the episode
    local_audio_file = download_episode(episode_url, local_path)

    # Convert audio to base64
    with open(local_audio_file, "rb") as f:
        base64_audio = base64.b64encode(f.read()).decode('utf-8')

    # Make a direct API call to the OpenAI Whisper endpoint
    headers = {
        'Authorization': f'OPENAI_API_KEY',  # Make sure to replace with your API key
        'Content-Type': 'application/json',
    }

    data = {
        'audio': base64_audio
    }

    response = requests.post('https://api.openai.com/v1/whisper/asr', headers=headers, json=data)

    if response.status_code == 200:
        transcription = response.json().get('transcription', '')
        print("Transcription:", transcription)
    else:
        print("Error:", response.text)

if __name__ == "__main__":
    print("Starting Podcast Transcription Function")
    rss_url = "https://feeds.simplecast.com/uSa3prhz"
    local_path = "./"
    transcribe_podcast(rss_url, local_path)

And here is the error I’m getting when I try to run it:

Starting Podcast Transcription Function
Error: {
  "error": {
    "message": "Invalid URL (POST /v1/whisper/asr)",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

The call to https://api.openai.com/v1/whisper/asr is the problematic part of your code: that endpoint does not exist, and that is the root of the problem.

GPT-4 doesn’t have up-to-date knowledge of the OpenAI API and tends to hallucinate the parts it doesn’t know about.

Here’s how you should use the openai package to make a call to the audio transcriptions API:

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
audio_file = open("audio.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)

Thank you. I’d run into a number of issues and had overcomplicated it. Your reminder to use what the OpenAI documentation suggests, as in the code example you provided, helped, and so did using environment variables. I’m now of course running into rate limits, hallucinations, context window issues, and the expected challenges. At least I’ve cleared the initial hurdle: retrieving a podcast episode from a supplied RSS feed, parsing the proper URL, downloading it, and then attempting to transcribe it (assuming it’s either chunked or within the limits of what can be done in the environment performing the action). Thank you.

You’re welcome.

If you’re having to break the audio into chunks because of the 25 MB limit, you can try compression instead, as long as the results are of the same quality.
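For the compression route, one common approach (a sketch only: it assumes ffmpeg is installed and on PATH, and the mono/16 kHz/64k settings are my assumptions for speech, not anything the API requires) is to downmix and drop the bitrate before uploading:

```python
import os
import subprocess

WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # the 25 MB upload limit mentioned above

def compression_command(src, dst, bitrate="64k"):
    """Build an ffmpeg command that downmixes to mono 16 kHz at a low
    bitrate; speech usually stays intelligible while the file shrinks a lot."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", "1",        # mono
        "-ar", "16000",    # 16 kHz sample rate, plenty for speech
        "-b:a", bitrate,   # constant low audio bitrate
        dst,
    ]

def needs_compression(path):
    """True when the file would exceed the API upload limit."""
    return os.path.getsize(path) > WHISPER_LIMIT_BYTES

# Usage (not run here):
# if needs_compression("episode.mp3"):
#     subprocess.run(compression_command("episode.mp3", "episode_small.mp3"), check=True)
```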


Well, it’s not the 25 MB limit that’s the reason I need to chunk the transcription, which I’m now finally able to do with success. It’s getting it done quickly, with parallel processing. So I’m chunking up the file first, finding the sets of chunks based on silences, then transcribing all the chunks at once so I get a faster transcription, and then reassembling the transcript. I’m still working out how to make that efficient enough to be worth all this effort, but it’s been a good learning exercise in pipelines and the use of transcription.
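The parallel step you describe can be sketched with the standard library alone. Executor.map yields results in input order regardless of which chunk finishes first, so reassembly is just a join (transcribe_fn here is a hypothetical stand-in for your per-chunk Whisper call):

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_chunks_parallel(chunk_paths, transcribe_fn, max_workers=4):
    """Transcribe chunks concurrently and stitch the results back in order.

    Threads suit this workload because each call is network-bound, and
    pool.map preserves the order of chunk_paths in its results.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pieces = pool.map(transcribe_fn, chunk_paths)
        return " ".join(piece.strip() for piece in pieces)
```

Keep max_workers modest: each worker is a simultaneous API request, so this is also where rate limits will bite first.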

However, the big challenge (and this will likely be a new thread) is that ultimately I want to summarize a 1-2 hour podcast into 1-3 paragraphs plus some bullet points and key insights. Even if I were granted 32K context window access, there simply isn’t enough room (at least it would seem) to accurately summarize the entire transcription of a 1-2 hour podcast with GPT-4.

I’m very open to thoughts, though. I’m guessing that once I have the transcription done, I’ll have to go back over it, re-chunk it based on a tokenizer approach, create pseudo-summaries of the chunks, and then combine those summaries (via an API function call of sorts, perhaps) to generate a full summary from what would be 5-10 summaries all together.

In other words, I could be summarizing 10 different summaries of re-chunked sections just to get one actual summary of a single 1-2 hour podcast.
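That summarize-the-summaries shape can be written as one small recursive function. A sketch under loud assumptions: summarize_fn stands in for a real GPT-4 call, and slicing by character count is a crude proxy for your tokenizer-based chunking.

```python
def summarize_long_transcript(transcript, summarize_fn, chunk_chars=8000):
    """Map-reduce summarization: summarize fixed-size chunks, then summarize
    the joined partial summaries, recursing until one pass fits in a chunk.

    Assumes summarize_fn returns something shorter than its input;
    otherwise the recursion would never terminate.
    """
    # Base case: the text already fits in a single prompt.
    if len(transcript) <= chunk_chars:
        return summarize_fn(transcript)
    # Map: summarize each slice independently (these calls could also be
    # parallelized, like the chunk transcriptions above).
    chunks = [transcript[i:i + chunk_chars]
              for i in range(0, len(transcript), chunk_chars)]
    partials = [summarize_fn(chunk) for chunk in chunks]
    # Reduce: the joined partial summaries become the next, shorter input.
    return summarize_long_transcript(" ".join(partials), summarize_fn, chunk_chars)
```

One trade-off to note: each reduce pass loses detail, so the prompt for the final pass is a good place to ask specifically for the 1-3 paragraphs, bullet points, and key insights you want.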