Seeking Help: Converting Speech to text, Text to text and then Text to Image using OpenAI's

Hello everyone,

I’ve been working on a project that involves a series of conversions. The goal is to capture speech from the microphone, transcribe it into text using Whisper ASR, use the transcribed text as prompts for the GPT model, and finally, feed the GPT outputs into an image generator like DALL-E.

Here’s the code I have so far:

import io
import requests
import sounddevice as sd
import wavio
import openai
import os

# Set your OpenAI API key
openai.api_key = "your_api_key"  # Replace "your_api_key" with your actual OpenAI API key

# Record audio
def record_audio(filename, duration, fs):
    print("Recording audio...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=2)
    sd.wait()
    wavio.write(filename, recording, fs, sampwidth=2)
    print("Audio recorded and saved as", filename)

# Transcribe audio using Whisper ASR API
def transcribe_audio(filename):
    print("Transcribing audio...")
    with io.open(filename, "rb") as audio_file:
        content = audio_file.read()

    headers = {
        "Content-Type": "audio/wav",
        "Authorization": f"Bearer {openai.api_key}",
    }

    url = "https://api.openai.com/v1/audio/transcriptions"
    response = requests.post(url, headers=headers, data=content)
    response.raise_for_status()
    transcription = response.json()["choices"][0]["text"].strip()
    return transcription

# Generate image using DALLE-2
def generate_image(text, image_path):
    print("Generating image...")
    response = openai.Image.create(
        model="image-alpha-001",
        prompt=text,
        num_images=1,
        size="256x256",
        response_format="url",
    )
    image_url = response.data[0].url
    image_response = requests.get(image_url)
    with open(image_path, "wb") as f:
        f.write(image_response.content)
    print("Image generated and saved as", image_path)

# Main function
def main():
    audio_filename = "audio.wav"
    image_path = "generated_image.jpg"
    duration = 5  # Duration of the recording in seconds
    fs = 44100  # Sample rate

    record_audio(audio_filename, duration, fs)
    transcription = transcribe_audio(audio_filename)
    generate_image(transcription, image_path)

if __name__ == "__main__":
    main()

I’m having some issues getting this to work as expected. I’ve replaced “your_api_key” with my actual OpenAI API key, but I’m still getting errors.

I would really appreciate any help or advice on how to get this working. I’d love to hear about any similar projects you’ve done, or any resources you think might be helpful. My knowledge about these API’s is lacking so any help is helpfull

Thanks in advance!

Hi @rickknops.rk

Welcome to the community.

What errors are you getting?

1 Like

Hi, @sps currently the code records the audio but when i end up at the part of the whisper transcription i got an acces error. Ive tried a bunch of things even trying new api keys but nothing seems to work. Dont know hoe the rest afterwards will be

Do you get the error when using the following boilerplate code from docs?

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
audio_file = open("audio.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)

ETA: code from api ref

@sps i keep getting 400 Client Error: Bad Request for url: __api.openai.com/v1/audio/transcriptions

I tried using this one as well

Note: you need to be using OpenAI Python v0.27.0 for the code below to work

import openai
audio_file= open(“/path/to/file/audio.mp3”, “rb”)
transcript = openai.Audio.transcribe(“whisper-1”, audio_file)

But weirdly it doesnt include my api key as such

Hi @sps,

I wondered if you’ve had a chance to look over the entire code yet? I’m a bit uncertain about the steps following the ‘whisper’ section, particularly since I seem to be having issues with the key inclusion. Furthermore, I’m looking to add a section for the text-to-text GPT-4 API but I’m unsure how to integrate it. Any assistance you can provide with the entire coding process would be greatly appreciated.

Oops the code snippet in docs didn’t include initialization. I’ve updated the code from API reference and can confirm that it runs on my end. LMK if you get any errors.

gpt-4 can be accessed over the Chat completion endpoint. LMK any questions you have after reading chat completion docs

@sps It seems that there is a problem with adding in my openai key. Even though im regenerating a new one to try out the code it seems he doesnt give me acces anymore. Billing is fixed.

By Texting the code, do you mean the small one you included or the full code i have written which works on your end?

I will first try to get the Transcription working

This code is for transcription and it works. You should try incorporating it into your code for the transcribe_audio function since you’re already using the openai module.