Seeking Help: Converting Speech to text, Text to text and then Text to Image using OpenAI's

rickknops.rk · May 13, 2023, 8:11am

Hello everyone,

I’ve been working on a project that involves a series of conversions. The goal is to capture speech from the microphone, transcribe it into text using Whisper ASR, use the transcribed text as prompts for the GPT model, and finally, feed the GPT outputs into an image generator like DALL-E.

Here’s the code I have so far:

import io
import requests
import sounddevice as sd
import wavio
import openai
import os

# Set your OpenAI API key
openai.api_key = "your_api_key"  # Replace "your_api_key" with your actual OpenAI API key

# Record audio
def record_audio(filename, duration, fs):
    print("Recording audio...")
    recording = sd.rec(int(duration * fs), samplerate=fs, channels=2)
    sd.wait()
    wavio.write(filename, recording, fs, sampwidth=2)
    print("Audio recorded and saved as", filename)

# Transcribe audio using Whisper ASR API
def transcribe_audio(filename):
    print("Transcribing audio...")
    with io.open(filename, "rb") as audio_file:
        content = audio_file.read()

    headers = {
        "Content-Type": "audio/wav",
        "Authorization": f"Bearer {openai.api_key}",
    }

    url = "https://api.openai.com/v1/audio/transcriptions"
    response = requests.post(url, headers=headers, data=content)
    response.raise_for_status()
    transcription = response.json()["choices"][0]["text"].strip()
    return transcription

# Generate image using DALLE-2
def generate_image(text, image_path):
    print("Generating image...")
    response = openai.Image.create(
        model="image-alpha-001",
        prompt=text,
        num_images=1,
        size="256x256",
        response_format="url",
    )
    image_url = response.data[0].url
    image_response = requests.get(image_url)
    with open(image_path, "wb") as f:
        f.write(image_response.content)
    print("Image generated and saved as", image_path)

# Main function
def main():
    audio_filename = "audio.wav"
    image_path = "generated_image.jpg"
    duration = 5  # Duration of the recording in seconds
    fs = 44100  # Sample rate

    record_audio(audio_filename, duration, fs)
    transcription = transcribe_audio(audio_filename)
    generate_image(transcription, image_path)

if __name__ == "__main__":
    main()

I’m having some issues getting this to work as expected. I’ve replaced “your_api_key” with my actual OpenAI API key, but I’m still getting errors.

I would really appreciate any help or advice on how to get this working. I’d love to hear about any similar projects you’ve done, or any resources you think might be helpful. My knowledge about these API’s is lacking so any help is helpfull

Thanks in advance!

sps · May 13, 2023, 8:33am

Hi @rickknops.rk

Welcome to the community.

What errors are you getting?

rickknops.rk · May 13, 2023, 6:05pm

Hi, @sps currently the code records the audio but when i end up at the part of the whisper transcription i got an acces error. Ive tried a bunch of things even trying new api keys but nothing seems to work. Dont know hoe the rest afterwards will be

sps · May 13, 2023, 6:12pm

Do you get the error when using the following boilerplate code from docs?

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
audio_file = open("audio.mp3", "rb")
transcript = openai.Audio.transcribe("whisper-1", audio_file)

ETA: code from api ref

rickknops.rk · May 13, 2023, 7:07pm

@sps i keep getting 400 Client Error: Bad Request for url: __api.openai.com/v1/audio/transcriptions

I tried using this one as well

Note: you need to be using OpenAI Python v0.27.0 for the code below to work

import openai
audio_file= open(“/path/to/file/audio.mp3”, “rb”)
transcript = openai.Audio.transcribe(“whisper-1”, audio_file)

But weirdly it doesnt include my api key as such

rickknops.rk · May 13, 2023, 7:11pm

Hi @sps,

I wondered if you’ve had a chance to look over the entire code yet? I’m a bit uncertain about the steps following the ‘whisper’ section, particularly since I seem to be having issues with the key inclusion. Furthermore, I’m looking to add a section for the text-to-text GPT-4 API but I’m unsure how to integrate it. Any assistance you can provide with the entire coding process would be greatly appreciated.

sps · May 13, 2023, 7:22pm

Oops the code snippet in docs didn’t include initialization. I’ve updated the code from API reference and can confirm that it runs on my end. LMK if you get any errors.

sps · May 13, 2023, 7:27pm

gpt-4 can be accessed over the Chat completion endpoint. LMK any questions you have after reading chat completion docs

rickknops.rk · May 14, 2023, 1:01pm

@sps It seems that there is a problem with adding in my openai key. Even though im regenerating a new one to try out the code it seems he doesnt give me acces anymore. Billing is fixed.

By Texting the code, do you mean the small one you included or the full code i have written which works on your end?

I will first try to get the Transcription working

sps · May 15, 2023, 4:54am

This code is for transcription and it works. You should try incorporating it into your code for the transcribe_audio function since you’re already using the openai module.

Topic		Replies	Views
Issues with Audio Transcription Using OpenAI Python Library on Raspberry Pi API	4	2323	August 30, 2024
Help Putting Whisper Code Into Python Script API	2	2678	January 29, 2024
Translation api returns incorect api key while the same key works for chat Bugs whisper , audio	2	139	October 11, 2024
'OpenAI' import error, and audio.transcribe or any audio related functions not supported Bugs	0	136	December 4, 2024
I can't figure out how to use the Text to Audio feature API	62	4250	April 16, 2024

Seeking Help: Converting Speech to text, Text to text and then Text to Image using OpenAI's

Note: you need to be using OpenAI Python v0.27.0 for the code below to work

Related topics