Hello everyone,
I’ve been working on a project that involves a series of conversions. The goal is to capture speech from the microphone, transcribe it into text using Whisper ASR, use the transcribed text as prompts for the GPT model, and finally, feed the GPT outputs into an image generator like DALL-E.
Here’s the code I have so far:
import io
import requests
import sounddevice as sd
import wavio
import openai
import os
# Set your OpenAI API key
openai.api_key = "your_api_key" # Replace "your_api_key" with your actual OpenAI API key
# Record audio
def record_audio(filename, duration, fs):
print("Recording audio...")
recording = sd.rec(int(duration * fs), samplerate=fs, channels=2)
sd.wait()
wavio.write(filename, recording, fs, sampwidth=2)
print("Audio recorded and saved as", filename)
# Transcribe audio using Whisper ASR API
def transcribe_audio(filename):
print("Transcribing audio...")
with io.open(filename, "rb") as audio_file:
content = audio_file.read()
headers = {
"Content-Type": "audio/wav",
"Authorization": f"Bearer {openai.api_key}",
}
url = "https://api.openai.com/v1/audio/transcriptions"
response = requests.post(url, headers=headers, data=content)
response.raise_for_status()
transcription = response.json()["choices"][0]["text"].strip()
return transcription
# Generate image using DALLE-2
def generate_image(text, image_path):
print("Generating image...")
response = openai.Image.create(
model="image-alpha-001",
prompt=text,
num_images=1,
size="256x256",
response_format="url",
)
image_url = response.data[0].url
image_response = requests.get(image_url)
with open(image_path, "wb") as f:
f.write(image_response.content)
print("Image generated and saved as", image_path)
# Main function
def main():
audio_filename = "audio.wav"
image_path = "generated_image.jpg"
duration = 5 # Duration of the recording in seconds
fs = 44100 # Sample rate
record_audio(audio_filename, duration, fs)
transcription = transcribe_audio(audio_filename)
generate_image(transcription, image_path)
if __name__ == "__main__":
main()
I’m having some issues getting this to work as expected. I’ve replaced “your_api_key” with my actual OpenAI API key, but I’m still getting errors.
I would really appreciate any help or advice on how to get this working. I’d love to hear about any similar projects you’ve done, or any resources you think might be helpful. My knowledge about these API’s is lacking so any help is helpfull
Thanks in advance!