Help Putting Whisper Code Into Python Script

superdoesthings · January 29, 2024, 12:58am

Hey, I’m working with the OpenAI API and I’m trying to convert this script to use the Whisper API, but I can’t figure out how to make it behave the same way. By “the same way” I mean always listening with speech_recognition (r.listen()), so I don’t have to press a button or otherwise trigger a “recording” before talking to the bot.

Code:

from openai import OpenAI
import os
import pyaudio  # backend used by sr.Microphone; not called directly
import speech_recognition as sr
from pathlib import Path
from playsound import playsound

client = OpenAI(
    api_key='nicetry;)'  # or set OPENAI_API_KEY and call OpenAI() with no arguments
)

messages1 = [{'role': 'system', 'content': 'You are a person texting. Try to keep the responses short.'}]

# Create the recognizer and microphone once instead of on every loop iteration
r = sr.Recognizer()
mic = sr.Microphone()

while True:
    with mic as source:
        print("[listening]")
        try:
            # listen() waits for speech to start, then records until a pause
            audio = r.listen(source, timeout=5)  # Adjust timeout as needed
            prompt = r.recognize_google(audio)
            print("you said: " + prompt)
        except sr.WaitTimeoutError:
            # Nobody spoke within the timeout; go back to listening
            continue
        except sr.UnknownValueError:
            # Speech was captured but could not be understood
            continue

    if prompt.lower() == 'quit':
        break

    usrmsg = {'role': 'user', 'content': prompt + ' '}
    messages1.append(usrmsg)
    print("[loading. . .]")

    completion = client.chat.completions.create(
        model='gpt-3.5-turbo', messages=messages1
    )

    text = completion.choices[0].message.content

    # Convert the reply to speech
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )

    # Write the audio to a temporary file, play it, then delete it
    output_file_path = Path("output.mp3")
    with output_file_path.open("wb") as file:
        file.write(response.content)
    print(text)
    playsound(str(output_file_path))
    if output_file_path.exists():
        os.remove(output_file_path)

    gptmsg = {'role': 'assistant', 'content': text + ' '}
    messages1.append(gptmsg)

print("Bye! See you later!")

Whisper Code (from docs):

from openai import OpenAI
client = OpenAI()

audio_file = open("/path/to/file/audio.mp3", "rb")
transcript = client.audio.transcriptions.create(
    model="whisper-1",
    file=audio_file
)
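Since r.listen() already waits for speech and stops when you pause, one option is to keep it and only swap r.recognize_google(audio) for a Whisper call. A minimal sketch, assuming openai>=1.0 and a recent speech_recognition release: AudioData.get_wav_data() turns the captured phrase into WAV bytes, and the OpenAI Python SDK accepts a (filename, contents) tuple for file uploads, so no temporary file is needed. The helper name transcribe_with_whisper and the file name "speech.wav" are illustrative, not part of either library.

```python
def transcribe_with_whisper(client, audio) -> str:
    """Send one captured phrase to the Whisper API and return its text.

    `client` is an openai.OpenAI instance and `audio` is the
    speech_recognition.AudioData returned by Recognizer.listen().
    """
    # Encode the captured phrase as an in-memory WAV file (no temp file needed)
    wav_bytes = audio.get_wav_data()
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        # (filename, contents) tuple; the ".wav" name tells the API the format
        file=("speech.wav", wav_bytes),
    )
    return transcript.text


def main() -> None:
    # Imported here so transcribe_with_whisper() can be reused without a microphone
    import speech_recognition as sr
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate the energy threshold that listen() uses to detect speech
        r.adjust_for_ambient_noise(source)
        while True:
            print("[listening]")
            try:
                # Waits for speech to start and stops at the next pause,
                # so no button press is needed to begin recording
                audio = r.listen(source, timeout=5)
            except sr.WaitTimeoutError:
                continue
            prompt = transcribe_with_whisper(client, audio).strip()
            if not prompt:
                continue
            print("you said: " + prompt)
            # Whisper adds punctuation, so "quit" may come back as "Quit."
            if prompt.lower().rstrip(".!") == "quit":
                break
            # ...pass `prompt` to the chat completion and TTS code from the original script...
```

Call main() to start the loop. Unlike recognize_google, Whisper rarely raises on unclear audio; it tends to return an empty or guessed transcript instead, which is why the sketch skips empty results rather than catching sr.UnknownValueError.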

Macha · January 29, 2024, 2:10am

Hey there and welcome to the community!

A really helpful tutorial I keep coming back to is this one:

The diarization part isn’t needed for this, but I believe what you’re asking for is streaming audio to the model for transcription. The Whisper API doesn’t have its own streaming function yet, and there’s no way to send it audio without recording it first, so you basically need a mechanism that automatically sends a recorded file to the Whisper API every X seconds.
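A minimal sketch of the “send a file every X seconds” idea, assuming 16-bit mono PCM at 16 kHz (the helper name and defaults are hypothetical): it slices a raw audio buffer into fixed-length, in-memory WAV files using only the standard library. Each chunk could then be passed to client.audio.transcriptions.create(model="whisper-1", file=("chunk.wav", chunk)).

```python
import io
import wave


def pcm_to_wav_chunks(pcm: bytes, sample_rate: int = 16000,
                      sample_width: int = 2, channels: int = 1,
                      chunk_seconds: float = 5.0) -> list:
    """Split raw PCM audio into fixed-length WAV files held in memory."""
    frame_size = sample_width * channels
    bytes_per_second = sample_rate * frame_size
    # Round the chunk length down to a whole number of audio frames
    chunk_bytes = int(bytes_per_second * chunk_seconds) // frame_size * frame_size
    chunks = []
    for start in range(0, len(pcm), chunk_bytes):
        buf = io.BytesIO()
        # wave does not close file objects it did not open, so buf stays usable
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(channels)
            wav.setsampwidth(sample_width)
            wav.setframerate(sample_rate)
            wav.writeframes(pcm[start:start + chunk_bytes])
        chunks.append(buf.getvalue())
    return chunks
```

Fixed-length chunks are simple but can cut a word in half at a boundary; overlapping the chunks slightly, or splitting on silence instead of on time, avoids that at the cost of sending some audio twice. If you are already using speech_recognition, Recognizer.record(source, duration=5) captures a fixed-length clip directly.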

The other thing to keep in mind is that “always on” gets extremely expensive very quickly, since you pay for every minute of audio you send, and most of that cost goes to transcribing silence (or worse, the model misinterprets other sounds as speech and misfires). If you do build something like this, weigh those consequences first.
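To put rough numbers on “expensive”: whisper-1 was listed at $0.006 per minute of audio around the time of this thread (treat the rate as an assumption and check current pricing). A back-of-the-envelope sketch with a hypothetical helper, where the 30 minutes of daily speech is likewise just an illustrative guess:

```python
def whisper_cost_usd(minutes_of_audio: float, usd_per_minute: float = 0.006) -> float:
    """Estimate Whisper API cost; the default rate is an assumption, check current pricing."""
    return minutes_of_audio * usd_per_minute


# Always on: every minute of the day is sent for transcription
always_on = whisper_cost_usd(24 * 60)
# With some form of speech detection: assume ~30 minutes of actual speech per day
speech_only = whisper_cost_usd(30)
print(f"always on:   ${always_on:.2f}/day, ${always_on * 30:.2f}/month")
print(f"speech only: ${speech_only:.2f}/day, ${speech_only * 30:.2f}/month")
```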

For context: This is why you need to say “Alexa” or “Hey Google” every time you want to use one of those things, so it “knows” when speech is happening. That is how big tech got around this problem.
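A crude stand-in for a wake word in this script, assuming you keep transcribing every phrase r.listen() captures: only reply when the transcript starts with a chosen phrase. Unlike the on-device detection smart speakers use, this does not reduce Whisper costs (every phrase is still sent), but it keeps the bot from answering background conversation. The phrase “hey bot” and the function name are hypothetical.

```python
import re
from typing import Optional


def strip_wake_word(transcript: str, wake_phrase: str = "hey bot") -> Optional[str]:
    """Return the command after the wake phrase, or None if it isn't there.

    Matching ignores case and punctuation, because Whisper may return
    "Hey, Bot." rather than "hey bot". The returned command is lowercased
    with punctuation removed (apostrophes are kept).
    """
    words = re.sub(r"[^\w\s']", " ", transcript.lower()).split()
    wake = wake_phrase.lower().split()
    if words[:len(wake)] != wake:
        return None
    return " ".join(words[len(wake):])
```

In the listening loop, something like `command = strip_wake_word(prompt)` followed by `if not command: continue` would skip both phrases without the wake word and a bare “hey bot” with nothing after it.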

vb · January 29, 2024, 9:45am

I think what you’re looking for is voice activity detection (VAD). Like Alexa and other home assistants, the app is always listening, but the actual recording for transcription only starts after a spoken command (“Alexa! Order popcorn!”).

I previously implemented such a solution with Silero VAD. I couldn’t easily set my own activation word and instead had to use predefined choices, but it worked quite well.

Note that the source is already quite old in AI time. Maybe there have been newer developments that I am not aware of.
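For a sense of what VAD does, here is its simplest form: an energy threshold. This is not Silero VAD (which uses a neural network) but a plain stand-in, assuming 16-bit mono PCM in native byte order (little-endian on common hardware); the function names, frame size, and threshold are hypothetical and would need tuning for a real microphone. speech_recognition’s r.listen() works on a similar energy-threshold principle (see its energy_threshold and pause_threshold settings), which is why the original script doesn’t need a button press.

```python
import array
import math


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit mono PCM."""
    samples = array.array("h", frame)  # native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_segments(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30,
                    threshold: float = 500.0, hangover_frames: int = 10):
    """Yield (start_byte, end_byte) spans whose energy exceeds `threshold`.

    Up to `hangover_frames` quiet frames are tolerated inside a segment,
    so short pauses between words do not split a sentence.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    start = None       # byte offset where the current segment began
    last_loud_end = 0  # end offset of the most recent loud frame
    quiet = 0          # consecutive quiet frames inside the current segment
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        if frame_rms(pcm[offset:offset + frame_bytes]) > threshold:
            if start is None:
                start = offset
            last_loud_end = offset + frame_bytes
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                # Silence lasted long enough: close the segment at the last loud frame
                yield start, last_loud_end
                start, quiet = None, 0
    if start is not None:
        yield start, last_loud_end
```

Each (start, end) pair indexes into the original buffer, so pcm[start:end] is the speech you would send to Whisper; only those spans get transcribed, which is where the cost savings come from.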

Hope this helps!
