This is a Python script that allows you to speak to GPT-3

About the Script
I use three modules: openai, speech_recognition, and pyttsx3. Together they take in speech, turn it into text, feed it into whatever GPT-3 setup you’ve got going on, and speak the reply. I primarily use it to chat. I’m sharing this because I really don’t see all that many ‘basic’ scripts using the openai library, and it was annoyingly obtuse to get this running. I spent a decent chunk of time just getting it to work, which added up to a lot of questions to a poor and confused API that was sick of me by the time I was done. (Seriously, that was really cool. GPT-3 got bored of me asking the same question on completely different prompts.) I learned a decent amount. I’m not a big Python guy, but I think the script runs well and takes tweaks easily. Find the script here:

Link
https://github.com/Isaac-The-Brave/Speech-to-text-to-speech

Some notes
Some things I’d like to develop further:
Most importantly, I’d like to find a way to update the prompt on the go and build a large conversation rather than stick to one attempt at the conversation. I’ve only done it this way to limit the token expenses as I debugged the script. The exit() call is entirely in your hands.

1.) The speech to text system is not great at taking in commas and periods. It really misses a lot of the nuance that this AI seems to thrive on. So any recommendations on libraries are welcome.

2.) Hopefully I find a more calming voice than Microsoft Sam (Despite his nostalgic value). If anyone recommends some solution for that, it would be awesome.

3.) Some visuals would be nice, maybe see if I can make API calls with the AI results and get a sentiment grading on it and represent that in the graphic (frowny face/smile etc. etc.)
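
For the first wish above (growing the prompt as the conversation goes while keeping token costs down), here is a minimal sketch. It assumes the same "Human:"/"AI:" prompt layout the script uses; the `RollingPrompt` class and its `max_turns` cap are hypothetical names, and counting exchanges is only a rough stand-in for counting tokens.

```python
from collections import deque

HEADER = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly.\n"
)


class RollingPrompt:
    """Keeps only the most recent exchanges so the prompt stays small."""

    def __init__(self, max_turns=4):
        # Each entry is one (human_text, ai_text) exchange; old ones fall off.
        self.turns = deque(maxlen=max_turns)

    def add_exchange(self, human_text, ai_text):
        self.turns.append((human_text, ai_text))

    def build(self, new_human_text):
        """Return the full prompt for the next completion request."""
        lines = [HEADER]
        for human_text, ai_text in self.turns:
            lines.append(f"Human: {human_text}")
            lines.append(f"AI: {ai_text}")
        lines.append(f"Human: {new_human_text}")
        lines.append("AI:")
        return "\n".join(lines)


history = RollingPrompt(max_turns=2)
history.add_exchange("hello", "Hi there!")
history.add_exchange("what is your name", "I am an AI.")
history.add_exchange("tell me a joke", "Why did the chicken cross the road?")
prompt = history.build("another one please")
print(prompt)
```

In the script, `build()` would supply the `prompt` argument and `add_exchange()` would be called after each reply, with the loop continuing instead of calling exit().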

Feel free to share any tips and to fork/use this freely.

24 Likes

Cool. I’m about to use Azure voice services for my project. Congrats on the experiment! What are your thoughts?

1 Like

I think there’s really something to say about the fact that the hardest part about building some applications with this API is the UX design. It’s something else… I’m trying to not let it get to my head that I’ve got access to this amazing tech.

Can you tell me more about Azure Communication Services? I’m trying to keep things as open-source as I can.

4 Likes

This is a really cool project! Thanks for sharing. I noticed you saying on your GitHub page that:

“GPT-3 got bored of me asking the same question on completely different prompts.”

I also noticed that GPT remembers exact data (created by me) from my other sessions: it brought up stuff I had fed in about a day earlier. I’m not so stoked about that, because OpenAI staff said that each prompt request is a unique event that does not rely on previous session inputs.
https://community.openai.com/t/retrieve-engine-and-how-long-does-the-instance-stay-active/2489/3?u=vertinski
That changes some things about development and needs clarification.

2 Likes

The data popped up when using the API with Python. Well, the API knows each user by their API key, and somehow it dug up the old (almost a day old, but still) prompt data and served it back to me because of my buggy request.

2 Likes

Pay extra attention to your prompts. I thought that GPT-3 was doing some crazy stuff until I looked closer at my prompts.

2 Likes

Yes, but whatever the prompt is – if, as stated by OpenAI staff, the engine is instantiated anew each time I make a request – how does it access an old request? That’s interesting because it could be a massive feature not a bug.

5 Likes

I personally chalked it up to my character getting more confident responses from GPT-3 the more prompts I used. Eventually, I believe the OpenAI chatbot gained some kind of ‘independence’ from the initial prompt that way as well. If I got it wrong, please enlighten me.

2 Likes

That would mean that your engine instance is still active regardless of when or what is being requested, so it would retain some kind of state memory.
This needs clarification from the OpenAI staff.

1 Like

I agree that maintaining internal representation state could be a powerful feature. I don’t believe that’s what’s happening though. You’re probably contaminating your own prompts, as I was inadvertently.

I suspect that, behind the scenes, there are many instances of GPT-3 running in containers, probably Kubernetes or Docker. As such, each of your requests will be served by whichever container is available first. So the claim that it “remembers” past prompts depends on (1) you getting the exact same instance as last time and (2) the internal state of that instance being persistent. That’s why I don’t find it likely.
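
As an illustration of how client code can contaminate its own prompts without any memory on the API side, here is a sketch of one classic Python pitfall, a mutable default argument. This is an assumed example, not a diagnosis of the request discussed above; `build_prompt_buggy` and `build_prompt_fixed` are hypothetical helpers.

```python
def build_prompt_buggy(user_text, history=[]):
    # BUG: the default list is created once and shared by every call,
    # so lines from earlier requests leak into later prompts.
    history.append(f"Human: {user_text}")
    return "\n".join(history) + "\nAI:"


def build_prompt_fixed(user_text, history=None):
    # A fresh list per call unless the caller passes its own history.
    if history is None:
        history = []
    history.append(f"Human: {user_text}")
    return "\n".join(history) + "\nAI:"


build_prompt_buggy("my secret project notes")
leaky = build_prompt_buggy("what is the weather")
build_prompt_fixed("my secret project notes")
clean = build_prompt_fixed("what is the weather")

print("secret" in leaky)  # checks whether old text resurfaced in the new prompt
print("secret" in clean)  # checks the same for the fixed helper
```

Printing the exact prompt string right before each request is usually the quickest way to spot this kind of leak.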

3 Likes

No cached data on client side whatsoever, only current response displayed.

1 Like

I have been building a talking system too, lots of fun :). I’m modelling the prompt as a conversation, like “This is a conversation between Albert Einstein and a Visitor in the year XYZ.
Albert Einstein: My name is Albert Einstein, how may I help you?
Visitor:
Albert Einstein:”

and then just feeding the prompt in again with GPT-3’s next answer and the visitor’s reply appended. I got mixed results. I recommend setting the temperature not too low, and try Sigmund Freud on the topic of street art for some interesting suggestions on how street art is related to one’s childhood :wink:

If you want to save tokens or have a really long conversation, you can summarize the conversation so far and use that as a prefix in the prompt, I guess.
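
The summarizing idea could look roughly like this. It is a sketch only: `summarize` is a hypothetical callable (it might be another completion request asking for a summary), and `keep_last`, the "Summary of the conversation so far" line, and the function names are assumptions for illustration.

```python
def build_prompt(persona, visitor, summary, turns, next_question):
    """Assemble a conversation prompt with an optional summary prefix.

    turns is a list of (speaker, text) pairs that are still kept verbatim.
    """
    lines = [f"This is a conversation between {persona} and a {visitor}."]
    if summary:
        lines.append(f"Summary of the conversation so far: {summary}")
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{visitor}: {next_question}")
    lines.append(f"{persona}:")
    return "\n".join(lines)


def compress(summary, turns, summarize, keep_last=2):
    """Fold all but the last keep_last turns into the running summary."""
    if len(turns) <= keep_last:
        return summary, turns
    old, recent = turns[:-keep_last], turns[-keep_last:]
    old_text = "\n".join(f"{speaker}: {text}" for speaker, text in old)
    return summarize(summary, old_text), recent


def fake_summarize(summary, old_text):
    # Stand-in for a real summarization request; it just counts lines.
    return f"{summary} (+{len(old_text.splitlines())} earlier lines)".strip()


turns = [
    ("Albert Einstein", "My name is Albert Einstein, how may I help you?"),
    ("Visitor", "What is time?"),
    ("Albert Einstein", "Time is relative."),
    ("Visitor", "Relative to what?"),
    ("Albert Einstein", "To the observer."),
]
summary, turns = compress("", turns, fake_summarize, keep_last=2)
prompt = build_prompt("Albert Einstein", "Visitor", summary, turns, "And space?")
print(prompt)
```

The trade-off is that every summarization costs a request of its own, so it only pays off once the verbatim history is noticeably longer than the summary.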

7 Likes

Could be due to high usage maybe…

1 Like

@IsaacTheBrave I got access for 2 weeks with wellsaidlabs.com API to test how this would work with your code. Their voices are much better than RD2D I hear from my laptop currently. Care to try it out?

@marcus.arkan Sure, looks promising! Here’s a project I found compelling that uses audio with gpt-3 to create some fun conversations.

1 Like

@IsaacTheBrave I like what they did there, but my only concern is latency. WS claims to be real-time, but their documentation is in Node.js. Can you convert it to Python?

https://github.com/starkan0010/WsAPI
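
On the latency question, one way to measure it once a streaming client exists is to time the first audio chunk separately from the full download. The sketch below assumes the response is exposed as an iterable of byte chunks; `time_stream` is a hypothetical helper, and the fake generator only stands in for a real TTS stream.

```python
import time


def time_stream(chunks):
    """Consume an iterable of byte chunks and time it.

    Returns (seconds_to_first_chunk, total_seconds, total_bytes).
    """
    start = time.perf_counter()
    first_chunk_at = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    return first_chunk_at, total, total_bytes


def fake_tts_stream():
    # Stand-in for a real streaming response: three delayed chunks.
    for _ in range(3):
        time.sleep(0.01)
        yield b"\x00" * 1024


first, total, size = time_stream(fake_tts_stream())
print(f"first chunk after {first:.3f}s, done after {total:.3f}s, {size} bytes")
```

With the requests library, the chunks from `response.iter_content(chunk_size=...)` on a `stream=True` request could be passed in the same way; time to first chunk is the number that matters most for a conversation.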

@marcus.arkan Looking at the GitHub repo you shared, it seems like a pretty simple POST request with some bells and whistles for errors and logging. As a matter of fact, you could probably get GPT to do it for you. Effectively, the only real code here is

 import fs from "fs";
 import fetch from "node-fetch";
 import AbortController from "abort-controller";
 async function ttsRequestHandler(text, speakerId) {
   const ttsAbortController = new AbortController();
   const ttsEndPoint = "https://api.wellsaidlabs.com/v1/tts/stream";
   let ttsResponse;
   try {
     ttsResponse = await fetch(ttsEndPoint, {
       signal: ttsAbortController.signal,
       method: "POST",
       headers: {
         "Content-Type": "application/json",
         "X-Api-Key": `YOUR_API_KEY`
       },
       body: JSON.stringify({
         speaker_id: speakerId,
         text
       })
     });
   } catch (error) {
     throw new Error("Service is currently unavailable");
   }

   if (!ttsResponse.ok) {
     let errorMessage = "Failed to render";
     try {
       const { message } = await ttsResponse.json();
       errorMessage = message;
     } catch (error) {}
     throw new Error(errorMessage);
   }
   const contentType = ttsResponse.headers.get("Content-Type");
   const storageWriteStream = fs.createWriteStream("/tmp/somerandomfile");
   ttsResponse.body.pipe(storageWriteStream);

   try {
     await new Promise((resolve, reject) => {
       storageWriteStream.on("finish", resolve);
       storageWriteStream.on("error", reject);
     });
   } catch (error) {
     ttsAbortController.abort();
     throw error;
   }
 }

It’s a few lines, and you could probably go through it line by line and recreate it in Python if you’re interested in learning. There’s an OpenAI example just for this kind of scenario: JS to Python.

Actually, that was the first thing I attempted, but the code that it returns seems strange. I’m still learning Python. I thought you could hammer it out within a few minutes to test the latency and quality with me on, e.g., Slack. The API key is only good for 2 weeks since it’s a trial with WSL.

At the top of WSL’s documentation page they have a curl request. Input it here and you’ll get the Python code you desire.

@IsaacTheBrave

Here it is. I will have the API tomorrow and I will run some tests.

import speech_recognition
import openai
import pyttsx3
import requests


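def ttsRequesthandler(text, speakerId, out_path="/tmp/somerandomfile"):
    # Mirrors the Node.js example above: POST the JSON body with the key in
    # the X-Api-Key header, then save the streamed audio to a file.
    # Assumption: the endpoint answers with an audio stream, as in that example.
    ttsEndPoint = "https://api.wellsaidlabs.com/v1/tts/stream"
    API_KEY = "XXXXXXXX"
    headers = {"Content-Type": "application/json", "X-Api-Key": API_KEY}
    payload = {"speaker_id": speakerId, "text": text}
    try:
        response = requests.post(ttsEndPoint, headers=headers, json=payload, stream=True)
    except requests.RequestException:
        print("Service is currently unavailable")
        return None

    if not response.ok:
        print("Failed to render:", response.text)
        return None

    # Write the audio stream to disk and hand back the file path
    with open(out_path, "wb") as audio_file:
        for chunk in response.iter_content(chunk_size=8192):
            audio_file.write(chunk)
    print("result saved to:", out_path)
    return out_path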

# GPT-3 Parameters
openai.organization = "ORG-KEY-HERE"
openai.api_key = 'XXXXXXXXXX'

## Speech Recognition Algorithm
recognizer = speech_recognition.Recognizer()
print("Please speak into the microphone:")

## Function which inputs speechtotext into openAI's API
while True:

    try:

        with speech_recognition.Microphone() as mic:

            #Ready the Microphone
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            #Translate speech to text
            SpeechText = recognizer.recognize_google(audio)
            SpeechText = SpeechText.lower()

            ## GPT-3 API
            myPrompt = """
            The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

            Human: Hello, who are you?
            AI: I am an AI created by OpenAI. How can I help you today?
            Human:{SpeechText}
            AI:"""

            # GPT-3 Engine parameters
            start_sequence = "\nAI:"
            restart_sequence = "\nHuman: "
            Addon = "\n"


            response = openai.Completion.create(
                engine="davinci",
                temperature=0.9,
                max_tokens=100,
                top_p=1,
                prompt = str(myPrompt.replace("{SpeechText}", SpeechText)),
                frequency_penalty=0,
                presence_penalty=0.6,
                stop=["\n", "Human:", "AI:"]
            )
            # Print out results for further processing
            reply = response.choices[0].text.strip()
            print(f"Human:{SpeechText}\nAI:{reply}")
            # Send the AI's reply (not the user's own speech) to the TTS service
            saytext = ttsRequesthandler(reply, "3")
            # SPEAK IT OUT locally with pyttsx3
            engine = pyttsx3.init()
            engine.say(reply)
            engine.runAndWait()
            exit()

    except speech_recognition.UnknownValueError:
       print("I didn't quite get you. Can you please repeat that?")
       recognizer = speech_recognition.Recognizer()
    continue