Cool. I’m about to use Azure voice services for my project. Congrats on the experiment! What are your thoughts?

1 Like

I think there’s really something to be said about the fact that the hardest part of building some applications with this API is the UX design. It’s something else… I’m trying not to let it get to my head that I’ve got access to this amazing tech.

Can you tell me more about Azure Communication Services? I’m trying to keep things as open-source as I can.

4 Likes

This is a really cool project! Thanks for sharing. I noticed you saying on your GitHub page that:

“GPT-3 got bored of me asking the same question on completely different prompts.”

I also noticed GPT remembers exact data (created by me) from my other sessions, when it brought up stuff I was feeding in about a day ago. Not so stoked about that, because OpenAI staff said that each prompt request is a unique event, not relying on previous session inputs.
https://community.openai.com/t/retrieve-engine-and-how-long-does-the-instance-stay-active/2489/3?u=vertinski
That changes some things about development and needs clarification.

2 Likes

The data popped up when using the API with Python. Well, the API knows each user by their API key, and somehow it dug up the old (almost a day old, but still) prompt data and served it back to me because of my buggy request.

2 Likes

Pay extra attention to your prompts. I thought that GPT-3 was doing some crazy stuff until I looked closer at my prompts.

2 Likes

Yes, but whatever the prompt is – if, as stated by OpenAI staff, the engine is instantiated anew each time I make a request – how does it access an old request? That’s interesting, because it could be a massive feature, not a bug.

5 Likes

I personally chalked it up to my character getting more confident responses from GPT-3 the more prompts I used. Eventually, I believe the OpenAI chatbot gained some kind of ‘independence’ from the initial prompt like that as well. If I got it wrong, please enlighten me.

2 Likes

That would mean that your engine instance is still active regardless of when or what is being requested – so it would retain some kind of state memory.
This needs clarification from the OpenAI staff.

1 Like

I agree that maintaining internal representation state could be a powerful feature. I don’t believe that’s what’s happening though. You’re probably contaminating your own prompts, as I was inadvertently.

I suspect that, behind the scenes, there are many instances of GPT-3 running in containers, probably Kubernetes or Docker. As such, each of your requests will be served by whichever container is first available. So the claim that it “remembers” past prompts depends on (1) you getting the exact same instance as last time and (2) the internal state of said instance being persistent. That’s why I don’t find it likely.
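
To make the contamination idea concrete, here is a hypothetical sketch (mine, not from anyone’s actual project) of how it happens with the Completion API: a prompt variable that quietly accumulates across requests, so the “memory” rides along in what you send, not in the engine.

import openai

openai.api_key = "XXXXXXXXXX"

prompt = ""
for question in ["Who discovered penicillin?", "What year was that?"]:
    # BUG: appending to the same variable means every request silently
    # carries all earlier questions and completions with it, so the model
    # appears to "remember" things the API itself never stored
    prompt += f"\nQ: {question}\nA:"
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=50,
        stop=["\n"]
    )
    prompt += response.choices[0].text
    print(response.choices[0].text)

Rebuilding the prompt from scratch on every iteration makes the “memory” vanish, which is a quick way to test this.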

3 Likes

No cached data on the client side whatsoever; only the current response is displayed.

1 Like

I have been building a talking system too, lots of fun :). Modelling the prompt as a conversation, like “This is a conversation between Albert Einstein and a Visitor in the year XYZ.
Albert Einstein: My name is Albert Einstein, how may I help you?
Visitor:
Albert Einstein:”

and then just feeding the prompt in again with the next answer of GPT-3 and the visitor appended. Got mixed results; I recommend not setting the temperature too low, and try Sigmund Freud on the topic of street art for some interesting suggestions on how street art relates to one’s childhood :wink:

If you want to save tokens or do a really long conversation, you can summarize the conversation so far and use that as a prefix in the prompt, I guess.
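
In case it helps, here’s a rough sketch of that feed-it-back-in loop (my own code, using the same openai.Completion API that appears further down this thread, so treat the parameter choices as assumptions):

import openai

openai.api_key = "XXXXXXXXXX"

history = ("This is a conversation between Albert Einstein and a Visitor in the year XYZ.\n"
           "Albert Einstein: My name is Albert Einstein, how may I help you?\n")

while True:
    visitor_line = input("Visitor: ")
    history += f"Visitor: {visitor_line}\nAlbert Einstein:"
    response = openai.Completion.create(
        engine="davinci",
        prompt=history,
        temperature=0.8,    # not too low, per the advice above
        max_tokens=100,
        stop=["Visitor:"]   # stop before the model writes the visitor's next line
    )
    answer = response.choices[0].text.strip()
    print("Albert Einstein:", answer)
    # Append the answer so the next request carries the whole conversation
    history += f" {answer}\n"

Once the history grows long, the summarize-and-prefix trick above would replace most of it with a shorter model-written summary.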

5 Likes

Could be due to high usage maybe…

1 Like

@IsaacTheBrave I got access for 2 weeks with the wellsaidlabs.com API to test how this would work with your code. Their voices are much better than the R2-D2 I currently hear from my laptop. Care to try it out?

@marcus.arkan Sure, looks promising! Here’s a project I found compelling that uses audio with GPT-3 to create some fun conversations.

2 Likes

@IsaacTheBrave I like what they did there, but my only concern is latency. WS claims to be real-time, but their documentation is in Node.js. Can you convert it to Python?

https://github.com/starkan0010/WsAPI
1 Like

@marcus.arkan Looking at the GitHub you shared, it seems like a pretty simple POST request with some bells and whistles for errors and logging. As a matter of fact, you could probably get GPT to do it for you. Actually, the only real code here is:

import fs from "fs";
import fetch from "node-fetch";
import AbortController from "abort-controller";

async function ttsRequestHandler(text, speakerId) {
  const ttsAbortController = new AbortController();
  const ttsEndPoint = "https://api.wellsaidlabs.com/v1/tts/stream";
  let ttsResponse;
  try {
    ttsResponse = await fetch(ttsEndPoint, {
      signal: ttsAbortController.signal,
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Api-Key": `YOUR_API_KEY`
      },
      body: JSON.stringify({
        speaker_id: speakerId,
        text
      })
    });
  } catch (error) {
    throw new Error("Service is currently unavailable");
  }

  if (!ttsResponse.ok) {
    let errorMessage = "Failed to render";
    try {
      const { message } = await ttsResponse.json();
      errorMessage = message;
    } catch (error) {}
    throw new Error(errorMessage);
  }

  const contentType = ttsResponse.headers.get("Content-Type");
  const storageWriteStream = fs.createWriteStream("/tmp/somerandomfile");
  ttsResponse.body.pipe(storageWriteStream);

  try {
    await new Promise((resolve, reject) => {
      storageWriteStream.on("finish", resolve);
      storageWriteStream.on("error", reject);
    });
  } catch (error) {
    ttsAbortController.abort();
    throw error;
  }
}

It’s a few lines, and you could probably go through it line by line and recreate it in Python if you’re interested in learning. There’s an OpenAI example just for this kind of scenario: JS to Python.
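
If you want to script that conversion itself, it would look roughly like this; note the engine name and prompt format here are my assumptions based on the style of OpenAI’s translation examples, so double-check against the actual example page:

import openai

openai.api_key = "XXXXXXXXXX"

js_code = open("ttsRequestHandler.js").read()

response = openai.Completion.create(
    engine="davinci-codex",  # assumed Codex engine for code translation
    prompt=f"##### Translate this function from JavaScript into Python\n"
           f"### JavaScript\n{js_code}\n### Python",
    temperature=0,
    max_tokens=300,
    stop=["###"]
)
print(response.choices[0].text)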


1 Like

Actually, that was the first thing I attempted, but the code it returns seems strange. Still learning Python. I thought you could hammer it out within a few minutes to test the latency & quality with me on, e.g., Slack. The API key is only good for 2 weeks since it’s a trial with WSL.

1 Like

At the top of WSL’s documentation page they have a curl request. Input it here and you’ll get the Python code you desire.

1 Like

@IsaacTheBrave

Here it is. I will have the API tomorrow and I will run some tests.

import speech_recognition
import openai
import pyttsx3
import requests


def ttsRequesthandler(text, speakerId):
    # POST the text to WellSaid's streaming TTS endpoint (payload and
    # headers mirror the Node.js example above) and save the audio locally
    ttsEndPoint = "https://api.wellsaidlabs.com/v1/tts/stream"
    API_KEY = "XXXXXXXX"
    headers = {"Content-Type": "application/json", "X-Api-Key": API_KEY}
    data = {"speaker_id": speakerId, "text": text}
    try:
        response = requests.post(url=ttsEndPoint, headers=headers, json=data)
    except requests.exceptions.RequestException:
        print("Service is currently unavailable")
        return ""

    # Write the streamed audio to a file for playback
    with open("tts_output.mp3", "wb") as audio_file:
        audio_file.write(response.content)
    print("result:", response.status_code)
    return text

# GPT-3 parameters
openai.organization = "ORG-KEY-HERE"
openai.api_key = 'XXXXXXXXXX'

## Speech recognition setup
recognizer = speech_recognition.Recognizer()
print("Please speak into the microphone:")

## Loop that feeds speech-to-text into OpenAI's API
while True:

    try:

        with speech_recognition.Microphone() as mic:

            # Ready the microphone
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            # Translate speech to text
            SpeechText = recognizer.recognize_google(audio)
            SpeechText = SpeechText.lower()

            ## GPT-3 API
            myPrompt = """
            The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

            Human: Hello, who are you?
            AI: I am an AI created by OpenAI. How can I help you today?
            Human:{SpeechText}
            AI:"""

            response = openai.Completion.create(
                engine="davinci",
                temperature=0.9,
                max_tokens=100,
                top_p=1,
                prompt=myPrompt.replace("{SpeechText}", SpeechText),
                frequency_penalty=0,
                presence_penalty=0.6,
                stop=["\n", "Human:", "AI:"]
            )
            # Print out results for further processing
            saytext = ttsRequesthandler(SpeechText, "3")
            #print(f"Human:{SpeechText}\nAI:{response.choices[0].text}")
            print(f"Human:{SpeechText}\nAI:{saytext}")
            # SPEAK IT OUT
            engine = pyttsx3.init()
            engine.say(saytext)
            engine.runAndWait()
            exit()

    except speech_recognition.UnknownValueError:
        print("I didn't quite get you. Can you please repeat that?")
        recognizer = speech_recognition.Recognizer()

From a quick glance, this should work. Errors are always likely; I recommend lowering max_tokens while you’re testing, it can get expensive otherwise.

There’s one mistake in your function:

saytext = ttsRequesthandler(SpeechText, "3")

You shouldn’t let the API handle SpeechText; that’s the voice input you give. The output that the OpenAI API gives is:

response.choices[0].text

This should be changed to something like:

var1 = response.choices[0].text
saytext = ttsRequesthandler(var1, "3")


Also, make sure you remove the last few lines:

            # SPEAK IT OUT
            engine = pyttsx3.init()
            engine.say(saytext)
            engine.runAndWait()

Otherwise you’ll have multiple voices. Perhaps remove the entire pyttsx3 library; at that point it’s only bloat. Good luck.
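
Putting both fixes together, the tail end of the loop would look something like this (ai_reply is my own name for the variable):

            # Send GPT-3's reply (not the visitor's input) to WellSaid
            ai_reply = response.choices[0].text
            saytext = ttsRequesthandler(ai_reply, "3")
            print(f"Human:{SpeechText}\nAI:{ai_reply}")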

2 Likes