As far as I can tell, yes, replacing the Unicode start-of-text (U+0002) and end-of-text (U+0003) characters with simple plain-text sequences fixed the problem.
At least it did for me.
So far :-).
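For anyone wanting to try the same workaround, here is a minimal sketch of the replacement step. The marker strings `[START]` and `[END]` and the function name are my own placeholders; the post above doesn’t say which plain-text sequences were actually used.

```python
STX = "\u0002"  # Unicode START OF TEXT control character
ETX = "\u0003"  # Unicode END OF TEXT control character


def replace_control_markers(text: str, start: str = "[START]", end: str = "[END]") -> str:
    """Swap STX/ETX control characters for visible plain-text markers.

    The default marker strings are hypothetical; pick whatever your prompt uses.
    """
    return text.replace(STX, start).replace(ETX, end)


prompt = "\u0002Translate this paragraph.\u0003"
print(replace_control_markers(prompt))  # [START]Translate this paragraph.[END]
```

Run this over the text before it goes into the request, so the model never sees the raw control characters.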
Now, how about translating creative writings?
Should we set the temperature in the middle or … both?
If you want to translate a creative story it’d still be a lower temperature.
A higher temperature would lead to incorrect translations or, even worse, token insanity like that seen here.
For translations, you need to play around to find the best temperature for your use case. I find that temperatures lower than 0.8 produce very literal translations, the kind you get on ChatGPT as a first response, and in my case those are useless (you can tell from a mile away the text was translated). Temperature at 1.0 works fairly well in that over 50% of the time the result doesn’t “feel translated” and so you only have to tweak/improve about half of the output in order to get a real-world usable translation. When you go above 1.3 the style of the output starts to deteriorate rapidly.
But, like I said, you have to play with it and find your sweet spot.
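If it helps, “playing with it” can be as simple as running the same passage at a handful of temperatures and reading the results side by side. This is only a rough sketch: `translate` is a hypothetical callable you would write around whatever API you use, and `fake_translate` is a stand-in so the example runs without a network call.

```python
from typing import Callable, Dict, Iterable


def sweep_temperatures(
    translate: Callable[[str, float], str],
    text: str,
    temperatures: Iterable[float] = (0.6, 0.8, 1.0, 1.2),
) -> Dict[float, str]:
    """Run the same text through `translate` once per temperature."""
    return {t: translate(text, t) for t in temperatures}


def fake_translate(text: str, temperature: float) -> str:
    """Stand-in translator (assumption): replace with a real API wrapper."""
    return f"[t={temperature}] {text}"


results = sweep_temperatures(fake_translate, "Bonjour le monde")
for temperature, output in results.items():
    print(temperature, output)
```

Since sampling is random, it’s worth running each temperature several times before deciding where your sweet spot is.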
You do realise AI is just an advanced search application, using a giant database.
Often, when there are too many parallel requests, the system will just reject the request.
You are being charged for the request and not the response.
It may seem a little unfair, but each request is using up server bandwidth.
This really comes down to the particular API’s commercial contract with its customers.
As AI becomes more competitive, companies providing API access will need to find ways of attracting new customers.
This may become a USP at some point, but currently there aren’t enough reliable AI API providers out there to drive this kind of competition.
I guess one day you may find that you don’t get charged for requests that result in dud responses.
This will sound a bit tongue in cheek, and I suppose it might be, but it seems like a true indicator that the model has reached consciousness might be the fact that it spews out random responses to the exact same inputs. You’re more human every day, Data!
I’ve been holding my breath for a sentient AGI too, but I really think this case was not a model glitch, it was just an API bug.
On AGI, though, OpenAI’s 5-stage path to AGI is exciting, even if we’re only on stage 1 with GPT-4. And I hope to (fill in whatever deity you believe in) that we end up with something even remotely like Data. But I doubt we’ll be reliving “The Measure of a Man” anytime soon; maybe your quote from the Borg queen is more apropos.
Cheers!
This is exactly what it’s not. But I agree costs will go down for sure, new efficient hardware, better algos, etc.
I know this is settled, and solved. I learned a lot. Thanks for sharing your solution.
In my humble experience, temperature is not about the frequency of producing an error like this; it is more about how far each generation can deviate from literal toward creative responses. I think of a temp of 0.0 as being nearly deterministic or, metaphorically, staying on the path through the forest on my way to the ‘correct’ response. Higher temperatures allow each successive prediction to stray further and further off course, or even to teleport completely to the middle of the ocean.
With temp at maximum, 2.0, you will get about 8 words that make sense with the prompt, but the next 8 or so (the second pass through the model) will be way off course. Then it gets progressively worse… but impossible to really tell how much worse/off track.
I would guess that your problem is a combination of the high temp and the Unicode in the text. The Unicode pulls the response off track a little, toward areas of the model that have a lot of Unicode in them, or strong relationships between Unicode and other gibberish. Temp of 1.8 is encouraging a large jump per pass. But in your successful responses, that jump still lands in words that make sense… (I assume?).
I am surprised it was only happening 1% of the time. Anything over 1.2 is a gamble. Again, I’m only speaking from my experience, which is probably pretty limited.
It’s definitely coming from the model.
After experimenting a bit and seeing your follow-up message with the API call you’re making, there are two things at play here:
- The Unicode characters you’re including in your input. You’ve already verified this is part of the issue by confirming the gibberish text disappears after removing them.
- The temperature setting is way too high to be usable. There seems to be a lot of confusion around temperature. It’s not directly correlated with “creativity” but higher temperatures can make the model appear to be more creative because it will be more likely to sample unexpected tokens.
When you have the temperature cranked like that, it “flattens the curve” of the sampling distribution, so instead of, say, “knock, knock” being like 99.9% likely to be followed by “who’s there”, it will only be, say, 20% likely. So you’re more likely to get something unexpected.
What seems to be happening here is the interaction between having some random, unexpected Unicode characters in the input and the high temperature causes things to quickly go off the rails.
Since there will be very few to zero examples of these random Unicode characters in the training corpus, the model doesn’t know exactly what to do with them or when they should be predicted in the output.
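To make the “flattening” concrete, here is a small sketch using the common softmax-with-temperature formulation, where the logits are divided by the temperature before being turned into probabilities. The logits are made-up numbers for illustration, and I’m assuming this matches what the API does internally; the actual sampling code isn’t shown anywhere in this thread.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing each by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits: one strongly favoured token and four unlikely alternatives.
logits = [10.0, 3.0, 2.0, 1.0, 0.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    # The share of the favoured token shrinks as the temperature rises.
    print(f"temperature={t}: top token probability = {probs[0]:.3f}")
```

With a real vocabulary of tens of thousands of tokens, the probability mass that leaks away from the top choice is spread across far more alternatives, which is why rare tokens (such as stray control characters) become much easier to sample at high temperatures.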
This is an absolutely incorrect take. Temperature is entirely uncorrelated with the concept of creativity.
Hey Jake.
Very interesting, thank you.
So the final culprits are the Unicode characters, seeing as I did come across the problem a couple of times at low temperatures, and only by experimenting with higher and higher temperatures was I able to get the issue to occur repeatedly… I understand now that I was increasing the likelihood the model would try to consider a broader range of options, but the real thing it was choking on was the Unicode.
All very, very interesting. Thank you!