Here is what I got today as output from the GPT-4 API (gpt-4-32k): “Mars and Venus could habr been habitable zones in our solar system.” See the typo in “have”? “Habr” is the name of a (formerly Russian, but that’s not the point) tech website. I’m not aware of “habr” being used as a substitute for “have”, and even ChatGPT agrees it’s not an English word.
After seeing countless 4chan attacks on Google and other search engines, I wonder what happened here. Since you can’t train GPT by sending queries to it, did someone hack into the actual GPT training process? To promote their website, if nothing else?
And another question: have you seen anything like this in your results?
ChatGPT is programmed to allow unexpected word generations, and GPT-4 has been reduced in quality, giving higher perplexity. If you receive a generation with a problem, give it a thumbs-down to trigger a regeneration and record your dislike.
How do you give a thumbs-down to an API response? Also, GPT’s word generation always (in my experience, at least) makes sense; it’s never a senseless typo. It may create a new word out of another one, but if it uses “4chan” instead of “a website”, that’s probably a good reason to intensify one’s paranoia.
With the API, we simply pass the response to the end user; we don’t filter it in any way. Apparently, this approach is too optimistic.
You didn’t understand the question in the first post. And, clearly, it wasn’t for you.
Temperature has nothing to do with whatever happened here. Which, again, is either a simulated typo (“have” → “habr” is a typo if you miss by one key to the right) or something more sinister. Either way, the temperature was 0.
GPT is trained on a lot of text, including stuff like poorly written forum posts, so it’s not really unheard of to get the occasional typo.
It becomes a problem when it makes the same errors frequently, though. In my experience, it almost never spells “pored over” correctly, for example, and I’ve gotten “descails” instead of “descends” multiple times. (I even made topics about those, which were ignored.)
If you can’t replicate the funny word again and again, that’s temperature. More precisely, it’s probabilistic multinomial sampling from the probabilities derived from the logits, after they are optionally filtered by a nucleus sampler. That probably doesn’t matter to you as much as your theory of Russian attacks.
The unembedding of the hidden state into token scores will also yield very unlikely candidates, which are then translated into very low, but nonzero, probabilities of selection. However, that does not mean that completely nonsensical tokens are eliminated as possibilities (unless you set a top_p parameter); you roll the dice with each generation.
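To make the mechanism concrete, here is a minimal sketch of the decoding loop described above: softmax with temperature, optional nucleus (top_p) filtering, then a multinomial draw. This is a toy illustration with a made-up logits vector, not OpenAI’s actual implementation.

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sample one token id from raw logits, the way a decoder does."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)

    # Temperature 0 is conventionally treated as greedy decoding (argmax).
    if temperature == 0:
        return int(np.argmax(logits))

    # Softmax with temperature scaling (subtract max for numerical stability).
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus filtering: keep the smallest set of tokens whose cumulative
    # probability reaches top_p; everything else is zeroed out.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        kept = np.zeros_like(probs)
        kept[order[:cutoff]] = probs[order[:cutoff]]
        probs = kept / kept.sum()

    # Multinomial sampling: even a tiny nonzero probability can win the roll.
    return int(rng.choice(len(probs), p=probs))
```

Without the top_p filter, a garbage token with probability 0.0001 still gets drawn once in ten thousand generations, which is exactly the “dice roll” above.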
That’s important. It means we should implement spell checking when putting GPT behind customer-support bots. Though I guess OpenAI will implement it eventually.
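A minimal sketch of such a filter, assuming a plain word list; a real deployment would use a proper spell-checking library and a much larger vocabulary:

```python
import re

def flag_unknown_words(text, vocabulary):
    """Return words not found in the vocabulary, so a response can be
    held for review or regenerated before it reaches the end user."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    return [w for w in words if w not in vocabulary]

# Toy vocabulary just for this example.
vocab = {"mars", "and", "venus", "could", "have", "been",
         "habitable", "zones", "in", "our", "solar", "system"}

flag_unknown_words(
    "Mars and Venus could habr been habitable zones in our solar system.",
    vocab,
)
# → ['habr']
```

A nonempty result could trigger a regeneration instead of passing the response straight through.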
Do you recommend adjusting top_p first to avoid typos that cannot be replicated, or temperature first, or both?
As of May 2024, our team using the API has noticed random typos within responses that had never occurred before, with no change to the model, prompt, or parameters in use. We were using the 1106-preview model, so we tried the gpt-4-turbo model, but typos (bad token generation) still occurred. We are now trying a lower top_p setting to see if that avoids the issue, while also avoiding repetitiveness, which is a critical requirement.
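For reference, a sketch of the kind of request parameters involved in that experiment; the model name, prompt, and the specific temperature/top_p values here are illustrative placeholders, not our production settings. The dict would be passed to the client as `client.chat.completions.create(**request)`.

```python
# Hypothetical request parameters for experimenting with nucleus sampling.
request = {
    "model": "gpt-4-turbo",
    "messages": [{"role": "user", "content": "Summarize the support ticket."}],
    "temperature": 0.7,  # keep some variety to avoid repetitiveness
    "top_p": 0.5,        # lowered from the default 1.0 to cut the low-probability tail
}
```

Lowering top_p trims the tail of unlikely tokens (where garbage like “habr” lives) without forcing fully deterministic output the way temperature 0 would.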
For context, I haven’t been on the “ChatGPT is getting worse” train, but I would like to call out that I too have been noticing odd spelling errors in common words and random formatting issues in the middle of generations. As a daily user of both ChatGPT and the API, I don’t remember seeing issues like this before May 2024. Still an amazing technology, but I just wanted to share my experience as well.