Bug: Occasional anomalous token in Babbage model output

Occasionally, Babbage will output an extremely rare token that should have near-zero probability. Most of the tokens around it appear normal. Here are some examples:

Jonas doesn’t answer for a few moments. He looks around the small room, taking in the fresh newDust sprinkled in the air.

There should not be “newDust” in the generated text.

The shaman stared at you in disbelief as the projectile hit him in the chest, Bits ofMagic flying out in all directions. “What the hell,” he shouted as he collapsed to the ground. “Another one of yourILLEVENT!”

The capitalized “Bits”, “ofMagic”, and “yourILLEVENT” are highly anomalous and don’t belong in the generated text.

This is using the default settings: temperature is 1 and the repetition penalties are 0.

The prompt used in this case was:

[Follow bracketed instructions to generate the next part of the story, in the style of a well-written novel.]

Background: The kingdom of elves lies in ruin after the latest war between orcs and elves. Various warring factions are vying for new leadership of the elven kingdom. However, the people of the mage’s guild have their own plans to subvert the elves and return magic to all the lands.

You then say to shaman who wants peace, “Hello kind sir. How do you do?”
“Hello there my good fellow,” he says as his eyes widen slightly when they see you. “I’m Jonas,” he replies with a smile. He gestures towards a chair for you to sit down.

[Generate the next 3-4 sentences of the story (2nd-person perspective, present tense), in which you try but utterly fail to use “magic missile” on the shaman who wants peace.]
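
For anyone who wants to reproduce this outside the playground, the request looks roughly like this (a minimal sketch using the openai Python package; the model name, prompt file, and max_tokens are my own placeholders, while the sampling settings are the defaults mentioned above):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The full prompt quoted above (background, dialogue, and bracketed instruction)
prompt = open("prompt.txt").read()

# Default settings: temperature 1, no repetition penalties
response = client.completions.create(
    model="babbage-002",   # placeholder; use whichever Babbage variant you are testing
    prompt=prompt,
    temperature=1,
    frequency_penalty=0,
    presence_penalty=0,
    max_tokens=100,        # arbitrary; enough for 3-4 sentences
)
print(response.choices[0].text)
```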

How are you pulling the data back? Is it a single data block handled by the OpenAI API, or are you handling the sockets and managing the flow yourself? I have seen situations where things such as block-size markers get introduced into streaming replies (not sure if relevant), and partial block responses that get incorrectly reassembled.

Again, not 100% sure if any of that is relevant, but worth asking the question.
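
To illustrate the kind of failure I mean (purely hypothetical data, not an actual capture of the API traffic): with HTTP chunked transfer encoding, each chunk is prefixed by a hex size line, and if the raw stream is concatenated instead of parsed, those size markers end up embedded in the text.

```python
# Hypothetical chunked transfer-encoding frames: "<hex size>\r\n<payload>\r\n",
# terminated by a zero-size chunk.
raw = [
    b"7\r\nHello, \r\n",
    b"6\r\nworld!\r\n",
    b"0\r\n\r\n",
]

# Naive reassembly: the size markers leak into the output text.
print(repr(b"".join(raw).decode()))  # '7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'

# Correct reassembly: read the hex size line, then exactly that many payload bytes.
def dechunk(chunks):
    buf = b"".join(chunks)
    out = bytearray()
    while buf:
        size_line, _, rest = buf.partition(b"\r\n")
        size = int(size_line, 16)
        if size == 0:
            break
        out += rest[:size]
        buf = rest[size + 2:]  # skip the payload and its trailing CRLF
    return out.decode()

print(dechunk(raw))  # 'Hello, world!'
```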

Thanks; however, I don’t think that’s the issue here. I’m just using the playground, and this text was copy/pasted directly from there. The other models also don’t seem to have this issue.

Super interesting; the playground “should” handle everything correctly. I wonder if those values would be detectable out there, as in, if you DID handle the comms manually, whether they could be reliably detected… Nice find.

The more I look at the responses (the missing spaces, all of that), the more I think the Playground stream handler may not be dealing with chunk-size messages correctly. I had to alter my own code to handle that exact issue; a rough sketch of what I mean is below.
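
Roughly what I mean by handling it (a sketch only, not what the Playground actually does; the model name and stream field layout are assumptions based on the legacy completions streaming format): parse the response as server-sent-event lines instead of concatenating raw reads.

```python
import json
import requests  # third-party HTTP client

def stream_completion(api_key: str, prompt: str) -> str:
    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "babbage-002",  # assumption; substitute the Babbage variant you use
            "prompt": prompt,
            "temperature": 1,
            "stream": True,
        },
        stream=True,
    )
    parts = []
    # iter_lines() reassembles transfer-encoding chunks into whole lines,
    # so chunk-size markers and partial frames never reach this loop.
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        event = json.loads(payload)
        parts.append(event["choices"][0]["text"])
    return "".join(parts)
```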

Thanks for the info. Based on your comments, I have now tested the same thing in Postman (an API request tool) and found the same weird non-space tokens there. Since Postman works correctly for everything else, the issue is most likely model-specific.