Stream response from `/v1/chat/completions` endpoint is missing the first token

I am successfully streaming a response from this endpoint using the `gpt-3.5-turbo` model, so I believe the implementation is essentially correct, but I am consistently missing the very first token. Is anybody else seeing this?

Here is my stream reader implementation (JavaScript):

  const decoder = new TextDecoder()
  let result = ''

  const reader = response.body.getReader()
  reader.read().then(function processingText({ done, value }) {
    if (done) return

    const decoded = decoder.decode(value)
    const json = decoded.split('data: ')[1]  // this data needs some manipulation in order to be parsed, a separate concern
    const aiResponse = JSON.parse(json)
    const aiResponseText = aiResponse.choices[0].delta?.content
    result += aiResponseText || ""
    return reader.read().then(processingText)
  }).catch(console.error)

And here is what a sample response looks like:

data: {"id":"chatcmpl-7BWXDk8d3xI1ycw2fzoudrXCCKniY","object":"chat.completion.chunk","created":1682981027,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Mr"},"index":0,"finish_reason":null}]}

and after parsing:

[
    {
        "delta": {
            "content": " Mr"
        },
        "index": 0,
        "finish_reason": null
    }
]

Notice the leading space before “Mr”. That space is there because a “Dear” is supposed to come before that token; I am asking the model to compose a letter. I know the response should contain the “Dear” because every other test I’ve run without streaming has included this token first.

That is technically the second chunk I receive in the stream; the first looks like this:

[
    {
        "delta": {
            "role": "assistant"
        },
        "index": 0,
        "finish_reason": null
    }
]

Notice there is no content property on this object; it appears to be just a metadata chunk. No issue there, but there is also no initial token between this chunk and the next chunk above (" Mr").

So, where is this missing token? A bug perhaps? I have combed the API response for this initial token, but it is absent.

How did you conduct your testing? Was it via the Playground or ChatGPT?

Could you provide the specific prompt you used for testing? This way, we can better understand your approach. Additionally, consider including a request for a polite response within your prompt.

Thanks, Frederic. I’m very confident here. I ran dozens of tests asking for a letter with the same prompt, model, and temperature, using the same fetch request without streaming, and every single response included “Dear”. Then, as soon as I switched to streaming, the “Dear” was gone, and the aforementioned leading space appeared in the string of the first token I receive.

Does it happen with other prompts or just this one?

If it’s always the same “Dear”, you could tack it onto the completion on your end, but that’s a bit of a kludge.

I don’t use stream=true for anything at the moment, but I’d test other prompts. Check for stray spaces or carriage returns too …

This issue is resolved. It had to do with the decoded string value I was getting back: the first chunk received does have the metadata, but it also includes the first token. Without performing the decoded.split() call, the first chunk looks like this:

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat.completion.chunk","created":1683211105,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat.completion.chunk","created":1683211105,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Dear"},"index":0,"finish_reason":null}]}

That’s all one chunk, so instead of splitting on the “data: ” string, I pivoted to regex-matching the “content” value, like this:

let { groups: { newToken } } = decoded.match(/data:\s*{.*?"content":"(?<newToken>.*?)".*?}/s)

It’s still a little janky, but it does solve my issue.
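
For anyone copying that: match() only captures the first “content” in whatever the read returns, and it comes back null when a chunk has no content at all (the role-only delta, or the final [DONE] line), which makes the destructuring throw. A minimal sketch of a slightly more defensive variant, assuming result is the same accumulator as in my first snippet:

// Collect every "content" value in the decoded chunk, not just the first;
// chunks with no content simply produce zero matches instead of throwing
for (const match of decoded.matchAll(/"content":"((?:[^"\\]|\\.)*)"/g)) {
  // Re-parse the captured string body so JSON escapes like \n and \" are decoded
  result += JSON.parse('"' + match[1] + '"')
}

This still assumes each read() delivers complete lines, which, as pointed out below, is not guaranteed.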


Nice. Thanks for coming back to let us know. Hopefully this helps someone in the future.


This is partially off-topic, but I stumbled upon this thread by chance and noticed some potential problems with your code that I thought I’d point out, since there’s so little information about this and I was just confronted with writing my own parser as well. This is in the interest of preventing future problems for you and others who might find themselves in our situation.

What’s returned by a read call appears to be binary data fresh from the network. Usually it’s one or more objects as a string, each prefixed with a "data: " string and separated by a double line feed “\n\n”. However, it’s also possible to receive partial data, which I encountered with the weaker models in particular, like text-babbage-001. For example, you might find yourself reading a string like this,

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat

which you couldn’t do much with on its own. It would also be potentially problematic to call decode on such a partial byte buffer, because the data is UTF-8 and one byte does not always equal one character, so you should refrain from stringifying the data until you actually have a full line.
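
As an aside, TextDecoder can handle the split-character part of this on its own if you create it once and pass { stream: true }: any trailing bytes of an incomplete UTF-8 sequence are held back internally and emitted with a later call. A minimal sketch:

const decoder = new TextDecoder('utf-8');
// Safe even if `value` ends mid-character: the incomplete bytes are
// buffered inside the decoder and flushed on a later decode() call
const text = decoder.decode(value, { stream: true });

That only addresses the character-splitting problem, though, not the line framing.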

That means a robust parser has to read the incoming bytes and buffer that raw data until it encounters the delimiter, which follows every dataset, including the “[DONE]” string at the end. Once you find a delimiter, you can take what you’ve buffered so far, trim the "data: " prefix, and handle the actual object. I would also recommend not using regex to parse the data, but actually converting it with JSON.parse. That way you will get errors when something unexpected happens, like invalid JSON being returned, and you also won’t have to worry about the data containing quotation marks, which could cause bugs with the code you showed here.

The kind of implementation I’m describing should cover all cases and provide reliable parsing. At least I have not encountered any issues with it yet across the different APIs and various models. Here’s what such an implementation might roughly look like.

const reader = response.body.getReader();
const decoder = new TextDecoder();

let buffer = new Uint8Array(512);
let bufferIdx = 0;

readLoop:
while (true)
{
	const { done, value } = await reader.read();
	if (done)
		break;

	for (let i = 0; i < value.byteLength; ++i)
	{
		// Grow the buffer if a single line outgrows its current size
		if (bufferIdx === buffer.length)
		{
			const grown = new Uint8Array(buffer.length * 2);
			grown.set(buffer);
			buffer = grown;
		}

		// Write to the buffer until we reach a double new-line
		// delimiter
		buffer[bufferIdx++] = value[i];

		if (bufferIdx >= 2 && value[i] === 10 && buffer[bufferIdx - 2] === 10)
		{
			// Handle one data object
			const lineBuffer = buffer.subarray(0, bufferIdx - 2);
			const line = decoder.decode(lineBuffer);

			// Each line starts with a "data: " prefix, followed by
			// the actual data, which is usually a JSON object
			if (line.indexOf('data: ') !== 0)
				throw new Error('Expected "data:" prefix in: ' + line);

			// Trim the "data: " prefix
			const dataStr = line.substring(6);

			// Stop if we reached the end of the stream; the labelled
			// break exits the outer read loop, not just this for loop
			if (dataStr === '[DONE]')
				break readLoop;

			// Parse and handle data
			const dataObj = JSON.parse(dataStr);

			// Handle data, e.g. append dataObj.choices[0].delta?.content ?? ''

			// Reset buffer and continue reading
			bufferIdx = 0;
		}
	}
}