Stream response from `/v1/chat/completions` endpoint is missing the first token

I am successfully streaming a response from this endpoint using the `gpt-3.5-turbo` model, so I believe the implementation is essentially correct, but I am consistently missing the very first token. Is anybody else seeing this?

Here is my stream reader implementation (JavaScript):

  const decoder = new TextDecoder()
  let result = ''

  const reader = response.body.getReader()
  reader.read().then(function processingText({ done, value }) {
    if (done) return

    const decoded = decoder.decode(value)
    const json = decoded.split('data: ')[1]  // this data needs some manipulation in order to be parsed, a separate concern
    const aiResponse = JSON.parse(json)
    const aiResponseText = aiResponse.choices[0].delta?.content
    result += aiResponseText || ""
    return reader.read().then(processingText)
  }).catch(console.error)

And here is what a sample response looks like:

data: {"id":"chatcmpl-7BWXDk8d3xI1ycw2fzoudrXCCKniY","object":"chat.completion.chunk","created":1682981027,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":" Mr"},"index":0,"finish_reason":null}]}

and after parsing:

[
    {
        "delta": {
            "content": " Mr"
        },
        "index": 0,
        "finish_reason": null
    }
]

Notice the leading space before “Mr”. That space is there because a “Dear” is supposed to come before that token; I am asking the model to compose a letter. I know the response should contain the “Dear” because every other test I’ve run without streaming has included this token first.

That is technically the second chunk I receive in the stream; the first looks like this:

[
    {
        "delta": {
            "role": "assistant"
        },
        "index": 0,
        "finish_reason": null
    }
]

Notice there is no content property on this object; it appears to be just a metadata chunk. No issue there, but there is also no initial token between this chunk and the next chunk above (" Mr").

So, where is this missing token? A bug perhaps? I have combed the API response for this initial token, but it is absent.

How did you conduct your testing? Was it via the Playground or ChatGPT?

Could you provide the specific prompt you used for testing? This way, we can better understand your approach. Additionally, consider including a request for a polite response within your prompt.

Thanks, Frederic. I’m very confident here. I ran dozens of tests asking for a letter with the same prompt, model, and temperature, using the same fetch request without streaming, and every single response included “Dear”. Then, as soon as I switched to streaming, the “Dear” was gone, and the aforementioned leading space appeared in the string of the first token I receive.

Does it happen with other prompts or just this one?

If it’s always the same “Dear”, you could tack it onto the completion on your end, but that’s a bit of a kludge.

I don’t use stream=true for anything at the moment, but I’d test other prompts. Check for stray spaces or carriage returns too …

This issue is resolved. It had to do with the decoded string value I was getting back: the first chunk received does have the metadata, but it also includes the first token. Without performing the decoded.split() call, the first chunk looks like this:

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat.completion.chunk","created":1683211105,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat.completion.chunk","created":1683211105,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"Dear"},"index":0,"finish_reason":null}]}

That’s all one chunk, so instead of splitting on the “data: ” string, I pivoted to regex-matching the “content” value, like this:

let { groups: { newToken } } = decoded.match(/data:\s*{.*?"content":"(?<newToken>.*?)".*?}/s)

It’s still a little janky, but it does solve my issue.
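
For anyone copying that: match() only captures the first “content” in whatever the read returns, and it comes back null when a chunk has no content at all (the role-only delta, or the final [DONE] line), which makes the destructuring throw. A minimal sketch of a slightly more defensive variant, assuming result is the same accumulator as in my first snippet:

// Collect every "content" value in the decoded chunk, not just the first;
// chunks with no content simply produce zero matches instead of throwing
for (const match of decoded.matchAll(/"content":"((?:[^"\\]|\\.)*)"/g)) {
  // Re-parse the captured string body so JSON escapes like \n and \" are decoded
  result += JSON.parse('"' + match[1] + '"')
}

This still assumes each read() delivers complete lines, which, as pointed out below, is not guaranteed.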


Nice. Thanks for coming back to let us know. Hopefully this helps someone in the future.


This is partially off-topic, but I stumbled upon this thread by chance and noticed some potential problems with your code that I thought I’d point out, since there’s so little information about this and I was just confronted with writing my own parser as well. This is in the interest of preventing future problems for you and others who might find themselves in our situation.

What’s returned by a read call appears to be binary data fresh from the network. Usually it’s one or more objects as a string, each prefixed with a "data: " string and separated by a double line feed “\n\n”. However, it’s also possible to receive partial data, which I encountered with the weaker models in particular, like text-babbage-001. For example, you might find yourself reading a string like this,

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat

which you couldn’t do much with on its own. It would also be potentially problematic to call decode on such a partial byte buffer, because the data is UTF-8 and one byte does not always equal one character, so you should refrain from stringifying the data until you actually have a full line.
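
As an aside, TextDecoder can handle the split-character part of this on its own if you create it once and pass { stream: true }: any trailing bytes of an incomplete UTF-8 sequence are held back internally and emitted with a later call. A minimal sketch:

const decoder = new TextDecoder('utf-8');
// Safe even if `value` ends mid-character: the incomplete bytes are
// buffered inside the decoder and flushed on a later decode() call
const text = decoder.decode(value, { stream: true });

That only addresses the character-splitting problem, though, not the line framing.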

That means a robust parser has to read the incoming bytes and buffer that raw data until it encounters the delimiter, which follows every dataset, including the “[DONE]” string at the end. Once you find a delimiter, you can take what you’ve buffered so far, trim the "data: " prefix, and handle the actual object. I would also recommend not using regex to parse the data, but actually converting it with JSON.parse. That way you will get errors when something unexpected happens, like invalid JSON being returned, and you also won’t have to worry about the data containing quotation marks, which could cause bugs with the code you showed here.

The kind of implementation I’m describing should cover all cases and provide reliable parsing. At least I have not encountered any issues with it yet across the different APIs and various models. Here’s what such an implementation might roughly look like.

const reader = response.body.getReader();
const decoder = new TextDecoder();

let buffer = new Uint8Array(512);
let bufferIdx = 0;

readLoop:
while (true)
{
	const { done, value } = await reader.read();
	if (done)
		break;

	for (let i = 0; i < value.byteLength; ++i)
	{
		// Grow the buffer if a single line outgrows its current size
		if (bufferIdx === buffer.length)
		{
			const grown = new Uint8Array(buffer.length * 2);
			grown.set(buffer);
			buffer = grown;
		}

		// Write to the buffer until we reach a double new-line
		// delimiter
		buffer[bufferIdx++] = value[i];

		if (bufferIdx >= 2 && value[i] === 10 && buffer[bufferIdx - 2] === 10)
		{
			// Handle one data object
			const lineBuffer = buffer.subarray(0, bufferIdx - 2);
			const line = decoder.decode(lineBuffer);

			// Each line starts with a "data: " prefix, followed by
			// the actual data, which is usually a JSON object
			if (line.indexOf('data: ') !== 0)
				throw new Error('Expected "data:" prefix in: ' + line);

			// Trim the "data: " prefix
			const dataStr = line.substring(6);

			// Stop if we reached the end of the stream; the labelled
			// break exits the outer read loop, not just this for loop
			if (dataStr === '[DONE]')
				break readLoop;

			// Parse and handle data
			const dataObj = JSON.parse(dataStr);

			// Handle data, e.g. append dataObj.choices[0].delta?.content ?? ''

			// Reset buffer and continue reading
			bufferIdx = 0;
		}
	}
}