Stream response from `/v1/chat/completions` endpoint is missing the first token

This is partially off-topic, but I stumbled upon this thread by chance and noticed some potential problems with your code that I thought I'd point out, since there's so little information about this and I was recently confronted with writing my own parser as well. This is in the interest of preventing future problems for you and others who might find themselves in our situation.

What a read call returns appears to be binary data fresh from the network. Usually it's one or more objects as a string, each prefixed with "data: " and separated by a double line feed ("\n\n"). However, it's also possible to receive partial data, which I encountered with the weaker models in particular, like text-babbage-001. For example, you might find yourself reading a string like this:

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat

which you couldn't do much with on its own. It would also be problematic to call decode on such a partial byte buffer: the data is UTF-8, where one byte doesn't necessarily equal one character, and a multi-byte character can be split across two reads. So you should refrain from stringifying the data until you actually have a full line.
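
To illustrate the pitfall, here's a minimal sketch using "é", which encodes to two bytes in UTF-8 and stands in for any multi-byte character that might be split across two reads:

const bytes = new TextEncoder().encode('é'); // Uint8Array [195, 169]

// Decoding only the first byte yields a replacement character,
// and the original character is unrecoverable from that string
console.log(new TextDecoder().decode(bytes.subarray(0, 1))); // '�'
console.log(new TextDecoder().decode(bytes)); // 'é'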

That means a robust parser will have to read the incoming bytes and buffer that raw data until it encounters the delimiter, which follows every dataset, including the "[DONE]" string at the end. Once you find a delimiter, you can take what you've buffered so far, trim the "data: " prefix, and handle the actual object. I would also recommend not extracting the data with a regex, but parsing it with JSON.parse. That way you will get errors when something unexpected happens, like invalid JSON being returned, and you also won't have to worry about the content itself containing quotation marks, which could cause bugs with the code you showed here; see the small sketch below.
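
To make the quotation issue concrete, here is a small sketch with a made-up chunk payload; a lazy regex breaks as soon as the content itself contains an escaped quote, while JSON.parse handles it correctly:

// The delta content contains an escaped quote
const payload = '{"choices":[{"delta":{"content":"say \\"hi\\""}}]}';

// The regex stops at the first quote it sees, even an escaped one
console.log(payload.match(/"content":"(.*?)"/)[1]); // 'say \' (truncated)
console.log(JSON.parse(payload).choices[0].delta.content); // 'say "hi"'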

The kind of implementation I’m describing should cover all cases and provide reliable parsing. At least I have not encountered any issues with it yet across the different APIs and various models. Here’s what such an implementation might roughly look like.

// `reader` is a ReadableStreamDefaultReader obtained from a
// streaming fetch via response.body.getReader() (see the sketch
// after this block), and `decoder` converts complete lines to text
const decoder = new TextDecoder();
let buffer = new Uint8Array(512);
let bufferIdx = 0;

// Label the outer loop so we can leave it from inside the for loop
read: while (true)
{
	const { done, value } = await reader.read();
	if (done)
		break;

	for (let i = 0; i < value.byteLength; ++i)
	{
		// Grow the buffer if a single payload turns out to be
		// larger than the initial allocation; out-of-bounds
		// writes to a Uint8Array are silently discarded
		if (bufferIdx === buffer.length)
		{
			const grown = new Uint8Array(buffer.length * 2);
			grown.set(buffer);
			buffer = grown;
		}

		// Write to the buffer until we reach a double new-line
		// delimiter
		buffer[bufferIdx++] = value[i];

		if (bufferIdx >= 2 && value[i] === 10 && buffer[bufferIdx - 2] === 10)
		{
			// Handle one data object
			const lineBuffer = buffer.subarray(0, bufferIdx - 2);
			const line = decoder.decode(lineBuffer);

			// Each line starts with a "data: " prefix, followed by
			// the actual data, which is usually a JSON object
			if (line.indexOf('data: ') !== 0)
				throw new Error('Expected "data:" prefix in: ' + line);

			// Trim the "data: " prefix
			const dataStr = line.substring(6);

			// Stop if we reached the end of the stream; this must
			// break the labeled outer loop, since a plain break
			// would only leave the for loop and keep reading
			if (dataStr === '[DONE]')
				break read;

			// Parse and handle data
			const dataObj = JSON.parse(dataStr);

			// Handle data...

			// Reset the buffer index and continue reading
			bufferIdx = 0;
		}
	}
}
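
For completeness, here is a rough sketch of how reader might be obtained in the first place; treat the model name, message, and apiKey as placeholders:

const response = await fetch('https://api.openai.com/v1/chat/completions', {
	method: 'POST',
	headers: {
		'Content-Type': 'application/json',
		'Authorization': 'Bearer ' + apiKey, // placeholder: your API key
	},
	body: JSON.stringify({
		model: 'gpt-3.5-turbo', // placeholder model
		messages: [{ role: 'user', content: 'Hello!' }],
		stream: true,
	}),
});

const reader = response.body.getReader();

As for the "Handle data..." part: the chat stream delivers the incremental text in dataObj.choices[0].delta.content, and content is absent on some chunks (the first one usually only carries the role), so check that it exists before appending it to your output.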