Stream response from `/v1/chat/completions` endpoint is missing the first token

This is partially off-topic, but I stumbled upon this thread by chance and noticed some potential problems with your code that I thought I'd point out, since there's so little information about this and I was recently confronted with writing my own parser as well. This is in the interest of preventing future problems for you and others who might find themselves in our situation.

What a read call returns appears to be binary data fresh from the network. Usually it's one or more objects as a string, each prefixed with "data: " and separated by a double line feed ("\n\n"). However, it's also possible to receive partial data, which I encountered with the weaker models in particular, like text-babbage-001. For example, you might find yourself reading a string like this:

data: {"id":"chatcmpl-7CUO9LG02qY3GbqIwH6buRz69jUrZ","object":"chat

which you couldn't do much with on its own. It would also be problematic to call decode on such a partial byte buffer: the data is UTF-8, where one byte doesn't necessarily equal one character, and a multi-byte character can be split across two reads. So you should refrain from stringifying the data until you actually have a full line.
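
To illustrate the pitfall, here's a minimal sketch using "é", which encodes to two bytes in UTF-8 and stands in for any multi-byte character that might be split across two reads:

const bytes = new TextEncoder().encode('é'); // Uint8Array [195, 169]

// Decoding only the first byte yields a replacement character,
// and the original character is unrecoverable from that string
console.log(new TextDecoder().decode(bytes.subarray(0, 1))); // '�'
console.log(new TextDecoder().decode(bytes)); // 'é'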

That means a robust parser will have to read the incoming bytes and buffer that raw data until it encounters the delimiter, which follows every dataset, including the "[DONE]" string at the end. Once you find a delimiter, you can take what you've buffered so far, trim the "data: " prefix, and handle the actual object. I would also recommend not extracting the data with a regex, but parsing it with JSON.parse. That way you will get errors when something unexpected happens, like invalid JSON being returned, and you also won't have to worry about the content itself containing quotation marks, which could cause bugs with the code you showed here; see the small sketch below.
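
To make the quotation issue concrete, here is a small sketch with a made-up chunk payload; a lazy regex breaks as soon as the content itself contains an escaped quote, while JSON.parse handles it correctly:

// The delta content contains an escaped quote
const payload = '{"choices":[{"delta":{"content":"say \\"hi\\""}}]}';

// The regex stops at the first quote it sees, even an escaped one
console.log(payload.match(/"content":"(.*?)"/)[1]); // 'say \' (truncated)
console.log(JSON.parse(payload).choices[0].delta.content); // 'say "hi"'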

The kind of implementation I’m describing should cover all cases and provide reliable parsing. At least I have not encountered any issues with it yet across the different APIs and various models. Here’s what such an implementation might roughly look like.

// `reader` is a ReadableStreamDefaultReader obtained from a
// streaming fetch via response.body.getReader() (see the sketch
// after this block), and `decoder` converts complete lines to text
const decoder = new TextDecoder();
let buffer = new Uint8Array(512);
let bufferIdx = 0;

// Label the outer loop so we can leave it from inside the for loop
read: while (true)
{
	const { done, value } = await reader.read();
	if (done)
		break;

	for (let i = 0; i < value.byteLength; ++i)
	{
		// Grow the buffer if a single payload turns out to be
		// larger than the initial allocation; out-of-bounds
		// writes to a Uint8Array are silently discarded
		if (bufferIdx === buffer.length)
		{
			const grown = new Uint8Array(buffer.length * 2);
			grown.set(buffer);
			buffer = grown;
		}

		// Write to the buffer until we reach a double new-line
		// delimiter
		buffer[bufferIdx++] = value[i];

		if (bufferIdx >= 2 && value[i] === 10 && buffer[bufferIdx - 2] === 10)
		{
			// Handle one data object
			const lineBuffer = buffer.subarray(0, bufferIdx - 2);
			const line = decoder.decode(lineBuffer);

			// Each line starts with a "data: " prefix, followed by
			// the actual data, which is usually a JSON object
			if (line.indexOf('data: ') !== 0)
				throw new Error('Expected "data:" prefix in: ' + line);

			// Trim the "data: " prefix
			const dataStr = line.substring(6);

			// Stop if we reached the end of the stream; this must
			// break the labeled outer loop, since a plain break
			// would only leave the for loop and keep reading
			if (dataStr === '[DONE]')
				break read;

			// Parse and handle data
			const dataObj = JSON.parse(dataStr);

			// Handle data...

			// Reset the buffer index and continue reading
			bufferIdx = 0;
		}
	}
}
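
For completeness, here is a rough sketch of how reader might be obtained in the first place; treat the model name, message, and apiKey as placeholders:

const response = await fetch('https://api.openai.com/v1/chat/completions', {
	method: 'POST',
	headers: {
		'Content-Type': 'application/json',
		'Authorization': 'Bearer ' + apiKey, // placeholder: your API key
	},
	body: JSON.stringify({
		model: 'gpt-3.5-turbo', // placeholder model
		messages: [{ role: 'user', content: 'Hello!' }],
		stream: true,
	}),
});

const reader = response.body.getReader();

As for the "Handle data..." part: the chat stream delivers the incremental text in dataObj.choices[0].delta.content, and content is absent on some chunks (the first one usually only carries the role), so check that it exists before appending it to your output.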