4o and 4 API output has typo/missing words

I think my engineering team changed code in two places.
One of them is posted above. I will talk to them in a few hours (they’re based in India) and post a summary here of what to change. We use Node, so if you are using Python, you can use Claude to adapt it.

1 Like

If you are passing your stream to a user in the browser, you can take advantage of the Streams API. I haven’t noticed any issues and haven’t made any changes to my Deno → ReactJS streaming for months.

This will give you a robust, performant framework that does what you’re looking for and avoids these issues.

Our users hear it instead of reading it, so the stream needs to be converted into voice.
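
For context, one possible approach is to buffer the streamed deltas into sentences and send each sentence to a text-to-speech endpoint such as OpenAI’s /v1/audio/speech. This is a rough sketch only, not our actual code; the speakSentence and onDelta helpers, the tts-1 model and the alloy voice are placeholder assumptions:

// Rough sketch: buffer streamed deltas into sentences and synthesize each one.
// config.apiKey, the tts-1 model and the alloy voice are placeholder choices.
let sentenceBuffer = "";

async function speakSentence(sentence, config) {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: sentence }),
  });
  return new Uint8Array(await res.arrayBuffer()); // audio bytes (mp3 by default)
}

function onDelta(delta, config, playAudio) {
  sentenceBuffer += delta;
  // Flush on sentence-ending punctuation so each TTS call gets a natural unit.
  const match = sentenceBuffer.match(/[.!?](\s|$)/);
  if (match) {
    const end = match.index + 1;
    const sentence = sentenceBuffer.slice(0, end).trim();
    sentenceBuffer = sentenceBuffer.slice(end);
    speakSentence(sentence, config).then(playAudio);
  }
}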

The Streams API suggestion was directed at @toby2

It does work with any sort of format though, not limited to text. I’d recommend it over using for await (const part of stream) whenever you want to add any complexity to your streams.
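
For anyone unfamiliar with it, the pattern looks roughly like this. This is a minimal sketch, not tied to any particular backend; the uppercasing transform and the relay function are just placeholders for whatever processing you actually need:

// Minimal Streams API sketch: pipe a byte stream through a TransformStream
// instead of consuming it with `for await (const part of stream)`.
const decoder = new TextDecoder();
const encoder = new TextEncoder();

const upperCaser = new TransformStream({
  transform(chunk, controller) {
    const text = decoder.decode(chunk, { stream: true });
    controller.enqueue(encoder.encode(text.toUpperCase()));
  },
});

// `source` can be any ReadableStream of bytes, e.g. response.body from fetch().
function relay(source) {
  return new Response(source.pipeThrough(upperCaser), {
    headers: { "Content-Type": "text/event-stream" },
  });
}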

@chinmay1
There are a lot of external parts in your code that make it hard to follow. I would recommend

This exactly. Strip out the complexities.

1 Like

Awesome, thank you guys!
I appreciate your support and guidance!

If anyone’s doing it via a JavaScript runtime (Node.js, Deno, Bun, browser, etc.), here’s our custom implementation using the JavaScript Fetch API for communicating with the streaming API:
The inner function receives a ‘stream’ callback that offloads the burden of streaming the response onto the caller function. Alternatively, you could just ‘await’ the full response instead.

import formatChat, { tokenCounter } from '../helpers.js';

export default ({ config, env, ...rest }) => {
  return async (chat, stream = () => { }) => {
    const messages = formatChat({ ...chat, config });

    // call the openai api
    const response = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({
        messages,
        model: config.model || "gpt-3.5-turbo",
        stream: true,
        temperature: config.temperature || 0,
        response_format: config.responseType === "json"
          ? { type: "json_object" }
          : undefined,
      }),
    });

    if (!response.ok) {
      const errorText = await response.text();
      console.error("API response error:", errorText);
      throw new Error(errorText);
    }

    // create a reader to read the stream
    const reader = response.body?.getReader();
    const decoder = new TextDecoder("utf-8");

    let text = "";
    let resolver;
    let bufferedData = "";

    const resolved = new Promise((resolve, reject) => {
      resolver = { resolve, reject };
    });

    async function processText({ done, value }) {

      if (done) {

        // Process any remaining buffered data
        if (bufferedData) {
          try {
            const lines = bufferedData.split("\n");
            lines.forEach((line) => {
              if (line.startsWith("data:")) {
                const data = JSON.parse(line.slice(5).trim());

                const delta = data?.choices?.[0]?.delta?.content;
                if (delta) {
                  stream(delta);
                  text += delta;
                }
              }
            });
          } catch (e) {
            console.error("Failed to parse remaining buffered data:", e, "Buffered data:", bufferedData);
          }
        }

        resolver.resolve({ prompt: messages, answer: text, tokens: tokenCounter(messages, text) });
        return;
      }

      const chunk = decoder.decode(value, { stream: true });

      bufferedData += chunk;
      const lines = bufferedData.split("\n");

      // Keep the last incomplete line in the buffer
      bufferedData = lines.pop();

      lines.forEach((line) => {
        if (line.startsWith("data:")) {
          let data;
          try {
            data = JSON.parse(line.slice(5).trim());
          } catch (e) {
            !line?.slice(5)?.trim()?.startsWith('[DONE]') &&
              console.error("Failed to parse JSON:", e, "Line:", line);
            return;
          }

          const delta = data?.choices?.[0]?.delta?.content;
          if (delta) {
            stream(delta);
            text += delta;
          } 
        }
      });

      return reader.read().then(processText).catch((error) => {
        console.error("Error reading stream:", error);
        resolver.reject(error);
      });
    }

    reader.read().then(processText).catch((error) => {
      console.error("Error starting stream read:", error);
      resolver.reject(error);
    });

    return await resolved;
  };
};

In a JS environment, if you call the API directly (instead of using the openai lib) with the stream option enabled, that exact same logic leads directly to the error mentioned in this thread.

That is because sometimes the chunk in

delta = chunk.choices[0].delta if chunk.choices and chunk.choices[0].delta is not None else None

could be a JSON object cut in the middle and therefore not parseable. The remainder of the object arrives in the next chunk.

So if you try to parse the object from either of these partial chunks, you’ll get an error and move on to the next chunk, which effectively loses you a token.

A possible solution (for which I suggested an implementation below) would be to accumulate chunks whenever there’s an error trying to parse them. This has fixed the problem in every test we’ve performed so far.
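
Stripped down, the idea is just to keep the last (possibly incomplete) line of each chunk in a buffer and prepend it to the next chunk before parsing. A sketch only, with a hypothetical onDelta callback:

// Sketch of the accumulation fix: only the last line of a chunk can be cut
// mid-object, so hold it back and prepend it to the next chunk before parsing.
let pending = "";

function handleChunk(decodedChunk, onDelta) {
  const lines = (pending + decodedChunk).split("\n");
  pending = lines.pop(); // possibly an incomplete `data: {...` fragment

  for (const line of lines) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice(5).trim();
    if (payload.startsWith("[DONE]")) continue;
    const delta = JSON.parse(payload)?.choices?.[0]?.delta?.content;
    if (delta) onDelta(delta);
  }
}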

2 Likes

That’s what we did to fix the problem. I am still disappointed that OpenAI did not come clean on what caused the spike in this kind of JSON splitting.

1 Like

If I understand correctly you were all expecting full JSON objects in each stream chunk?

Short answer: yes.

Longer answer:
in my experience, when calling the API directly with the stream option activated, each stream chunk, once decoded, consists of several “objects” separated by newlines (\n) in the response.
Now, each object has the following format:

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fa31","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

So a single chunk containing the text “Hello World” would be received as follows:

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fas1","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fas1","choices":[{"index":0,"delta":{"content":" World"},"logprobs":null,"finish_reason":null}]}

That is because you break the stream chunk into lines, parse the object in each line, and read each token from the delta field:
chunk.choices[0].delta.content
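
In code, that naive per-chunk parsing looks roughly like this (a sketch, which only works while every data: line arrives intact):

// Naive parsing: split each decoded chunk into lines and parse every "data:" line.
// This works only while each JSON object arrives whole; if a chunk ends
// mid-object (as in the examples below), JSON.parse throws and that token is lost.
function extractDeltas(decodedChunk) {
  return decodedChunk
    .split("\n")
    .filter((line) => line.startsWith("data:") && !line.includes("[DONE]"))
    .map((line) => JSON.parse(line.slice(5).trim()))
    .map((obj) => obj.choices?.[0]?.delta?.content)
    .filter(Boolean);
}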

Now, what started happening two days ago is that we began receiving chunks like this:

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fas1","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fas1","choices":[{"index":0,"delta":{"content":" W

And then the following chunk would look like this:

data: orld"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-9m4NaH69zPfngWOXl5vKYa40OK6OR","object":"chat.completion.chunk","created":1721243606,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_c4e5b6fas1","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}

Together, those chunks produce the "Hello World!" output.

So when I say we were expecting full JSON objects in the stream response, I don’t mean the final JSON object you’d get with JSON mode activated, but rather stream chunks in which each object can be parsed to extract the next token.

Hmmm… This has not been happening in my experience, or at least it is somehow being managed by the Streams API. If this were the case, a bunch of my web apps would not be working at all.

Here is my code on Deno:

    const transformer = new TransformStream({
      transform(chunk, controller) {
        // Modify the chunk
        const decoded = new TextDecoder().decode(chunk);
        const parsed = JSON.parse(decoded);
        const text = parsed.choices?.[0]?.delta?.content;
        if (text) {
          const encoded = new TextEncoder().encode(text);
          controller.enqueue(encoded);
        }
        // Close the transformed stream once the model reports a finish_reason
        if (!text && parsed.choices?.[0]?.finish_reason) {
          controller.terminate();
        }
      },
    });
    const completion = await createChatCompletion(
      openai,
      markdown,
      questionData.text,
      prompts.prompt,
    );

    if (!completion) {
      console.error("Failed to create completion");
      return statusResponse("Our AI failed to respond!", 503);
    }

    // Pipe the original stream through the transformer
    const modifiedStream = completion.toReadableStream().pipeThrough(
      transformer,
    );

    return new Response(modifiedStream, {
      headers: { ...corsHeaders, "Content-Type": "text/event-stream" },
      status: 200,
    });

And then on the client side:

const readStream = async () => {
        if (!reader || !id) return;

        let localBuffer = "";
        try {
            while (true) {
                const { done, value } = await reader.read();
                if (done) break;

                const text = new TextDecoder().decode(value, { stream: true });
                localBuffer += text;
                setState(prev => ({
                    ...prev,
                    buffer: localBuffer
                }));
            }
            updateGlobalState(id, localBuffer); // Update global state once the stream is complete to prevent rapid rendering
        } catch (error) {
            //console.error('Stream reading error:', error);
        } finally {
            if (reader.releaseLock) {
                reader.releaseLock();
            }
        }
    }

If this were happening to me, this code would not be working.

Although catching the error and then having a buffer works, I’d consider it a band-aid solution to a greater problem.

1 Like

So this is weird…
We’re also using Deno in the backend. But instead of using pipeThrough to stream a Response object to the front-end, there’s some processing that happens in the backend before sending the data to the front-end.
Our backend code for handling the OpenAI response is very similar to what you shared of your front-end React code in the readStream function, so if you put a console.log on the ‘text’ variable right after you declare it, like

const text = new TextDecoder().decode(value, { stream: true });
console.log(text)

it’d lead to the issue I described earlier.

Agreed completely. But I also cannot find any other reason for this issue, as I’m literally logging the chunk straight from the API response and observing this behavior, so we’re left with this workaround. I’ve also tried completely isolating this code from the application to check whether the application itself could be causing the problem, but apparently not. I still had the same issue.

Wondering if it is a versioning issue with Deno now… I’m at 1.44.4.

1 Like

Maybe :thinking:

I do process it slightly in the back-end by parsing the object and then only passing the text content to the front-end.

Huh. That’s really strange.

2 Likes

Yeah, I’ll investigate further and post back here if I find anything new.

But a big thank you for sharing your code samples and helping figure this out!

1 Like

I would recommend sending this thread to your developer.