4o and 4 API output has typos/missing words

Just came to the forum to find this post. Also experiencing the same issue. Our partial parsing solution worked until today. Haven’t deployed anything to the system in question since the last time it worked, so the change is definitely on OpenAI’s end.

We are seeing this issue left and right too. The JSON is not coming through properly. We are still trying to fix our code. Will keep you posted.

BTW, I created a ticket on this last night. Got a canned response from the Helpdesk. They have still not acknowledged the problem. Not sure how to bring it to the OpenAI admins’ notice.

I also created a ticket. That’s probably the best we can do to get OpenAI’s attention. If you’re running into this issue, click Help in the dashboard and report an API bug with as much detail as you can.

1 Like

That’s what I did last night and again this morning, as I got no response from them. In the past they were very responsive. Not so much anymore. Also, I am getting a lot of “Error” emails from OpenAI today, but they don’t explain what those errors are. So maybe they know about the issue and are fixing it.

1 Like

Ah, this turned out to be the issue for us. We built our project before the OpenAI Node client lib handled streaming calls, and there turned out to be an error in our partial JSON parsing logic that wasn’t revealed until OpenAI made a change on their end in the last couple of days.

In our case, I wasn’t emptying the buffer object after a successful parse. Once I made that change, I stopped seeing dropped characters. I remember having found some Python code deep in some old docs that I had (obviously incorrectly) rewritten into JS to handle this originally.

So the solution would probably be: inspect your streaming call code, use an official lib, or check out the Vercel AI SDK if you’re using JS/TS. Those options will handle the streaming responses for you.
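To make the buffer fix concrete, here’s roughly the shape of it (a simplified sketch with made-up names, not our actual code): accumulate raw chunks in a buffer, and drop the consumed bytes after each complete event. Forgetting that reset is exactly the kind of bug that stays hidden until the chunk boundaries move.

```javascript
// Hypothetical SSE-style accumulator. The key line is the one that
// trims the buffer after consuming an event -- skip it and you re-read
// or drop characters whenever chunk boundaries shift.
function makeEventParser(onEvent) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf("\n\n")) !== -1) {
      const payload = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 2); // crucial: drop what we've consumed
      if (payload.startsWith("data: ")) {
        const data = payload.slice(6);
        if (data !== "[DONE]") onEvent(JSON.parse(data));
      }
    }
  };
}

// Usage: the event survives even when one JSON payload is split across chunks.
const parts = [];
const feed = makeEventParser((e) => parts.push(e.delta));
feed('data: {"delta":"Hel');
feed('lo"}\n\ndata: {"delta":" world"}\n\n');
console.log(parts.join("")); // "Hello world"
```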

1 Like

Thanks to everyone coming back to share workarounds.

Trying to find out if anything changed, but we may never know.

Appreciate the attempts to help each other, though. That’s what makes this place so special.

Actually, I spoke too soon! Still seeing this issue, even after fixing the partial JSON parsing. Going to try switching to one of the official client libs to see if that sorts us out.

2 Likes

Here is what my engineering team is trying:

const OpenAI = require("openai");
const openai = new OpenAI({
  apiKey: <OPENAI_KEY>,
});
 
async function init() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.8,
    messages: [
      {
        role: "system",
        content: `User will provide a topic and you need to create a sales Pitch`,
      },
      {
        role: "user",
        content: `Provide a sales pitch for google search engine`,
      },
    ],
    stream: true,
  });
  let dataAns = "";
  for await (const part of stream) {
    if (part.choices[0]?.delta?.content) {
      dataAns = `${dataAns}${part.choices[0]?.delta?.content}`;
      const regex = /(.*[\.\?])$/gm;
      let found = dataAns.trim().match(regex);
      if (found) {
        dataAns = "";
        console.log("found", found);
        const textMessage = {
          text: found[0],
        };
      }
    }
    if (part?.choices[0].finish_reason === "stop") {
      console.info("finished the stream now ending the connection");
      console.log("dataAns", dataAns);
    }
  }
}
init();

Seeing how there aren’t thousands of similar issues, I’d wager that there are some edge cases here.

  1. Have you been receiving single tokens before, or batches of variable sizes?

  2. Are you using unofficial or outdated libraries?

1 Like

This code does not look healthy.

It seems like you’re trying to capture full sentences?

I’m on my phone but I believe I see an issue.

You are using regex, for some reason, to capture a sentence ending in a full stop or question mark.

Then you are naively taking all the content up to and including the match, and discarding any additional tokens that came after it, causing words to be lost.
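Roughly, the pattern is this (a toy buffer, not your exact code): after pulling out the complete sentences, the unmatched tail has to be kept, not thrown away.

```javascript
// Toy version of the bug: the delta buffer holds one complete sentence
// plus the start of the next one, which arrived in the same chunk.
let buf = "First sentence. Second sen"; // "tence." hasn't arrived yet

// Greedy match up to the last full stop / question mark.
const m = buf.match(/(.*[.?])/s);
if (m) {
  const complete = m[1];            // "First sentence."
  // Resetting buf to "" here would silently drop " Second sen".
  buf = buf.slice(complete.length); // keep the unmatched tail instead
}
console.log(JSON.stringify(buf)); // " Second sen"
```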

This was WIP code I posted, and it had that error. But the original error is not this. The original error is that we are not getting a proper JSON response. Instead of getting one full JSON response, it’s being split into two.

Here is the revised code:

const response: any = await openai.chat.completions.create({
  model: "gpt-4o",
  temperature,
  messages: messageList,
  stream: true,
});

let finalString = '';
let lastProcessedIndex = 0;
let fullStop = false;
let endCall = false;

for await (const chunk of response) {
  if (chunk.choices[0]?.delta?.content) {
    finalString += chunk.choices[0].delta.content;
    
    const regex = /.*?[.!?](?:\s|$)/g;
    let match;
    
    while ((match = regex.exec(finalString.slice(lastProcessedIndex))) !== null) {
      const sentence = match[0];
      const sentenceEndIndex = lastProcessedIndex + match.index + sentence.length;
      
      console.log('Sentence found:', sentence.trim());
      
      const { text, containsPattern } = this.checkEndCall(sentence);
      
      if (containsPattern) {
        console.log('containsPattern:', containsPattern);
        automotiveCache[chatObject.userId]["data"].push(text.trim());
        endCall = true;
      } else {
        automotiveCache[chatObject.userId]["data"].push(sentence.trim());
      }
      
      lastProcessedIndex = sentenceEndIndex;
      fullStop = true;
    }
  }
  
  if (chunk?.choices[0].finish_reason === "stop") {
    console.info("Finished the stream, ending the connection");
    
    if (!fullStop && lastProcessedIndex < finalString.length) {
      const remainingText = finalString.slice(lastProcessedIndex);
      const { text, containsPattern } = this.checkEndCall(remainingText);
      
      automotiveCache[chatObject.userId]["data"].push(text.trim() + ".");
      
      if (containsPattern) {
        endCall = true;
      }
    }
    
    automotiveCache[chatObject.userId]["endCall"] = endCall;
    automotiveCache[chatObject.userId]["finish"] = true;
    
    const botResponse = { "sender": "bot", "content": finalString };
    
    if (automotiveConversationCache[chatObject.userId]?.convo?.messages?.length) {
      automotiveConversationCache[chatObject.userId].convo.messages.push(botResponse);
    }
    
    if (!automotiveConversationCache[chatObject.userId]) {
      const searchQuery2: any = { "accountId": chatObject.accountId, "userId": chatObject.userId, "status": 0 };
      const check1 = await ConversationAutomotiveModel.findOne(searchQuery2);
      const check2 = await ConversationAutomotiveModel.findOne(searchQuery);
      
      if (!check1 && !check2) {
        return { statusCode: 400, message: "Bad Request" };
      }
    }
    
    setTimeout(async () => {
      console.log('Saving bot response to db:', botResponse);
      await ConversationAutomotiveModel.findOneAndUpdate(searchQuery, { $push: { messages: botResponse }, $set: { modifiedOn: chatObject.modifiedOn } });
    }, 500);
    
    automotiveCache[chatObject.userId]['dataString'] = "";
  }
}

FWIW, it does not appear that anything is wrong from the API standpoint in my context in Python.

import asyncio

from openai import OpenAI

client = OpenAI()

DEPLOYMENT_MODEL = "gpt-4o"

async def async_main():
    messages = [{"role": "system", "content": "User will provide a topic and you need to create a sales Pitch"},
                {"role": "user", "content": "Provide a sales pitch for google search engine"}
            ]

    stream = client.chat.completions.create(messages=messages, model=DEPLOYMENT_MODEL, stream=True)

    full_delta_content = "" # Accumulator for delta content to process later

    # Process the stream response for tool calls and delta content
    for chunk in stream:
        delta = chunk.choices[0].delta if chunk.choices and chunk.choices[0].delta is not None else None

        if delta and delta.content:
            full_delta_content += delta.content
            
    print(full_delta_content)

def main():
    asyncio.run(async_main())

if __name__ == "__main__":
    main()

I would suggest starting with the basics.

2 Likes

After I got into the debugger, it turned out that the issue DID lie in our JSON parsing logic, still. Our apps using the latest official client libs weren’t affected. It was related to the issue @vfssantos posted about above.

For anybody else running into this, I’d suggest either upgrading to the latest version of the client lib, or running through your stream parsing logic in a debugger.

1 Like

But something changed in the output from the API. Earlier, such bifurcation of the JSON used to happen once in a while, but since yesterday it has happened at the start of every response. So when the bot responds with 3 to 5 sentences, the first couple always have this issue and the rest are OK.

I’m new here, so apologies in advance for sounding like a dummy.
I have a web app that uses the gpt-4o API and, like many of you here, I’m experiencing shocking responses where words and letters are missing.

Is someone able to help me understand something like this: will OpenAI fix it, or is it something I’ll need my backend developer to help me with?

import { Configuration, OpenAIApi } from "openai";
import dotenv from "dotenv";
import { IncomingMessage } from "http";
import { io } from "../index";
import { AnyArray } from "mongoose";
import Chat from "../models/chat";

dotenv.config();

/**
 * Takes in a prompt, base prompt, business prompt, and context, and returns a string.
 *
 * @param {string} prompt - The user's input prompt.
 * @param {string} basePrompt - The base prompt for the assistant
 * @param {string} businessPrompt - The business-specific information prompt from the user.
 * @param {any} context - The context for the conversation (chat history).
 * @returns {Promise<string>} - The response text generated by the OpenAI chat completions.
 */
const openAIService = async (
	room: string,
	chatId: string,
	prompt: string,
	basePrompt: string,
	businessPrompt: string,
	context: any,
) => {
	// Get the chat from the database
	const chat = await Chat.findById(chatId);
	if (!chat) {
		return;
	}

	console.log({ room, chatId, prompt, basePrompt, context });

	io.on("connection", (socket) => {
		socket.on("joinRoom", (chatId: string) => {
			socket.join(chatId);
			console.log(`User joined room ${chatId}`);
		});
	});

	const configuration = new Configuration({
		apiKey: process.env.OPENAI_API_KEY,
	});
	const openai = new OpenAIApi(configuration);

	const completion = await openai.createChatCompletion(
		{
			model: "gpt-4",
			messages: [
				{
					role: "system",
					content: `${basePrompt} \n\n Before we begin I would like to give you some background on my business: ${businessPrompt} \n\n Here are the previous chats: ${context}. \n\n Please respond using British English and all dollar references should be Australian Dollars not pounds.`,
				},
				{ role: "user", content: prompt },
			],
			// functions: [{ name: "get_response", parameters: schema }],
			// function_call: { name: "get_response" },
			temperature: 1, // Try configuring this differently.
			max_tokens: null,
			stream: true,
		},
		{ responseType: "stream" },
	);

	// const responseText: string = completion.choices[0].message.content;
	let response = "";

	const stream = completion.data as any;

	stream.on("data", (chunk: Buffer) => {
		const payloads = chunk.toString().split("\n\n");
		for (const payload of payloads) {
			const data = payload.split("data: ")[1];
			try {
				console.log(JSON.parse(data).choices[0].delta.content);
				const content = JSON.parse(data).choices[0].delta.content || " ";
				response += content;
				// console.log("Response:", response);
				console.log(content);
				// Emit each chunk of data as it arrives
				io.to(chatId).emit("openai_response", response);
			} catch (error) {
				console.error(`Error with JSON.parse and ${payload}.\n${error}`);
			}
		}
	});

	stream.on("end", () => {
		setTimeout(() => {
			console.log("Closing Stream");
			// Emit a custom event to signal the end of the stream
			io.to(chatId).emit("stream_end", response); // Send final response if needed
			//! Add the response to the database
			chat.messages.push({
				question: prompt,
				answer: response,
			});
			chat.save();
		}, 2500);
	});
};

export default openAIService;

Welcome to the community.

I would recommend this. Maybe point them to this thread.

Make sure your OpenAI library is up to date, etc.

Basically, the way streams are “chunked” has changed. No word on whether it will change back or what happened… yet. But it sounds like a lot of edge cases, as someone mentioned upthread… usually older libraries or hand-written streaming code…
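To show how hand-written streaming code breaks when chunking changes, here’s a toy demonstration (hypothetical payload, same split-on-"\n\n" approach as the snippet upthread): one SSE event delivered across two chunks produces a parse error on the first half and silently drops the second, which is exactly how letters go missing.

```javascript
// One event, now delivered in two chunks by the server.
const chunks = ['data: {"delta":"Hel', 'lo"}\n\n'];

let parsed = 0, parseErrors = 0, dropped = 0;
for (const chunk of chunks) {
  // Per-chunk splitting assumes every chunk holds complete events.
  for (const payload of chunk.split("\n\n")) {
    if (payload === "") continue;
    const data = payload.split("data: ")[1];
    if (data === undefined) { dropped++; continue; } // no prefix -> silently ignored
    try { JSON.parse(data); parsed++; }
    catch { parseErrors++; }                          // half a JSON object
  }
}
console.log({ parsed, parseErrors, dropped }); // { parsed: 0, parseErrors: 1, dropped: 1 }
```

Buffering across chunks (or letting an up-to-date official lib do it) avoids both failure modes.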

1 Like
