GPT thinks math answers are wrong, even when it says they are right

I’m building a trivia app. For some reason, the bot is having a hard time judging math answers (the calculations themselves are fine). Yes, I know LLMs have issues with math, but GPT will literally respond like this:

“Oops! 8-4 doesn’t equal 4. The correct answer is 4!”

Does anyone have a suggestion for why this is happening? My best guess is that the Zod format schema is “working”, but the model splits its answer and its understanding of the context across the fields of the object output itself. If that is the case, is there a workaround?

This is the schema:

const TriviaFormat = z.object({
  was_answer_correct: z.boolean(),
  fun_fact_or_critique: z.string(),
  next_question_to_ask: z.string()
});

This is the bot setup:

const botResponse = await openai.beta.chat.completions.parse({
  model: 'gpt-4o-mini',
  messages,
  max_tokens: 2000,
  top_p: 0.125,
  temperature: 0.125,
  response_format: zodResponseFormat(TriviaFormat, 'event')
});
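
For reference, a minimal sketch of how the parsed result is read back from this helper (assuming the usual imports: OpenAI from 'openai', z from 'zod', and zodResponseFormat from 'openai/helpers/zod'):

// .parse() validates the model's JSON against the Zod schema
// and exposes it as a typed object on the message:
const event = botResponse.choices[0].message.parsed;
if (event) {
  console.log(event.was_answer_correct);   // boolean
  console.log(event.fun_fact_or_critique); // string
  console.log(event.next_question_to_ask); // string
}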

Six days and no reply. Bumping.

You have a choice between:

  • quality of AI model
  • quality of AI model input

You already sabotaged yourself by using gpt-4o-mini instead of a more knowledgeable model such as gpt-4 or gpt-4-turbo.

Using structured outputs would not be my choice for obtaining the highest-quality answer, especially when the first token you are asking the AI to produce is “true” with no previous analysis of the situation. The model has to commit to the verdict before it has written a single word of reasoning.
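
One workaround worth trying (a sketch, not a guaranteed fix): reorder the schema so the free-text critique comes before the boolean, since the model generates the JSON fields in the order the schema declares them. That way it effectively reasons first and judges second:

const TriviaFormat = z.object({
  // Generated first: the model writes its analysis/critique here...
  fun_fact_or_critique: z.string(),
  // ...and only then commits to the verdict, informed by what it just wrote.
  was_answer_correct: z.boolean(),
  next_question_to_ask: z.string()
});

The field names are the ones from your schema; only the declaration order changes.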