Constant Mathematical Errors

Dent · January 13, 2023, 6:01pm

The OpenAI cookbook has some good tips on getting more reliable solutions. For multi-step math problems, prepend the response with “Let’s think step by step” and, per the cookbook, accuracy on the MultiArith math benchmark jumps from 18% to 79%!
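For anyone who wants to try this, here is a minimal sketch of how the cue could be attached to a prompt. The helper name `build_step_by_step_prompt` and the `Q:`/`A:` layout are my own assumptions for illustration, not something the cookbook prescribes, and the snippet only builds the string; sending it to a model is left to whatever client you already use.

```python
def build_step_by_step_prompt(question: str) -> str:
    """Format a question so the model's answer starts with a reasoning cue.

    Hypothetical helper: the "Q:/A:" layout is an assumption for illustration.
    """
    cue = "Let's think step by step."
    # Ending the prompt with the cue means the completion continues from it,
    # which nudges the model to write out intermediate steps first.
    return f"Q: {question.strip()}\nA: {cue}"


prompt = build_step_by_step_prompt(
    "A juggler has 16 balls. Half of the balls are golf balls and half "
    "of the golf balls are blue. How many blue golf balls are there?"
)
print(prompt)
```

The point of putting the cue at the start of the answer, rather than in the question, is that the model's next tokens have to continue the reasoning instead of jumping straight to a number.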

https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md#techniques-to-improve-reliability

# Techniques to improve reliability

When GPT-3 fails on a task, what should you do?

- Search for a better prompt that elicits more reliable answers?
- Invest in thousands of examples to fine-tune a custom model?
- Assume the model is incapable of the task, and move on?

There is no simple answer - it depends. However, if your task involves logical reasoning or complexity, consider trying the techniques in this article to build more reliable, high-performing prompts.

## Why GPT-3 fails on complex tasks

If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit multiplication. With a few seconds, and some pen and paper, it's not too taxing to work out that 13 x 17 = 130 + 70 + 21 = 221.
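The pen-and-paper split above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in that sentence; the function name `partial_products` is hypothetical, and it assumes non-negative integers.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Split a * b into the three terms used in the 13 x 17 example.

    Assumes non-negative integers; the name is made up for this sketch.
    """
    a_tens, a_ones = (a // 10) * 10, a % 10
    b_tens, b_ones = (b // 10) * 10, b % 10
    # a * b = a * b_tens + a_tens * b_ones + a_ones * b_ones
    return [a * b_tens, a_tens * b_ones, a_ones * b_ones]


terms = partial_products(13, 17)
print(" + ".join(map(str, terms)), "=", sum(terms))  # 130 + 70 + 21 = 221
```

Each term is easy to compute on its own, which is the point of the analogy: the full product is hard to produce in one step but simple once the intermediate steps are written down.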

Similarly, if you give GPT-3 a task that's too complex to do in the time it takes to calculate its next token, it may confabulate an incorrect guess. Yet, akin to humans, that doesn't necessarily mean the model is incapable of the task. With some time and space to reason things out, the model still may be able to answer reliably.

As an example, if you ask `text-davinci-002` the following math problem about juggling balls, it answers incorrectly:

```text-davinci-002
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
```

This file has been truncated.
