ChatGPT failed to calculate 241-(-241)+1.

Math is a known weak point of LLMs. A simple rule: don’t use an LLM such as ChatGPT for something that a standard calculator or computer can do perfectly.
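For reference, here is the expression from the question evaluated by an ordinary interpreter. This is just a minimal sketch; the variable name `result` is my own.

```python
# Subtracting a negative number is the same as adding it:
# 241 - (-241) + 1 = 241 + 241 + 1
result = 241 - (-241) + 1
print(result)  # 483
```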

That said, I think this “math paradox,” where our seemingly most advanced computer cannot perform arithmetic, does a great job demonstrating the paradigm shift brought to us by AI.

In short, our past computers were all like “math” people. But now, “creatives” have joined the party.

Below is an excerpt from a blog post I wrote yesterday that tries to address this:

While ChatGPT may appear capable of solving simple arithmetic and some algebraic equations, it regularly fails with more complex math problems.

This inability to solve math problems may seem paradoxical because our calculators and computers have been successfully helping us solve math problems for over 50 years. So why can’t AI, running on a computer, solve the same equations?

This paradox is an excellent opportunity to recognize the fundamental shift in the technology underlying our interactions with computers. For example, it’s common for humans to refer to others based on their abilities, like being a “math person” or a “creative.” Each designation comes loaded with expectations we have of those types of people.

Until now, computers have always been the “math person” who could be relied on for calculations, but not the “creative” you would go to for works of art. Technology like ChatGPT is our first glimpse at what it might look like if computers played the role of “creative.”

To understand ChatGPT’s math capabilities, imagine you need a math tutor. But for the sake of this discussion, you decide to go to the “creative” person for help. Assuming they know the math required, the “creative” will have particular strengths in helping you, like communicating the algorithm in easy-to-understand language instead of mathematical notation. But coming up with an exact answer is still something they rely on a calculator to complete.

Suppose you need to use ChatGPT for math assistance. In that case, ask for instructions on solving a math question rather than relying solely on its final answer. For example, I have witnessed ChatGPT providing the correct formula while giving a wrong answer after the equals sign. So, it’s best to use ChatGPT for math assistance with some caution. And always use a traditional calculator to check your final answer!
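One way to do that check in code is sketched below. It assumes the expression only uses `+`, `-`, `*`, `/` and parentheses; the helper names `evaluate` and `check_answer` are made up for this illustration. It parses the expression with Python's `ast` module instead of calling `eval()`, so nothing other than plain arithmetic gets executed.

```python
import ast
import operator

# Only plain arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression):
    """Evaluate a basic arithmetic expression without calling eval()."""

    def walk(node):
        # Numeric literal (bool is excluded because it is a subclass of int).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def check_answer(expression, claimed):
    """Return True when the claimed answer matches the computed one."""
    return evaluate(expression) == claimed


print(check_answer("241 - (-241) + 1", 484))  # False
print(check_answer("241 - (-241) + 1", 483))  # True
```

The design choice here is to let the language model explain the method and let a deterministic evaluator produce or confirm the number.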

It’s a language model, and is trained on language. Don’t trust GPT to do your math homework.

You can modify your statement to make it more generally true.

*It’s a language model, and is trained on language. Don’t trust GPT to do anything technical. Verify everything!*

This is one of the main “problems” we see with our welcome users posting here; they do not understand the difference between a generative AI and an expert system AI.

Also, while OpenAI has been discussing their AGI goals lately, it’s easy to see that GPTs are not going to be the “whole solution” toward AGI goals, but might play a role in the “generative component” someday.

Personally, I think it is best to remain highly skeptical of AGI, because as GPT readily shows us, generating text based on a predictive large language model does not create reliable technical responses which are “intelligent”. It simply auto-completes “the best it can” and that is not “intelligence” by a long shot.

Having said that, I find GPTs very helpful and great assistants in my daily workflow, but I do not use GPTs for math. I use a calculator.

HTH

Hi @daisyyedda

Further to my earlier post about ChatGPT being a large language model, I ran an experiment for you last night:

- Fine-tuned a base `OpenAI davinci` model using `16 n_epochs` with the following JSON training data:

```
{"prompt":"241 - (-241) + 1 ++++", "completion":" 483 ####"}
```
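If you want to produce a training file like this yourself, here is a minimal sketch that serializes the same single example as one JSON object per line (JSONL). The helper name `to_jsonl` is my own; my reading is that `++++` marks the end of the prompt and ` ####` works as a stop sequence, but treat that as an assumption about the experiment rather than a requirement of the API.

```python
import json

# The single training example from the experiment above.
# Assumption: "++++" marks the end of the prompt and " ####" is a stop sequence.
examples = [
    {"prompt": "241 - (-241) + 1 ++++", "completion": " 483 ####"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Writing `jsonl_text` to a file (for example a hypothetical `train.jsonl`) gives you something you can upload as fine-tuning data.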

The fine-tuning job finished and was ready to test while I was having my morning coffee (just now).

So I ran a completion using this model, as follows, and got the correct “answer” you are looking for.

This is because ChatGPT and the GPT APIs in general are simply “fancy” auto-completion engines based on a large language prediction model. So, I fine-tuned the model to increase the probability that when the `davinci` model sees the exact text you were interested in:

```
241 - (-241) + 1
```

The model will predict that `483` is the next text in the sequence of natural language.

Note, my new fine-tuned model is not doing any math! The model is also not “calculating” at all.

It’s simply predicting text.

To prove this, I will slightly alter the prompt to:

```
242 - (-242) + 1
```

Now we re-run the completion and, not surprisingly, here is what we get:

This is because we cannot train these GPT models to perform math; we can only train them to predict text, given prior text.

In the experimental case above, the model simply predicts the next sequence of text as `483` because it was trained to predict this and this is the closest match, statistically speaking.
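To make the "closest match" idea concrete, here is a deliberately crude toy. It is an analogy only, not how GPT works internally: a lookup that returns the completion of whichever memorized prompt looks most similar as text. The names `TRAINING` and `predict` are invented for this sketch.

```python
import difflib

# Toy "training set": the one prompt/completion pair from the experiment.
TRAINING = {"241 - (-241) + 1": "483"}


def predict(prompt):
    """Return the completion of the most textually similar memorized prompt.

    No arithmetic happens here; the prompt is only compared as a string.
    """
    closest = max(
        TRAINING,
        key=lambda known: difflib.SequenceMatcher(None, prompt, known).ratio(),
    )
    return TRAINING[closest]


print(predict("241 - (-241) + 1"))  # 483
print(predict("242 - (-242) + 1"))  # 483 (the correct result would be 485)
```

The second call returns `483` for the same reason the fine-tuned model did: the text it saw is close to what it memorized, and nothing in the pipeline ever evaluates the expression.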

There is no “calculation” happening at all, @daisyyedda.

ChatGPT cannot make a “math calculation mistake” because it’s not performing math calculations; it is just predicting the next text in a sequence based on a language model.

Hope this helps you understand what GPT is and does.


Hi @ruby_coder

Thank you for running the simulation.

Let me give the complete story here: After ChatGPT gave me 484, I corrected its response and received its acknowledgement on 483 being the right answer. However, when I asked the exact same question again, it still predicted 484.

As you have mentioned that the model is predicting the next text in the sequence of natural language, in this case, wouldn’t 483 become the new prediction?

Any further explanations would be greatly appreciated.

That is because the model is pre-trained and when you interact with it, it’s not learning.

Does that make sense to you @daisyyedda ?