ChatGPT simple math calculation mistake

ChatGPT failed to calculate 241-(-241)+1.

Math is a known weak point of LLMs. A simple rule: don't use an LLM such as ChatGPT for something that a standard calculator or computer already does perfectly.

That said, I think this “math paradox,” where our seemingly most advanced computer cannot perform arithmetic, does a great job demonstrating the paradigm shift brought to us by AI.

In short, our past computers were all like “math” people. But now, “creatives” have joined the party.

Below is an excerpt from a blog post I wrote yesterday that tries to address this:

While ChatGPT may appear capable of solving simple arithmetic and some algebraic equations, it regularly fails with more complex math problems.

This inability to solve math problems may seem paradoxical because our calculators and computers have been successfully helping us solve math problems for over 50 years. So why can’t AI, running on a computer, solve the same equations?

This paradox is an excellent opportunity to recognize a fundamental shift in the technology underlying our interactions with computers. For example, it’s common for humans to refer to others based on their abilities, like being a “math person” or a “creative.” Each designation comes loaded with expectations we have of those types of people.

Until now, computers have always been the “math person” who could be relied on for calculations, but not the “creative” you would go to for works of art. Technology like ChatGPT is our first glimpse at what it might look like if computers played the role of “creative.”

To understand ChatGPT’s math capabilities, imagine you need a math tutor. But for the sake of this discussion, you decide to go to the “creative” person for help. Assuming they know the math required, the “creative” will have particular strengths in helping you, like communicating the algorithm in easy-to-understand language instead of mathematical notation. But coming up with an exact answer is still something they rely on a calculator to complete.

If you do need to use ChatGPT for math assistance, ask for instructions on solving the problem rather than relying solely on its final answer. For example, I have witnessed ChatGPT providing the correct formula while giving a wrong answer after the equals sign. So, it’s best to use ChatGPT for math assistance with some caution. And always use a traditional calculator to check your final answer!
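As a rough illustration of "check the final answer", here is a minimal Python sketch that evaluates a plain arithmetic expression and compares it with a claimed answer. The helper names `evaluate` and `check_claim` are made up for this example, and the claimed value 484 is the wrong answer reported later in this thread. It is only a sketch of one way to double-check, not a recommended tool.

```python
import ast
import operator

# Only these operators are accepted; anything else raises ValueError.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Constant):
            # Accept plain numbers only (bool is excluded on purpose).
            if type(node.value) in (int, float):
                return node.value
            raise ValueError("unsupported constant")
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def check_claim(expression, claimed):
    """Return True if the claimed answer matches the computed one."""
    return evaluate(expression) == claimed


print(evaluate("241 - (-241) + 1"))          # 483
print(check_claim("241 - (-241) + 1", 484))  # False
```

The formula can come from the model; the number after the equals sign comes from the computer.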

It’s a language model, and is trained on language. Don’t trust GPT to do your math homework.

You can generalize your statement and it stays completely true:

It’s a language model, and is trained on language. Don’t trust GPT to do anything technical. Verify everything!

This is one of the main “problems” we see with our welcome users posting here; they do not understand the difference between a generative AI and an expert system AI.

Also, while OpenAI has been discussing their AGI goals lately, it’s easy to see that GPTs are not going to be the “whole solution” toward AGI goals, but might play a role in the “generative component” someday.

Personally, I think it is best to remain highly skeptical of AGI, because as GPT readily shows us, generating text based on a predictive large language model does not create reliable technical responses which are “intelligent”. It simply auto-completes “the best it can” and that is not “intelligence” by a long shot.

Having said that, I find GPTs very helpful and great assistants in my daily workflow; but I do not use GPTs for math. I use a calculator.

HTH

Hi @daisyyedda

Further to my earlier post about ChatGPT being a large language model, I ran an experiment for you last night:

  • Fine-tuned a base OpenAI davinci model with n_epochs set to 16, using the following JSON training data:
{"prompt":"241 - (-241) + 1 ++++", "completion":" 483 ####"}
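For anyone who wants to reproduce the training data, a minimal Python sketch of writing that single example in JSON Lines form might look like this. The helper name `to_jsonl` is made up for illustration, and reading " ++++" and " ####" as a separator and a stop marker is my assumption from the example itself. The upload and fine-tune steps are not shown because the post does not give them.

```python
import json

# The single training example quoted above. The trailing " ++++" and
# " ####" look like a prompt separator and a stop marker (assumption).
examples = [
    {"prompt": "241 - (-241) + 1 ++++", "completion": " 483 ####"},
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Note that the file contains text only: a prompt string and the string that should follow it.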

The fine-tuned model finished training and was ready to test by the time I was having my morning coffee (just now).

So I ran a completion using this model and got the correct “answer” you are looking for.

This is because ChatGPT and the GPT APIs in general are simply “fancy” auto-completion engines based on a large language prediction model. So, I fine-tuned the model to increase the probability that when the davinci model sees the exact text you were interested in:

241 - (-241) + 1

The model will predict that 483 is the next text in the sequence of natural language.

Note, my new fine-tuned model is not doing any math! The model is also not “calculating” at all.

It’s simply predicting text.

To prove this, I will slightly alter the prompt to:

242 - (-242) + 1

Now we re-run the completion and, not surprisingly, the model still returns 483.

This is because we cannot train these GPT models to perform math; we can only train them to predict text, given prior text.

In the experimental case above, the model simply predicts the next sequence of text as 483 because it was trained to predict this and this is the closest match, statistically speaking.
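To make the "closest match" idea concrete, here is a deliberately crude Python analogy. It is not how GPT works internally (a real model predicts tokens from learned probabilities, not from a lookup table), and the names `TRAINED` and `predict` are invented for this sketch. It only shows how a text matcher can return 483 for both prompts without doing any arithmetic.

```python
import difflib

# Toy "training data": the one prompt/completion pair from the experiment.
TRAINED = {"241 - (-241) + 1": "483"}


def predict(prompt):
    """Return the completion stored for the most similar trained prompt.

    This is string similarity only; no arithmetic is performed.
    """
    matches = difflib.get_close_matches(prompt, list(TRAINED), n=1, cutoff=0.0)
    return TRAINED[matches[0]]


print(predict("241 - (-241) + 1"))  # 483 (the trained prompt)
print(predict("242 - (-242) + 1"))  # 483 (the altered prompt, same "answer")
print(242 - (-242) + 1)             # 485 (what a calculator gives)
```

The altered prompt differs from the trained one by two characters, so the matcher treats it as essentially the same text.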

There is no “calculation” happening at all, @daisyyedda

ChatGPT cannot make a “math calculation mistake” because it’s not performing math calculations, it is just predicting the next text in a sequence based on a language model.

Hope this helps you understand what GPT is and does.

Hi @ruby_coder

Thank you for running the simulation.

Let me give the complete story here: After ChatGPT gave me 484, I corrected its response and received its acknowledgement on 483 being the right answer. However, when I asked the exact same question again, it still predicted 484.

As you have mentioned that the model is predicting the next text in the sequence of natural language, in this case, wouldn’t 483 become the new prediction?

Any further explanations would be greatly appreciated.

That is because the model is pre-trained and when you interact with it, it’s not learning.

Does that make sense to you @daisyyedda ?

Made sense to me. Thanks @ruby_coder

OpenAI has been wrong twice on simple addition problems. See below.

I asked “Total Raw Cost = $549.72 + $6.98 + $41.00 + $35.00 + $552.00 + $76.16 + $29.12”

OpenAI answer: “Total Raw Cost = $1,290.98”

Correct Answer is $1,289.98
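For reference, the same sum can be checked in a few lines of Python. This sketch uses `Decimal` so the cents are added exactly instead of as binary floating-point values; the amounts are copied from the question above.

```python
from decimal import Decimal

# Amounts copied from the question, kept as strings so Decimal is exact.
amounts = ["549.72", "6.98", "41.00", "35.00", "552.00", "76.16", "29.12"]

total = sum(Decimal(a) for a in amounts)
print(f"Total Raw Cost = ${total:,.2f}")  # Total Raw Cost = $1,289.98
```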

I understand it’s very simple and I could do it on my own, but that’s also part of the issue: it’s VERY simple math and OpenAI is getting it wrong.

See the screenshots for the actual chat.

Yes. As this thread, and many other threads, and the disclaimers in the documentation all state, these are stochastic language parrots, not infallible computing machines.
Don’t use it for math. Or anything else that requires actual certainty.

Hello,
I have a problem: I can’t get ChatGPT to calculate correctly.
For example, ChatGPT gives this result for the following calculation: 12 + 196 + 138 + 206 + 122 + 307 + 35 + 135 + 20 + 201 + 4 + 3 + 2 + 1 + 14 + 21 + 211 + 149 + 244 + 134 + 280 + 33 + 127 + 27 + 196 = 3666
whereas in Excel I got 2818.
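A quick way to confirm which total is right is to let a computer parse and add the terms. This is a minimal Python sketch; the expression string is copied from the post above, and the variable names are only illustrative.

```python
# The expression from the post, copied as text.
expression = (
    "12 + 196 + 138 + 206 + 122 + 307 + 35 + 135 + 20 + 201 + 4 + 3 + 2 + 1 "
    "+ 14 + 21 + 211 + 149 + 244 + 134 + 280 + 33 + 127 + 27 + 196"
)

# Split on "+" and convert each piece to an integer (int() ignores spaces).
terms = [int(piece) for piece in expression.split("+")]

print(len(terms))  # 25
print(sum(terms))  # 2818
```

The result agrees with Excel, not with ChatGPT.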