ChatGPT calculations are wrong

Jacoub · January 16, 2023, 1:29am

($1,452,190 x 2.6 x 5/12) / 100 = $9,957

william.waterstone · January 16, 2023, 5:36am

I have noticed that the free chat AI does not perform well with simple math. I think that the focus is on the language rather than the math…

ruby_coder · January 16, 2023, 8:23am

Yes, because ChatGPT (at this maturity point in its lifecycle) needs more context.

Try this type of prompt instead:

Write a Ruby program to perform this calculation ($1,452,190 x 2.6 x 5/12) / 100.

The results I get are correct (or very close depending on the desired precision):

irb(main):001:0> result = (1_452_190 * 2.6 * 5/12) / 100
irb(main):002:0> puts "Result is: $#{result}"
Result is: $15732.058333333332
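If the desired precision is dollars and cents, the float can be rounded explicitly. A minimal sketch, assuming two decimal places are wanted (the variable name `rounded` is only illustrative and was not part of the ChatGPT reply):

```ruby
# Same expression as above; the float operand (2.6) keeps every step in floating point.
result = (1_452_190 * 2.6 * 5 / 12) / 100

# Round to cents and always show two decimal places.
rounded = result.round(2)
puts "Result is: $#{format('%.2f', rounded)}"  # Result is: $15732.06
```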

ruby_coder · January 16, 2023, 8:27am

As mentioned earlier, this depends on the context. See these “not yet perfect” examples… just some ideas. The Ruby instructions were more specific, and so the answer was more accurate, I think. For some reason, these answers are not “perfect” like my Ruby reply from ChatGPT:

ChatGPT is an immature beta generative AI. It often needs more context and so it’s up to the user to guide the AI on how to solve problems such as this.

See Also:

Again, my earlier Ruby code example was more specific, and so the answer was more accurate, I think.

It seems to me that ChatGPT needs more context. In these examples, ChatGPT did not have enough context. However, in my Ruby prompt, ChatGPT nailed it. I guess this is because I specifically asked ChatGPT to generate code in a programming language ChatGPT supports.

william.waterstone · January 16, 2023, 3:27pm

I gave OAI what I thought was a simple enough instruction, which it appeared to understand.

William: Please calculate the value of the word “Computer” using the code A = 1, B= 2…all the way to Z = 26 using the English alphabet

Open AI: To calculate the value of the word “Computer” using the code you provided, we can add up the values of each letter as follows:
C = 3 O = 15 M = 13 P = 16 U = 21 T = 20 E = 5 R = 18
The total value of the word “Computer” using this code is: 3 + 15 + 13 + 16 + 21 + 20 + 5 + 18 = 105

I informed the AI that the answer was incorrect, and that the right answer was 111.

It then redid the calculation in the same ponderous manner, this time giving the correct answer.

Being surprised that a computer program would have difficulty in doing simple math I wrote the following:

William: I want to discuss your making a mathematical mistake as I find it interesting that a computer program would be made to be able to do that.

After that chat, I gave OAI another word to calculate.

William: Please calculate the value of “Open hearted” using the same algorithm A=1…

Again OAI produced an incorrect answer; again I gave it the correct answer, and again it recalculated and returned the correct answer.

I then asked it:

William: Are you just copying my answer?

AI apologized “if my previous responses gave the impression that I was simply copying your answers.”

I then asked it to “Please Calculate “Calculation” using the same code”

It repeated its process and once more came up with an incorrect answer.

I told it that the answer was incorrect and said the correct answer was 112, and once more it went through the calculation process and this time delivered the answer 112.

It was then that I realized the program was simply copying the correct answer it had been given, and mirroring that data back to me, because the correct answer was 111 not 112.
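For reference, the A = 1 … Z = 26 scheme is easy to check in a few lines of Ruby. This is only a sketch: the method name `word_value` is hypothetical, and it assumes that spaces and other non-letters count as zero.

```ruby
# Sum the alphabet positions (A = 1 ... Z = 26) of the letters in a phrase.
# Characters outside A-Z, such as spaces, are ignored.
def word_value(text)
  letters = text.upcase.each_char.select { |ch| ch.between?('A', 'Z') }
  letters.sum { |ch| ch.ord - 'A'.ord + 1 }
end

puts word_value("Computer")      # 111
puts word_value("Open hearted")  # 111
puts word_value("Calculation")   # 111
```

All three phrases from the chat come out to 111 under this scheme.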

After a bit more discussion about this:

William: It is not difficult to do these simple calculations, do you agree?

Open AI: Calculating the values of words and phrases using a simple letter-value code like the one you provided is a straightforward task that does not require a lot of complex processing or analysis. As an artificial intelligence, I am able to perform such calculations quickly and accurately.

Clearly the reply was misinformation.

Open AI: Is there anything else I can help with?

William: You could help me if you could learn…but you apparently cannot learn…

Open AI: As an artificial intelligence, I am designed to process and analyze information and provide responses based on that processing and analysis. I am able to learn and adapt to new information and experiences within the scope of my programming and the data that has been provided to me.

Is there something specific you’d like to know or discuss about artificial intelligence or about the process of learning and adapting to new information?

I had nothing more to say. It appears to be true that one can learn from and teach OAI, but I saw no point in going down that path re the math, as I use an online word-value calculator anyway.

Working with words and generating messages is more my focus, which appears to be this AI’s particular strength.

Dent · January 16, 2023, 4:03pm

Please see my post here on improving calculations:

stuartmitch.home · February 5, 2023, 10:45pm

Last night I used ChatGPT to solve some very simple maths calculations and it failed miserably. I had a data set of 20 numbers which I asked ChatGPT to sum and to find the mean, mode, median etc. for. It got the sum wrong and then all the others as a result. The others were not just wrong because of the sum error: even working from its incorrect sum, it got them wrong again.

Don’t use ChatGPT for maths. Does anyone know of an alternative AI app that can calculate maths?

ruby_coder · February 6, 2023, 2:13am

Hi @stuartmitch.home

It is probably best if you post back with the exact prompts you used.

How the prompts are worded makes a big difference when asking a GPT-based large language model, which operates like a type of text autocompletion engine, to perform math.

HTH

Dent · February 6, 2023, 4:00am

I have to second @ruby_coder’s opinion. LLMs’ completion outputs can vary largely based on input, even with temp=0

That being said, there exist AI interfaces for the Wolfram Language which could solve your problem.

ruby_coder · February 6, 2023, 4:09am

Yes, of course (from wikipedia):

The Wolfram Language is a proprietary, general high-level multi-paradigm programming language developed by Wolfram Research. It emphasizes symbolic computation, functional programming, and rule-based programming and can employ arbitrary structures and data.

So, of course it can do math

It is also a functional programming / rules-based / symbolic programming system and not a GPT text-autocompletion / prediction engine.

What @Dent implies, but does not mention directly, is that there are better AI tools (which are expert systems and rule-based) which do math very well. It is important to understand what a GPT-based model is before expecting results from it which are not part of the design criteria (see footnote 1).

ChatGPT is a text-prediction engine; it is not an expert system, nor is it “rule-based”, except for some of the proprietary tweaking and moderation, which we don’t have a concise list of, to my knowledge.

Footnote:

There is no shortage of people attempting to use GPT models to perform tasks which GPT models are not designed to do as well as non-GPT-based models. This reminds us (every day here in this community) of the old saying, “When you have a hammer in your hand, everything looks like a nail!” ChatGPT is no exception to this adage.
The above situation is normal in the life-cycle of a new technology, especially during the peak of the “public hype phase”. Over time, developers will get “back to normal” and will realize where GPT-models are optimal and when they are suboptimal.

Dent · February 6, 2023, 4:19am

I’m gonna put that adage a step further: From the lips of Adam Savage, who got it from someone else: “In every tool, there is a hammer.” So, like, even if you don’t have a hammer in your hands, you can probably make one out of what you do have, and that isn’t necessarily the greatest idea

ruby_coder · February 6, 2023, 4:24am

Hahaha… well said and very timely. Thanks @Dent

Yeah, we are seeing a lot of that everywhere, including this lively community for sure.

stuartmitch.home · February 6, 2023, 4:45am

Fair point. I should have.

One very basic calculation was “calculate the sum of the following numbers: 200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179”. The answer given was ‘3,038’ when it should be ‘3,802’. I also asked for the mode of these numbers and got 2 (187), when there are two numbers each with 3 instances.

Dent · February 6, 2023, 4:59am

Sounds like you need some programming experience! There are many free programming courses available online, but others on the forum and I have intro guides/courses/etc. especially geared toward LLM use.

ruby_coder · February 6, 2023, 6:14am

Yes, as @Dent kindly mentioned, you will get better results if you “think” like a software developer.

For example:

Prompt:

Write a ruby method to calculate the sum of a string of numbers, given the string of numbers separated by white-space. Output the final sum.

ChatGPT Completion

def sum_numbers(string_of_numbers)
    numbers = string_of_numbers.split(' ').map(&:to_i)
    sum = 0

    numbers.each do |num|
        sum += num
    end

    puts sum
end

Ruby Console Testing… (where I spend a lot of my time testing, BTW):

irb(main):027:1* def sum_numbers(string_of_numbers)
irb(main):028:1* numbers = string_of_numbers.split(' ').map(&:to_i)
irb(main):029:1* sum = 0
irb(main):030:1* 
irb(main):031:2* numbers.each do |num|
irb(main):032:2* sum += num
irb(main):033:1* end
irb(main):034:1* 
irb(main):035:1* puts sum
irb(main):036:0> end
=> :sum_numbers
irb(main):037:0> sum_numbers("200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179")
3802
=> nil
irb(main):038:0>

That method is a “good start”, but it does not account for all variations of “white space” between numbers, and it just prints and does not return the sum. Still, it’s a start. It’s easy to reword the prompt, but faster to just make the changes and move on to the next task at hand.
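The two gaps noted above (whitespace handling and printing rather than returning) could be closed along these lines. This is a hand-edited sketch, not a ChatGPT completion, and the choice to raise on non-numeric tokens via `Integer` is an assumption about the desired behavior:

```ruby
# Sum whitespace-separated integers and return the total.
# String#split with no argument splits on any run of whitespace
# (spaces, tabs, newlines) and ignores leading/trailing whitespace.
# Integer(token, 10) raises ArgumentError on a token that is not a base-10 integer.
def sum_numbers(string_of_numbers)
  string_of_numbers.split.sum { |token| Integer(token, 10) }
end

total = sum_numbers("200 192 187 187\t196 190 188 198 204 191\n175 198 188 189 186 198 189 187 180 179")
puts total  # 3802
```

Returning the value instead of printing it lets the caller reuse the sum, for example to compute the mean.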

Second Case:

For your second case, @stuartmitch.home maybe you can try again, changing your prompt using more “programmer geek-speak” ?
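As an illustration of the kind of method such a prompt might lead to, here is a minimal sketch for the mode. The method name `modes` is hypothetical, and it assumes every tied value should be reported, since a data set can have more than one mode:

```ruby
# Return every value that occurs most often, in ascending order.
def modes(numbers)
  counts = numbers.tally                 # value => number of occurrences (Ruby 2.7+)
  highest = counts.values.max
  counts.select { |_value, count| count == highest }.keys.sort
end

data = "200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179".split.map(&:to_i)
p modes(data)  # [187, 198]
```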

HTH

Note: What many of us have learned about GPT models is that they make fine assistants, but the more domain knowledge a user has, the more valuable / helpful the assistance. As @Dent is trying to say, the more domain knowledge you have (in this case programming / software development) the better the language model can assist.

Bu · February 6, 2023, 6:36am

Same for me.
π * (29^2) * 58,7 should be 155,011
and GPT gives me 537,295

ruby_coder · February 6, 2023, 6:55am

That is because ChatGPT is NOT a calculator, it’s a pre-trained generative transformer based on a large-language model, @Bu

ChatGPT is like a text-autocompletion engine, not a scientific calculator.

Also, here is what Google replies (155090.062052):

Here is what ChatGPT offers:

Let’s test this method in the console:

irb(main):008:1* def evaluate_expression
irb(main):009:1*   pi = 3.14159265358979323846
irb(main):010:1*   result = pi * (29**2) * 58.7
irb(main):011:1*   return result
irb(main):012:0> end
=> :evaluate_expression
irb(main):013:0> 
irb(main):014:0> puts evaluate_expression
155090.06205197124
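The method above types out pi by hand. A slightly tidier variant, offered only as a sketch, uses Ruby's built-in `Math::PI` constant and rounds to six decimal places for display:

```ruby
# Same expression, using Ruby's built-in constant instead of a typed-out literal.
def evaluate_expression
  Math::PI * (29**2) * 58.7
end

puts evaluate_expression.round(6)  # 155090.062052
```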

So, Google gave the answer as: 155090.062052
ChatGPT provided a method with answer: 155090.062052...
@Bu said the answer is : 155,011.

Who is correct ??

Bu · February 6, 2023, 7:07am

Google is right for that math. I’m sorry: I’m testing engine capacity, so I divided by 1000 to get that result.

And like you said, GPT is not a calculator, so I used GPT to get the formula instead of doing the math.

I just hope GPT can do calculations better in the future.

ruby_coder · February 6, 2023, 7:45am

GPT is a language model, a text-completion engine, like the autocorrections you see when you type, but on a different scale.

GPT does math GREAT if you treat GPT as an assistant to help you create methods and formulas based on your skill at defining a problem and breaking the problem set down into smaller, manageable pieces.

I use OpenAI to solve complex math and programming tasks for clients daily, because when I write code, I use these OpenAI models to draft initial methods for me. If I needed a scientific calculator, I would use a calculator versus wasting time trying to engineer a prompt for a chatty language model to parse.

GPT models are “generative AIs”, not expert systems and not general-purpose “human-like” AIs like you see in sci-fi. However, the “good and the bad” of current-generation GPT models is that they write like a super-confident expert (great language skills) but they are only predicting text, not actually having any real-world domain knowledge of what they are doing, haha

So, as you can see, you are human @Bu but you made a mistake and posted the incorrect answer to your math expression confidently as if it was “a fact”. So, don’t be too disappointed in a chatty beta language model which cannot behave as a calculator without being prompted to generate methods versus doing math like a desktop calculator.

stuartmitch.home · February 6, 2023, 8:11am

God, the day I dreaded would come has arrived. I’m afraid I’ve as much chance of being an effective programmer as I have of being a successful surgeon!! That’s a big dark hole I don’t want to fall into.

I will take your advice on watching your YT vids. Thanks to you and Ruby for pointing me in the right direction.
