ChatGPT calculations are wrong

($1,452,190 x 2.6 x 5/12) / 100 = $9,957

I have noticed that the free chat AI does not perform well with simple math. I think that the focus is on the language rather than the math…

Yes, because ChatGPT (at this maturity point in its lifecycle) needs more context.

Try this type of prompt instead:

Write a Ruby program to perform this calculation ($1,452,190 x 2.6 x 5/12) / 100.

The results I get are correct (or very close depending on the desired precision):

irb(main):001:0> result = (1_452_190 * 2.6 * 5/12) / 100
irb(main):002:0> puts "Result is: $#{result}"
Result is: $15732.058333333332
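If the "desired precision" is exact cents, one option is to avoid floating point altogether. The following is only a minimal sketch under that assumption, not something ChatGPT produced; the method name `exact_result` is made up for illustration, and it relies on Ruby's built-in `Rational` type:

```ruby
# Exact version of ($1,452,190 x 2.6 x 5/12) / 100 using Rational arithmetic.
# `exact_result` is a hypothetical helper name used only for this sketch.
def exact_result
  (Rational(1_452_190) * Rational("2.6") * Rational(5, 12)) / 100
end

# Round to two decimal places (cents) only at the very end.
puts format("Result is: $%.2f", exact_result.round(2).to_f)
# prints: Result is: $15732.06
```

Rounding once, at the end, keeps the intermediate steps exact, which is usually what you want for money.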

As mentioned earlier, this depends on the context. See these “not yet perfect” examples… just some ideas. The Ruby instructions were more specific, and so the answer was more accurate, I think :slight_smile: For some reason, these answers are not “perfect” like my Ruby reply from ChatGPT:

ChatGPT is an immature beta generative AI. It often needs more context, so it’s up to the user to guide the AI on how to solve problems such as this.

Again, my earlier Ruby code example was more specific, and so the answer was more accurate, I think :slight_smile:

It seems to me that ChatGPT needs more context. In these examples, ChatGPT did not have enough context. However, in my Ruby prompt, ChatGPT nailed it. I guess this is because I specifically asked ChatGPT to generate code in a programming language ChatGPT supports.

I gave OAI what I thought was a simple enough instruction, which it appeared to understand.

William: Please calculate the value of the word “Computer” using the code A = 1, B = 2… all the way to Z = 26 using the English alphabet

Open AI: To calculate the value of the word “Computer” using the code you provided, we can add up the values of each letter as follows:
C = 3 O = 15 M = 13 P = 16 U = 21 T = 20 E = 5 R = 18
The total value of the word “Computer” using this code is: 3 + 15 + 13 + 16 + 21 + 20 + 5 + 18 = 105

I informed the AI that the answer was incorrect, and that the right answer was 111.

It then redid the calculation in the same ponderous manner, this time giving the correct answer.

Being surprised that a computer program would have difficulty doing simple math, I wrote the following:

William: I want to discuss your making a mathematical mistake as I find it interesting that a computer program would be made to be able to do that.

After that chat, I gave OAI another word to calculate.

William: Please calculate the value of “Open hearted” using the same algorithm A=1…

Again OAI produced an incorrect answer; again I gave it the correct answer, and again it recalculated and returned the correct answer.

I then asked it:

William: Are you just copying my answer?

AI apologized “if my previous responses gave the impression that I was simply copying your answers.”

I then asked it to “Please calculate ‘Calculation’ using the same code”.

It repeated its process and once more came up with an incorrect answer.

I told it that the answer was incorrect and said the correct answer was 112, and once more it went through the calculation process and this time delivered the answer 112.

It was then that I realized the program was simply copying the correct answer it had been given, and mirroring that data back to me, because the correct answer was 111 not 112.
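For anyone who wants to check these values without an online calculator, here is a minimal Ruby sketch of the A = 1 … Z = 26 code. The method name `word_value` is made up for illustration, and ignoring spaces and other non-letters is my assumption about how the code treats them:

```ruby
# Sum the alphabet positions (A = 1 ... Z = 26) of the letters in a phrase.
# `word_value` is a hypothetical name; non-letters such as spaces count as 0.
def word_value(text)
  text.upcase.each_char.sum do |ch|
    ch.between?("A", "Z") ? ch.ord - "A".ord + 1 : 0
  end
end

puts word_value("Computer")      # prints 111
puts word_value("Open hearted")  # prints 111
puts word_value("Calculation")   # prints 111
```

Note that “Calculation” comes out as 111, not the 112 the AI echoed back.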

After a bit more discussion about this:

William: It is not difficult to do these simple calculations, do you agree?

Open AI: Calculating the values of words and phrases using a simple letter-value code like the one you provided is a straightforward task that does not require a lot of complex processing or analysis. As an artificial intelligence, I am able to perform such calculations quickly and accurately.

Clearly the reply was misinformation. :slight_smile:

Open AI: Is there anything else I can help with?

William: You could help me if you could learn… but you apparently cannot learn…

Open AI: As an artificial intelligence, I am designed to process and analyze information and provide responses based on that processing and analysis. I am able to learn and adapt to new information and experiences within the scope of my programming and the data that has been provided to me.

Is there something specific you’d like to know or discuss about artificial intelligence or about the process of learning and adapting to new information?

I had nothing more to say. It appears to be true that one can learn from and teach OAI, but I saw no point in going down that path re the math, as I use an online word-value calculator anyway.

Working with words and generating messages is more my focus, which appears to be this AI’s particular strength.

Please see my post here on improving calculations:

Last night I used ChatGPT to solve some very simple maths calculations and it failed miserably. I had a data set of 20 numbers for which I asked ChatGPT to find the sum, mean, mode, median etc. It got the sum wrong, and then all the others as well. The others were not just wrong because of the sum error: even taking the incorrect sum as given, they were calculated wrongly again.

Don’t use ChatGPT for maths. Does anyone know of an alternative AI app that can calculate maths?

Hi @stuartmitch.home

It is probably best if you post back with the exact prompts you used.

How the prompts are worded makes a big difference when asking a GPT-based large language model, which operates like a type of text autocompletion engine, to perform math.

HTH

I have to second @ruby_coder’s opinion. LLMs’ completion outputs can vary largely based on input, even with temp=0

That being said, there exist AI interfaces for the Wolfram Language which could solve your problem

Yes, of course (from Wikipedia):

The Wolfram Language is a proprietary, general high-level multi-paradigm programming language developed by Wolfram Research. It emphasizes symbolic computation, functional programming, and rule-based programming and can employ arbitrary structures and data.

So, of course it can do math :slight_smile:

It is also a functional programming / rules-based / symbolic programming system and not a GPT text-autocompletion / prediction engine.

As @Dent implies but does not state directly, there are better AI tools (expert systems and rule-based systems) which do math very well. It is important to understand what a GPT-based model is before expecting results from it which are not part of its design criteria (see footnote 1).

ChatGPT is a text-prediction engine, not an expert system, and it is not “rule-based” except for some of the proprietary tweaking and moderation, of which we don’t have a concise list, to my knowledge.

Footnotes:

  1. There is no shortage of people attempting to use GPT models to perform tasks which GPT models are not designed to do as well as non-GPT-based models. This reminds us (every day here in this community) of the old saying: “When you have a hammer in your hand, everything looks like a nail!” ChatGPT is no exception to this adage.

  2. The above situation is normal in the life-cycle of a new technology, especially during the peak of the “public hype phase”. Over time, developers will get “back to normal” and will realize where GPT models are optimal and where they are suboptimal.

I’m gonna take that adage a step further. From the lips of Adam Savage, who got it from someone else: “In every tool, there is a hammer.” So, like, even if you don’t have a hammer in your hands, you can probably make one out of what you do have, and that isn’t necessarily the greatest idea.

Hahaha… well said and very timely. Thanks @Dent

Yeah, we are seeing a lot of that everywhere, including this lively community for sure.

:slight_smile:

Fair point. I should have.

One very basic calculation was “calculate the sum of the following numbers: 200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179”. The answer given was ‘3,038’ when it should be ‘3,802’. I also asked for the mode of these numbers and got 2 (187), when there are actually two numbers each with 3 instances.

Sounds like you need some programming experience! There are many free programming courses available online, and I and others on the forum have intro guides/courses/etc. especially geared toward LLM use

Yes, as @Dent kindly mentioned, you will get better results if you “think” like a software developer.

For example:

Prompt:

Write a ruby method to calculate the sum of a string of numbers, given the string of numbers separated by white-space. Output the final sum.

ChatGPT Completion

def sum_numbers(string_of_numbers)
    numbers = string_of_numbers.split(' ').map(&:to_i)
    sum = 0

    numbers.each do |num|
        sum += num
    end

    puts sum
end

Ruby Console Testing… (where I spend a lot of my time testing, BTW):

irb(main):027:1* def sum_numbers(string_of_numbers)
irb(main):028:1* numbers = string_of_numbers.split(' ').map(&:to_i)
irb(main):029:1* sum = 0
irb(main):030:1* 
irb(main):031:2* numbers.each do |num|
irb(main):032:2* sum += num
irb(main):033:1* end
irb(main):034:1* 
irb(main):035:1* puts sum
irb(main):036:0> end
=> :sum_numbers
irb(main):037:0> sum_numbers("200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179")
3802
=> nil
irb(main):038:0> 

That method is a “good start”, but it does not account for all variations of “white space” between numbers, and it just prints the sum rather than returning it. Still, it’s a start :+1: It’s easy to reword the prompt, but faster to just make the changes and move on to the next task at hand.
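For what it’s worth, those two changes might look something like this. This is just my own minimal sketch, not ChatGPT output; it keeps the `sum_numbers` name from the completion and assumes the input contains only whole numbers:

```ruby
# Variant of the ChatGPT-generated method with the two changes described:
# - String#split with no argument splits on any run of whitespace
#   (spaces, tabs, newlines) and ignores leading/trailing whitespace.
# - The sum is returned instead of printed, so the caller decides what to do.
def sum_numbers(string_of_numbers)
  string_of_numbers.split.map(&:to_i).sum
end

puts sum_numbers("200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179")
# prints 3802
```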

Second Case:

For your second case, @stuartmitch.home, maybe you can try again, changing your prompt to use more “programmer geek-speak”? :slight_smile:
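To give an idea of what such a prompt could lead to, here is a rough sketch of the kind of methods you might end up with for the mean, median and mode. The method names are made up for illustration, this is not actual ChatGPT output, and `Enumerable#tally` needs Ruby 2.7 or newer:

```ruby
# Hypothetical helpers for basic statistics on a whitespace-separated string.
def parse_numbers(string_of_numbers)
  string_of_numbers.split.map(&:to_i)
end

def mean(numbers)
  numbers.sum.to_f / numbers.size
end

def median(numbers)
  sorted = numbers.sort
  mid = sorted.size / 2
  # For an even count, average the two middle values.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Returns every value that shares the highest count, since a data set
# can have more than one mode.
def modes(numbers)
  counts = numbers.tally
  top = counts.values.max
  counts.select { |_, count| count == top }.keys.sort
end

data = parse_numbers("200 192 187 187 196 190 188 198 204 191 175 198 188 189 186 198 189 187 180 179")
puts "mean:   #{mean(data)}"          # mean:   190.1
puts "median: #{median(data)}"        # median: 189.0
puts "modes:  #{modes(data).inspect}" # modes:  [187, 198]
```

Returning all the modes, rather than a single value, is what catches the case of two numbers with three instances each.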

HTH

Note: What many of us have learned about GPT models is that they make fine assistants, but the more domain knowledge a user has, the more valuable / helpful the assistance. As @Dent is trying to say, the more domain knowledge you have (in this case programming / software development), the better the language model can assist.

Same for me.
π * (29^2) * 58,7 should be 155,011
and GPT gave me 537,295
:hot_face:

That is because ChatGPT is NOT a calculator; it’s a generative pre-trained transformer, a large language model, @Bu

ChatGPT is like a text-autocompletion engine, not a scientific calculator.

Also, here is what Google replies (155090.062052):

Here is what ChatGPT offers:

Let’s test this method in the console:

irb(main):008:1* def evaluate_expression
irb(main):009:1*   pi = 3.14159265358979323846
irb(main):010:1*   result = pi * (29**2) * 58.7
irb(main):011:1*   return result
irb(main):012:0> end
=> :evaluate_expression
irb(main):013:0> 
irb(main):014:0> puts evaluate_expression
155090.06205197124
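As a side note, Ruby already ships the constant, so there is no need to type the digits of pi by hand. A minimal variant, assuming you are happy with Float precision (the method name `evaluate_with_builtin_pi` is just for illustration):

```ruby
# Same expression as above, using Ruby's built-in Math::PI constant
# instead of a hand-typed literal. The method name is hypothetical.
def evaluate_with_builtin_pi
  Math::PI * (29**2) * 58.7
end

# Rounded to six decimal places for comparison with other calculators.
puts evaluate_with_builtin_pi.round(6)
# prints 155090.062052
```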

So, Google gave the answer as: 155090.062052
ChatGPT provided a method with answer: 155090.062052...
@Bu said the answer is : 155,011.

Who is correct ??

Google is right for that math. I’m sorry: I’m testing engine capacity, so I divided by 1000 to get that result.

And like you said, GPT is not a calculator, so I used GPT to get the formula instead of doing the math.

I just hope GPT gets better at calculation in the future.

GPT is a language model, a text-completion engine, like the autocorrections you see when you type, but on a different scale.

GPT does math GREAT if you treat GPT as an assistant that helps you create methods and formulas, based on your skill at defining a problem and breaking the problem set down into smaller, manageable pieces.

I use OpenAI to solve complex math and programming tasks for clients daily, because when I write code, I use these OpenAI models to draft initial methods for me. If I needed a scientific calculator, I would use a calculator versus wasting time trying to engineer a prompt for a chatty language model to parse.

GPT models are “generative AIs”, not expert systems and not general-purpose “human-like” AIs like you see in sci-fi. However, the “good and the bad” of current-generation GPT models is that they write like a super-confident expert (great language skills), but they are only predicting text, without any real-world domain knowledge of what they are doing, haha

So, as you can see, you are human, @Bu, but you made a mistake and posted the incorrect answer to your math expression confidently, as if it were “a fact”. So, don’t be too disappointed in a chatty beta language model which cannot behave as a calculator unless it is prompted to generate methods rather than do math like a desktop calculator.

:slight_smile:

God, the day I dreaded has arrived. I’m afraid I’ve as much chance of being an effective programmer as I have of being a successful surgeon!! That’s a big dark hole I don’t want to fall into.

I will take your advice on watching your YT vids. Thanks to you and Ruby for pointing me in the right direction.