Response to a mathematical request

Hi all, I'm experimenting with the OpenAI API in Python, using openai.ChatCompletion.create(model='gpt-3.5-turbo-0301', messages=...), with the remaining parameters at their default values.

Sending a request with a mathematical formula (in German):

`was ist 4*3+5/3*2` ("what is 4*3+5/3*2")
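
In code, the request looked roughly like this (a minimal sketch assuming the legacy 0.x Python SDK and that OPENAI_API_KEY is set in the environment):

```python
import openai

# Minimal sketch of the call described above (legacy 0.x SDK syntax);
# all parameters other than model and messages keep their defaults.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[{"role": "user", "content": "was ist 4*3+5/3*2"}],
)
print(response.choices[0].message.content)
```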

I got three different answers. None of them is correct, but every answer looks as if it were.

The result is 14.333333333, or 14 1/3.
The result of 4*3+5/3*2 is 16.33.
The result of 4*3+5/3*2 is 17.666666666666668.

Are my expectations wrong? :slight_smile:
In any case, the model acts as if it can calculate the formula. It looks like it is actually just guessing :slight_smile:


LLMs are traditionally bad at math. The best way to tackle this is to provide a few examples to the LLM using a method called chain-of-thought prompting: you break the solution to the problem down step by step so the LLM can follow what is happening.

You have to keep in mind that the LLM is a language model and does not do ‘calculations’.
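
A minimal sketch of what such few-shot chain-of-thought prompting could look like (the worked example and its wording are illustrative assumptions, not a fixed recipe):

```python
import openai

# One hand-written worked example demonstrates step-by-step arithmetic
# before the real question is asked (few-shot chain of thought).
messages = [
    {"role": "system", "content": "Solve arithmetic step by step, respecting the order of operations."},
    {"role": "user", "content": "What is 2*3+8/4?"},
    {"role": "assistant", "content": "Step 1: 2*3 = 6. Step 2: 8/4 = 2. Step 3: 6 + 2 = 8. The answer is 8."},
    {"role": "user", "content": "What is 4*3+5/3*2?"},
]
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=messages,
)
print(response.choices[0].message.content)
```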


You may try other models which have been specifically designed to handle programming and mathematical tasks (a sketch of calling one follows the list below). The recommended ones are:

  1. Codex: a language model capable of generating code in various programming languages, including mathematical code.
  2. GPT-Neo (2.7B and 1.3B): powerful language models that can perform various tasks, including mathematical tasks such as solving equations, generating mathematical expressions, and providing explanations of mathematical concepts.
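
For example, a hedged sketch of querying Codex through the legacy completions endpoint (assuming the code-davinci-002 model name from that era):

```python
import openai

# Codex models used the legacy completions endpoint rather than chat.
# Prompting with a code comment nudges the model toward runnable code.
response = openai.Completion.create(
    model="code-davinci-002",
    prompt="# Python: evaluate 4*3+5/3*2 and print the result\n",
    max_tokens=64,
    temperature=0,
)
print(response.choices[0].text)
```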

Thanks for the feedback. I haven't really looked into the capabilities of the individual models.

But given that, it would be interesting to know how the model actually interprets this mathematical formula. It gives the impression of calculating something.

Earlier I had already asked what

```
4+4/3
```

is. The AI answered with a detailed description of the mathematical rules for interpreting this expression and provided the correct result. Therefore I assumed that the AI can calculate.

So how do the different results come about? Or is the AI really just guessing here?

I think @udm17 gave an explanation.
Mathematics is a completely different language from the natural human languages the LLMs were exhaustively trained on. The same goes for programming languages.

It is not necessarily guessing; it is more about how the models interpret the expression and the user's intent.
They can do some simple calculations just fine.
If we use your example with no spaces around the asterisks, some models may treat the asterisks as Markdown formatting characters.
If you try your expression in ChatGPT (GPT-3.5), for example, it understands the asterisk as multiplication, but when it renders the partial solution, the same thing happens as in this forum editor:
`4*3` comes out as "43", the asterisk swallowed as an emphasis marker.
(I can't even reproduce the results correctly here in the forum editor, and there is no AI involved.)

Let's keep in mind that "*" and "^" are not operators of "natural" mathematical language but programming-language operators, while the models are text-oriented. How would you describe matrices to them in plain text?
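
That Markdown effect is easy to reproduce without any AI involved; a small sketch using the Python-Markdown package (my choice for illustration, not something used in this thread):

```python
import markdown  # pip install markdown

# The bare asterisks are parsed as emphasis markers, so part of the
# formula disappears into <em> tags when the text is rendered.
html = markdown.markdown("4*3+5/3*2")
print(html)  # typically: <p>4<em>3+5/3</em>2</p>
```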

These are just examples, and this mathematical issue is not limited to them.
One remedy is to provide the expression in LaTeX.
Another is to help the model with parentheses around sub-expressions, so it can keep the order of operations straight; both are sketched below.
In summary, math is another language.
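
For instance, two hedged rewrites of the original prompt (the exact wording is illustrative):

```python
# Explicit grouping removes the precedence ambiguity; the LaTeX form
# uses notation the model has seen heavily in training.
with_parens = "What is (4*3) + ((5/3)*2)?"
with_latex = r"What is $4 \times 3 + \frac{5}{3} \times 2$?"
print(with_parens)
print(with_latex)
```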

The best way to solve this is to use a calculator :laughing:

Assume that any calculations provided by GPT are wrong by default.

Large language models are probabilistic in nature and operate by generating likely outputs based on patterns observed in their training data. For mathematical and physical problems there may be only one correct answer, and the likelihood of generating that exact answer may be low.
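
That variability is easy to observe directly; a hedged sketch using the legacy SDK's n parameter to draw several completions of the same prompt:

```python
import openai

# With nonzero temperature, independent samples of the same prompt can
# disagree, which is exactly the spread of answers seen in this thread.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[{"role": "user", "content": "was ist 4*3+5/3*2"}],
    n=3,              # three independent completions
    temperature=1.0,  # default sampling temperature
)
for i, choice in enumerate(response.choices, start=1):
    print(f"Answer {i}: {choice.message.content}")
```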

I've written some more about it in the thread over here:

Thank you all,

@N2U … Last but not least, I used paper and pencil to calculate the correct result. :smiley: When I showed the AI's result to my friends, they actually agreed with it at first. And that is an interesting point: if you don't question it, you might end up building houses with it :smiley: … in the future :smiley:

I think I now understand a little more about the basis on which the AI makes decisions and formulates answers, i.e. the language model behind it. That wasn't immediately clear to me. I will probably look into it more deeply. :slight_smile:

I didn't have any expectations on the first attempt; I just wanted to see what would happen. But since it produced a result, it was of course amusing to see it be wrong, and even more so on the second and third tries.

I will probably continue to play with the AI. Interesting topic. :smiley:

And sorry for not searching the forum first … :slight_smile:

I am finding that as long as I don't hide the working-out, it is far more reliable. But as soon as you ask it not to show its working, it gets the answer wrong almost every time.
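
A hedged sketch of that comparison (the prompt wording is illustrative):

```python
import openai

# Contrast the two behaviors: suppressing the working-out vs. allowing it.
prompts = {
    "hidden working": "What is 4*3+5/3*2? Reply with only the final number.",
    "shown working": "What is 4*3+5/3*2? Work through it step by step, then state the final answer.",
}
for label, prompt in prompts.items():
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0301",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{label}: {response.choices[0].message.content}")
```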

You make an astute observation about the present day: suppressing the trained behavior of new models, which produce their own chain of thought to reason out the steps toward an answer, will degrade the quality of that answer.

This is the counterpoint to where this thread started 1.5 years ago, when the technique for getting a better answer was to spell out to the AI language model that it must follow a procedure of steps, letting it observe its own growing context window to improve the quality of the final answer.

Also, expecting an AI model to naturally follow such spelled-out steps when generating a response is very April 2023; with new models you instead hit a wall of disobedience, and such procedural generation steps get ignored.

What enhances newer training is the consistent use and understanding of LaTeX for generating formulae; its presence invites closer attention to mathematical technique.

In a bragging-rights competition, the ability to produce a zero-shot middle-school answer is what matters.

gpt-4o-mini takes on the first question posed (in English):

To solve the expression $4 \times 3 + \frac{5}{3} \times 2$, we follow the order of operations (PEMDAS/BODMAS):

  1. Multiplication and division from left to right:

    • $4 \times 3 = 12$
    • $\frac{5}{3} \times 2 = \frac{10}{3}$
  2. Addition:

    • $12 + \frac{10}{3}$

To add $12$ and $\frac{10}{3}$, convert $12$ to a fraction:

  • $12 = \frac{36}{3}$

Now we can add:

$$\frac{36}{3} + \frac{10}{3} = \frac{46}{3}$$

Thus, the final answer is:

$$\frac{46}{3} \approx 15.33$$

The only magical leap is employing division to convert the fraction to a numeric answer: $\frac{46}{3} = \frac{45}{3} + \frac{1}{3}$, if you also immediately see that as $15\frac{1}{3}$.
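
The worked answer is also easy to verify exactly with rational arithmetic, for example with Python's fractions module:

```python
from fractions import Fraction

# Exact check of the model's worked answer using rational arithmetic.
result = Fraction(4) * 3 + Fraction(5, 3) * 2
print(result)         # 46/3
print(float(result))  # 15.333333333333334
```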


Try to suppress the working-out and the attempts to tutor you, and ChatGPT will find an available tool, the Python code interpreter, which is a calculator.

[screenshots: ChatGPT invoking the Python code interpreter and showing the computed result]

If you’re fine with handling calculations on the code side rather than via the model, you could evaluate the expression directly in Python using eval() for reliable results:

```python
expression = "4*3+5/3*2"
result = eval(expression)  # Python applies standard operator precedence
print(result)  # 15.333333333333334
```
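
One caveat worth noting: eval() executes arbitrary Python, so this is only safe for expressions you wrote yourself; for untrusted input you would want a dedicated expression parser instead.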