Hi all, I’m experimenting with the OpenAI API in Python, using openai.ChatCompletion.create(model='gpt-3.5-turbo-0301', messages=...) and leaving the remaining parameters at their default values.

I sent a request with a mathematical formula (originally in German):

what is 4*3+5/3*2

I got three different answers. None of them is correct, but each one looks plausible.

The result is 14.333333333, or 14 1/3.

The result of 4*3+5/3*2 is 16.33.

The result of 4*3+5/3*2 is 17.666666666666668.

Are my expectations wrong?
In any case, OpenAI acts as if it can calculate the formula, when it looks like it is actually just guessing.

LLMs are traditionally bad at maths. One of the best ways to work around this is to provide a few worked examples to the LLM using a method called chain-of-thought prompting. Essentially, you break down the solution to the problem step by step, so the model can follow what is happening.
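As a rough sketch, chain-of-thought prompting can be done by putting one worked example into the message list before the real question. The system prompt and the worked example below are hypothetical wording, not an official OpenAI recipe:

```python
# Sketch of few-shot chain-of-thought prompting for arithmetic.

def build_cot_messages(question: str) -> list[dict]:
    """Build a chat message list with one worked example that
    spells out the order of operations step by step."""
    return [
        {"role": "system",
         "content": "You solve arithmetic step by step, applying "
                    "multiplication and division before addition."},
        # One worked example showing the reasoning we want the model to imitate.
        {"role": "user", "content": "What is 2*3+4/2*5?"},
        {"role": "assistant",
         "content": "2*3 = 6. 4/2 = 2. 2*5 = 10. So 6 + 10 = 16."},
        # The actual question.
        {"role": "user", "content": "What is 4*3+5/3*2?"},
    ]

messages = build_cot_messages("What is 4*3+5/3*2?")
# These messages would then be passed to
# openai.ChatCompletion.create(model="gpt-3.5-turbo-0301", messages=messages)
```

Whether this fixes the arithmetic depends on the model; the point is only that the intermediate steps give it a pattern to imitate instead of guessing the final number in one shot.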

You have to keep in mind that the LLM is a language model and does not do ‘calculations’.

You may try other models which have been specifically designed to handle programming and mathematical tasks. The recommended ones are:

Codex: a language model capable of generating code in various programming languages, including mathematical code.

GPT-Neo (2.7B and 1.3B): powerful language models that can perform various tasks, including mathematical tasks such as solving equations, generating mathematical expressions, and providing explanations of mathematical concepts.

Thanks for the feedback. I haven’t really looked into the features of the models.

Given that, it would be interesting to know how the model actually interprets this mathematical formula. It gives the impression of calculating something.

Earlier, I had already asked what

```
4+4/3
```

is. The AI answered with a detailed description of the mathematical rules for interpreting this expression and provided the correct result. Therefore I assumed that the AI can calculate.

How, then, do the different results come about? Or is the AI really just guessing here?

I think @udm17 gave an explanation.
Mathematics is a completely different language from the natural human languages the LLMs were exhaustively trained on. The same goes for programming languages.

It is not necessarily guessing; it is more about how the models understand the expression and the user’s requirements.
They can handle some simple calculations.
If we use your example with no spaces around the asterisks, some models may interpret them as Markdown emphasis markers.
If you try your expression in ChatGPT (GPT-3.5), for example, it understands it as multiplication, but when it provides the partial solution, the asterisks get swallowed as formatting, just like in this forum editor: 4*3 ==> 43.
(I can’t even reproduce the results correctly here in the forum editor, and there is no AI involved.)
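The Markdown effect can be illustrated with a toy substitution. This is only a sketch of the `*…*` emphasis rule, not a real Markdown parser:

```python
import re

def strip_emphasis(text: str) -> str:
    """Toy model of Markdown's *emphasis* rule: the asterisk pair is
    consumed as formatting, so it disappears from the rendered text."""
    return re.sub(r"\*(.+?)\*", r"\1", text)

rendered = strip_emphasis("4*3+5/3*2")
print(rendered)  # → 43+5/32  (the "3+5/3" part would appear in italics)
```

So an expression written without spaces can come back from a Markdown renderer with its multiplication signs simply gone, which is exactly the `4*3 ==> 43` effect described above.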

Let’s keep in mind that “*” and “^” are not operators of “natural” mathematical language but programming-language operators, while the models are text-oriented. How would one describe matrices to them in plain text?

These are just examples, and this mathematical issue is not limited to them.
Another approach is to provide the expression in LaTeX.
Another is to help the model with parentheses around partial expressions, so it can keep the order of operations straight.
In summary, maths is another language.
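For reference, the correct value under standard operator precedence is easy to check in Python, with explicit parentheses making the intended grouping visible:

```python
# Standard precedence: multiplication and division bind tighter than
# addition and are evaluated left to right.
implicit = 4 * 3 + 5 / 3 * 2
explicit = (4 * 3) + ((5 / 3) * 2)  # same grouping, written out

print(implicit)              # ≈ 15.33, not 14.33, 16.33, or 17.67
print(implicit == explicit)  # the parentheses change nothing here
```

None of the three model answers quoted above matches this value, which supports the point that the model is reproducing plausible-looking text rather than evaluating the expression.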

Assume by default that any calculation provided by GPT is wrong.

Large language models are probabilistic in nature and operate by generating likely outputs based on patterns observed in their training data. For mathematical and physical problems there may be only one correct answer, and the likelihood of generating exactly that answer may be very low.

I’ve written some more about it in the thread over here:

@N2U … Last but not least, I used paper and pencil to calculate the correct result. When I showed the AI’s result to my friends, they actually agreed with it at first. And that is an interesting point: if you don’t question it, you might build houses with it … in the future.

I think I now understand a little more about the basis on which the AI makes decisions and formulates answers, i.e. the language model behind it. That was not immediately clear to me. I will probably start to look into it more deeply.

I didn’t have any expectations on the first attempt; I just wanted to see what would happen. But since it did give a result, it was of course funny to see that it was wrong. On the second and third tries even more so.

I will probably continue to play with the AI. Interesting topic.