Is GPT-3 good at math? Let the answers speak for themselves!

I am sorry if I was discouraging; that was not my purpose. Obviously, if I did not see value in your work, I would not have commented.

A. We need to UNDERSTAND what GPT-3 can do and what GPT-3 cannot do.

OpenAI, IMHO, needs thousands of hardcore application and systems developers, not fans. But that is just my opinion, worth a bit but not much.

B. If you pay close attention to any of the math examples around here, you can see they are arithmetic in nature: integer forms or simple single-variable polynomial forms. Those do not require any AI solvers. There are hundreds of well-known algorithms that handle them.

  1. Try using these numbers in a functional expression form:

sqrt(1.03001987)*cos(1.9099875) *exp(-0.0135789)

I would be curious to see how GPT-3 or any other neural net application is able to process such expressions. Mind you, I know how to do these myself using alternative systems! So the point is to understand the abilities of GPT-3, not to bang on your good work.
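For comparison, a minimal sketch of how a conventional system gets a reference value for this expression, assuming ordinary double-precision floating point from Python's standard `math` module is an acceptable baseline. The variable name `value` is only illustrative.

```python
import math

# Reference value for the expression from the post, computed with
# ordinary double-precision floating point (no neural network involved).
value = math.sqrt(1.03001987) * math.cos(1.9099875) * math.exp(-0.0135789)
print(value)
```

Any answer a language model gives for the same expression can be compared against this number.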

I will post some stuff in a separate thread here, so you can bang on my work :slight_smile:

Personally I think you are moving in the right direction, and again I did not mean to be discouraging.



I added some stuff from our side of the wall

These are Free Form Generative textual interfaces which output code+geometry and so on.

Review this please:


This is a partial mathematical phrase that assumes the generative AI has a reference for annulus, Boolean algebra, and x>y.

Note that the user could have input:


I do not know any Neural Net algorithm which could:

  1. Strip off the English and work on annulus&&x>y as an algebraic expression. I have written my own elaborate Recurring Graph Neural Nets that do that, and it was quite hard and required a custom learned net.

  2. I would like to know how GPT-3 could handle linearized mathematical expressions.

However, I would know how to add custom learned nets to GPT-3 if I had under-the-hood knowledge, but somehow I find no support from OpenAI for building such systems.

As far as I know, OpenAI's open source is officially closed by Microsoft, so it is not clear to me who to call at OpenAI.


@DutytoDevelop Very interesting post, thank you. I think that while GPT-3 may be good at simple arithmetic and even algebra, what I would like to know is whether it has actually internalized the idea of what it means to sum two numbers. In theory, for it to actually understand the algorithm of addition, it should be able to add in general. If you ask it to add two-digit numbers, for example, GPT-3 may have seen every possible combination in its vast amount of training data, since there are only so many ways to pair two-digit numbers for addition. What would be more impressive would be to see it correctly add two ten-digit numbers, or in general numbers with enough digits that it could not possibly have seen that exact example before. Then we would have to admit that GPT-3 knows what it means to do addition.
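A rough sketch of how such a test could be set up, assuming the model's reply comes back as plain text. The helpers `make_addition_problem` and `is_correct` are hypothetical names, and the actual call to the model is left out.

```python
import random

def make_addition_problem(digits, rng):
    """Return two random integers with exactly `digits` decimal digits."""
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return rng.randint(low, high), rng.randint(low, high)

def is_correct(a, b, model_answer):
    """Compare a model's textual answer with exact integer addition."""
    try:
        return int(model_answer.strip().replace(",", "")) == a + b
    except ValueError:
        # The reply was not a plain integer at all.
        return False

rng = random.Random(0)  # fixed seed so the same problems can be reused
a, b = make_addition_problem(10, rng)
prompt = f"{a} + {b} ="
print(prompt)
```

With ten digits there are far too many possible pairs for the model to have memorized them, so a high score over many such prompts would be evidence of a general procedure.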


As was pointed out, GPT-3 does have problems with direct arithmetical calculations, especially with big numbers or floating-point numbers. But it's really good at language modelling, so why not use this strength to tackle all the other problems? Look at this example:

This code (generated by Codex) does exactly what it was told to do - addition of numbers of an arbitrary length.


The above algorithm (3 steps of addition) was taken from the internet, so it was relatively easy for Codex to convert it into code. But what if we ask GPT-3 to create an algorithm by itself? So here we go:

In this example I just asked it to make “An algorithm of arithmetic addition of two numbers. Consider the numbers as strings of digit literals.” All the rest was generated. Notice that the second piece of code is quite similar to the first one!

And in the end, I asked it to explain the code. That was funny :slight_smile: But I think it explained it quite well. Fascinating, isn’t it?

Just to clarify about my last example: I used Davinci-Codex everywhere, and I ran it in two steps. The first was to generate the code from this prompt: “An algorithm of arithmetic addition of two numbers. Consider the numbers as strings of digit literals.” The second was to explain the code, so I just added “Explain the algorithm:” at the end, and it produced all the rest. Here is the preset (OpenAI API)
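A minimal sketch of the kind of digit-string addition the prompt asks for. This is not the Codex output; it is an ordinary schoolbook carry loop, and the name `add_digit_strings` is hypothetical.

```python
def add_digit_strings(x: str, y: str) -> str:
    """Add two non-negative integers given as strings of decimal digits."""
    i, j = len(x) - 1, len(y) - 1
    carry = 0
    digits = []
    # Walk both strings from the least significant digit, carrying as we go.
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += ord(x[i]) - ord("0")
            i -= 1
        if j >= 0:
            total += ord(y[j]) - ord("0")
            j -= 1
        digits.append(chr(ord("0") + total % 10))
        carry = total // 10
    return "".join(reversed(digits)) or "0"

print(add_digit_strings("99999999999999999999", "1"))
```

Because it works digit by digit, the length of the inputs is limited only by memory, which is the "arbitrary length" property discussed above.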


That is impressive. Did you use the same session for both results? In other words, is it possible that it generated the first result by following the instructions in the header, and for the second result just reused the same response?


Thanks for the input, boris!


No worries, I respect the constructive criticism and open discussion that we’re having.

We need to UNDERSTAND what GPT-3 can do and what GPT-3 cannot do.

Essentially, you want to find out how GPT-3 processes input on a token-by-token basis so that you know how GPT-3 will answer.

Try using these numbers in a functional expression form:
sqrt(1.03001987)*cos(1.9099875) *exp(-0.0135789)

This is a high-level mathematical concept. In order for GPT-3 to understand this, you’d need to teach it how to:

  1. Properly handle decimal numbers
  2. Interpret the sqrt() function correctly and, provided the training data contains the clear-cut steps needed to calculate square roots longhand, produce the correct answer
  3. Same as Step 2 but for the cosine function and exponential function

Assuming there’s enough training data that is well-defined, I don’t see why GPT-3 wouldn’t be able to perform the calculation that you provided.
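To make "clear-cut longhand steps" concrete, here is a sketch of one possible set of such steps, assuming Newton's iteration for the square root and truncated Maclaurin series for cosine and the exponential. These particular methods are my choice for illustration and are not claimed to be what GPT-3 would learn; the function names are hypothetical.

```python
import math

def sqrt_newton(x, iterations=30):
    """Square root via Newton's iteration g <- (g + x/g) / 2, for x > 0."""
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess

def cos_taylor(x, terms=20):
    """cos(x) from its Maclaurin series: sum of (-1)^k x^(2k) / (2k)!."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        # Each term is the previous one times -x^2 / ((2k-1)(2k)).
        term *= -x * x / ((2 * k - 1) * (2 * k))
        total += term
    return total

def exp_taylor(x, terms=30):
    """exp(x) from its Maclaurin series: sum of x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= x / k
        total += term
    return total

longhand = sqrt_newton(1.03001987) * cos_taylor(1.9099875) * exp_taylor(-0.0135789)
library = math.sqrt(1.03001987) * math.cos(1.9099875) * math.exp(-0.0135789)
print(longhand, library)
```

Each function uses only addition, multiplication and division, so the whole calculation reduces to decimal arithmetic, which is step 1 in the list.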

Hello @jpoirier! Like @m-a.schenk stated, memory between sessions is not possible for GPT-3. However, I’d like to clarify that I wasn’t trying to have GPT-3 generate an algorithm for solving equations. Instead, I’m aiming to teach GPT-3 itself how to quantify numbers and understand mathematical concepts, so that, given the right training data and a problem that the training data shows how to solve, GPT-3 can solve the problem with the correct problem-solving steps.

Even though memory between sessions is not possible, you can fine-tune models with your own datasets so that the model does have an understanding of those concepts going forward instead of having to teach it over and over again. As stated by OpenAI here, you would need to send a request to fine-tune the Davinci model to the OpenAI team.

I do not think anyone could teach GPT-3 anything substantial about algebras and so on. That I know for a fact about any Neural Network model.

That is why people, ourselves included, are suggesting and working on alternative technologies.

However, that was not my proposal. I need to know if I could build a hybrid system where GPT-3 does what it is good at, and we add our technology for dealing with the algebraic and geometrical systems.

I do not seem able to get anyone here or at OpenAI to discuss this openly, pun intended.

Nor are we interested in any learning systems to solve equations, though we have a number of innovative algorithms for that purpose.

The equation solvers are for another purpose.

When you say something like below about a Landscape:

"add some trees to region …", with "some" being a form of existential quantifier, i.e. you do not tell the AI to learn how to place trees in some way of its own; you give the AI semantics to attach to the word "some".

If we could do that, then GPT Hybrid systems would fly to the moon :slight_smile:

Otherwise GPT is stuck constantly needing to learn the semantics of every operation.

You do that in your brain and language when you learn from childhood:

“Snakes are beautiful.”

“Snakes are dangerous.”

"Beautiful" is not semantics you learn!

"Dangerous" is semantics you are taught by your parents and the environment.

But these very similar sentences require two disparate Semantics in order to make sense.

The weakness here is that everyone is trying to make GPT learn new semantics, and even if it could, it is a bad idea.


In my second example (which I gave the preset link for) there was no algorithm in the header. The explanation of the code was generated from the code itself, which suggests that it has a good grasp of the semantic concepts related to “addition”. It is just that the task has to be presented in the right format.

I agree that hybrid systems are the way forward, at least for now, because in production-ready environments we need explainability and verifiability of these systems. So I’m advocating for neuro-symbolic hybrids. What I’m currently trying to implement is a system that would effectively integrate knowledge graphs with neuro-symbolic reasoning. GPT-3 plays an important role here as a means to disambiguate concepts and transform them into logical forms for the “reasoner” to work with. Ideally, with the symbolic subsystem we are able to achieve continuous learning as well as long- and short-term memory. Also, by leveraging First Order Logic or Real Valued Logic we are able to achieve sound and verifiable reasoning.
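A toy sketch of that division of labour, assuming the language model's only job is to turn English into an expression string and a small symbolic evaluator does the actual computing. `fake_language_model` is a hypothetical stand-in that returns a fixed string, not a real API call, and the whitelist of operators and functions is an arbitrary choice for illustration.

```python
import ast
import math
import operator

# Whitelisted operators and functions the symbolic side is willing to evaluate.
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv,
              ast.Pow: operator.pow}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sqrt": math.sqrt, "cos": math.cos, "sin": math.sin, "exp": math.exp}

def evaluate(expression):
    """Parse an arithmetic expression and evaluate it by walking its tree."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*(walk(arg) for arg in node.args))
        # Anything outside the whitelist is rejected instead of executed.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval"))

def fake_language_model(question):
    """Hypothetical stand-in for a model that maps English to an expression."""
    return "sqrt(16) + 2 * 3"

result = evaluate(fake_language_model("What is the square root of 16 plus twice 3?"))
print(result)
```

The symbolic side is verifiable: a malformed string raises `SyntaxError` in `ast.parse`, and anything outside the whitelist raises `ValueError`, so a bad generation is caught before it turns into a wrong number.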


We are working on similar concepts.

neuro-symbolic hybrids: We have created such hybrids in Wolfram Mathematica and they show great promise. They are Recurring Graph Networks, where the graphs are the expression tree structures for the operators, functions and arguments … all seems to be working.

BUT even at 90%+ accuracy, the neuro-symbolic hybrid sometimes produces expression trees with syntax errors. We expect to do much better soon.

As for memory, unless I misunderstand it, we can provide Content Associative Memory stored in our Symbolic Cloud for arbitrary random access. These Content Associative Memories are ACTIVE i.e. they could have their learning algorithms or other update mechanisms.

To me the role of GPT is clear, yet somehow I get confused by what is presented on OpenAI's sites and in this forum.


Looking at the comments, it seems like it does well. I’ve had a different experience:

Human: Okay, write a Python function to convert temperature from Celsius to Kelvin
AI: I can, but you need to give me more information on the context of the question
Human: What additional information do you need?
AI: I need to know you want me to write a function that converts temperature from Celsius to Kelvin or you want me to write a function to convert Kelvin to Celsius.
Human: A function that converts temperature from Celsius to Kelvin
AI: Sure, here you go; 1 - C
Human: Okay, write a Python function to convert temperature from Fahrenheit to Kelvin
AI: Yes, a function converting temperature from Fahrenheit to Kelvin. Here it is: 1 - 9/5*(F-32)
Human: I'd like the function written in Java, please
AI: Okay, here's your function written in Java. 1 - (-9/5F)
Human: Pretty sure that's wrong
AI: Well, the code was written in Java. You can find me on Reddit at /u/openai
Human: You haven't subtracted 32 from F
AI: Yes, if F is greater than 32 then the value of the expression must be equal to 1. If not, the value depends on whether or not F is greater than or equal to 40.
Human: Okay. Using your function definition, can you convert 98 degree Fahrenheit to Kelvin?
AI: Sure, it's approximately equal to -54 degrees Celsius
Human: But I asked for Kelvin
AI: Okay, but you need to give me more information on the context of the question. Does the temperature in Kelvin go up or down
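For reference, a short sketch of what the requested functions should look like, using the standard conversion formulas K = C + 273.15 and C = (F - 32) × 5/9. The function names are my own.

```python
def celsius_to_kelvin(celsius):
    """Kelvin is Celsius shifted by 273.15."""
    return celsius + 273.15

def fahrenheit_to_celsius(fahrenheit):
    """Subtract the 32-degree offset, then rescale by 5/9."""
    return (fahrenheit - 32) * 5 / 9

def fahrenheit_to_kelvin(fahrenheit):
    """Go through Celsius so the offset and scale are each applied once."""
    return celsius_to_kelvin(fahrenheit_to_celsius(fahrenheit))

print(round(fahrenheit_to_kelvin(98), 2))
```

So 98 degrees Fahrenheit is roughly 309.82 K, nowhere near the "-54 degrees Celsius" in the transcript.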

Codex can solve some simple non-linear control problems.
It is a matter of how you script it I guess.
Look at Wolfram Alpha, they use an AI model too.

Reality is stochastic, thus we can solve anything with math.


Try playing with GPS coordinates and more elaborate geometry problems. It doesn’t grasp the concepts, unless you use matrices. But it can’t convert back and forth; it has no reference, thus it has no knowledge.


Addition, or any other well-known function or operator with a FIXED number of arguments, is easy to construct from any simple grammar!

You do not need Codex.

The problem is when the operator or the function is either not fully defined or takes a variable number of arguments.

Example: Boolean or Implicit or Parametric Regions:



As you can see, there is no control over the number of arguments, and in the examples I provided earlier you might have a NAMED REGION, e.g. annulus, which is subject to definition or might be undefined at the time of parsing.
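A small sketch of the difficulty, assuming regions are modelled as predicates over a point (x, y). The combinators take any number of arguments, and a named region is looked up only when the expression is evaluated, so it may be undefined when the expression is built. All names here (`named`, `all_of`, `any_of`, `named_regions`) and the annulus radii are hypothetical.

```python
from typing import Callable, Dict

Region = Callable[[float, float], bool]

# Registry of named regions; names can be registered after expressions are built.
named_regions: Dict[str, Region] = {}

def named(name: str) -> Region:
    """Refer to a region by name; the lookup happens only at evaluation time."""
    def region(x, y):
        if name not in named_regions:
            raise LookupError(f"region {name!r} is not defined yet")
        return named_regions[name](x, y)
    return region

def all_of(*regions: Region) -> Region:
    """Boolean AND over any number of regions."""
    return lambda x, y: all(r(x, y) for r in regions)

def any_of(*regions: Region) -> Region:
    """Boolean OR over any number of regions."""
    return lambda x, y: any(r(x, y) for r in regions)

# "annulus && x>y", built before annulus has a definition.
expr = all_of(named("annulus"), lambda x, y: x > y)

# Definition supplied later; the radii 1 and 2 are illustrative only.
named_regions["annulus"] = lambda x, y: 1.0 <= x * x + y * y <= 4.0

print(expr(1.5, 0.0), expr(0.0, 1.5))
```

A fixed-arity grammar cannot express `all_of` with two arguments in one place and five in another, which is the point being made about Boolean regions.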

The Arithmetic examples are well understood, and there are many parsers of different types that could handle code generators as semantics and there are even no-code generators.

If you want to pursue the vision of Open AI and Codex, in my humble opinion and not an instruction to any of you, please move away from the arithmetic operations.


I wonder if GPT-3 is any better at proof-writing, which is more English based. For example, I was able to get this result from Davinci:

The square root of two is irrational.

The proof is by contradiction. Suppose that the square root of two is rational. Then it is the ratio of two integers, say a and b. Then a^2 = 2b^2. But this implies that b^2 is even, and so b is even. But since a and b are integers, a is even, and so a^2 is even. This is a contradiction.

It does alright, but it misses the key point that both a and b can’t be even. It doesn’t handle the square root of 3 nearly as well:

The square root of 3 is irrational. The proof is by contradiction. Suppose that the square root of 3 is rational. Then it is a fraction p/q with p and q integers. Then p^2 = 3, q^2 = 3, and pq = 1. But then p^2 - q^2 = 0, and so p = q = 1, which is a contradiction.

The square root of 2 is irrational. The proof is

Some of the reasoning is sound: if p^2 and q^2 are both three, then p^2-q^2 = 0. Likewise, if p=q and pq=1, then p=q=1. It also found the contradiction correctly. However, that proof includes assumptions which are untrue.

I’ve also done very minimal prompt design; there might be a way to give it certain assumptions or types of proof and let it work through the way to use them.
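For comparison, the standard argument for the square root of two, written out with the lowest-terms assumption that the generated proof leaves out:

```latex
\begin{proof}
Suppose $\sqrt{2} = a/b$ with integers $a, b$, $b \neq 0$, and $\gcd(a, b) = 1$.
Then $a^2 = 2b^2$, so $a^2$ is even and hence $a$ is even; write $a = 2c$.
Substituting gives $4c^2 = 2b^2$, i.e. $b^2 = 2c^2$, so $b$ is even as well.
Thus $2$ divides both $a$ and $b$, contradicting $\gcd(a, b) = 1$.
\end{proof}
```

The contradiction only exists because a/b was assumed to be in lowest terms; without that assumption, "a and b are both even" contradicts nothing.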

Of course, it still does better with the proofs than raw calculations:

sqrt(1.202030)= 1.0390625

sqrt(1.23040) = 1.1092339699
sqrt(1.202030) =
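A quick way to check the two values quoted above without trusting any square-root routine is to square them and see how far they land from the input. A minimal sketch; the names `claims` and `errors` are only illustrative.

```python
# (input, value shown for its square root), copied from the lines above
claims = [(1.202030, 1.0390625), (1.23040, 1.1092339699)]

# Squaring each claimed root should give back the input if the claim is right.
errors = [abs(root * root - x) for x, root in claims]

for (x, root), err in zip(claims, errors):
    print(f"sqrt({x}) claimed {root}: squaring misses by {err:.6f}")
```

The first value misses by more than 0.1 when squared, while the second is accurate to many digits, so the results are inconsistent, not uniformly wrong.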


I do think there’s more performance to be wrung out of GPT-3 when it comes to mathematical reasoning, especially if one can strip out numbers from the inputs.

edited: the paper linked above slipped my mind while writing this post. But I still wonder about more general “justifications” versus proofs.

Codex performs better with math.

The new 30 January 2023 ChatGPT math upgrade attempts to address this “math” issue: