Is GPT-3 good at math? Let the answers speak for themselves!

Looking at the comments, it seems like it does well. I’ve had a different experience:

Human: Okay, write a Python function to convert temperature from Celsius to Kelvin
AI: I can, but you need to give me more information on the context of the question
Human: What additional information do you need?
AI: I need to know you want me to write a function that converts temperature from Celsius to Kelvin or you want me to write a function to convert Kelvin to Celsius.
Human: A function that converts temperature from Celsius to Kelvin
AI: Sure, here you go; 1 - C
Human: Okay, write a Python function to convert temperature from Fahrenheit to Kelvin
AI: Yes, a function converting temperature from Fahrenheit to Kelvin. Here it is: 1 - 9/5*(F-32)
Human: I'd like the function written in Java, please
AI: Okay, here's your function written in Java. 1 - (-9/5F)
Human: Pretty sure that's wrong
AI: Well, the code was written in Java. You can find me on Reddit at /u/openai
Human: You haven't subtracted 32 from F
AI: Yes, if F is greater than 32 then the value of the expression must be equal to 1. If not, the value depends on whether or not F is greater than or equal to 40.
Human: Okay. Using your function definition, can you convert 98 degree Fahrenheit to Kelvin?
AI: Sure, it's approximately equal to -54 degrees Celsius
Human: But I asked for Kelvin
AI: Okay, but you need to give me more information on the context of the question. Does the temperature in Kelvin go up or down
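
For reference, the conversions the model kept mangling are one-liners. A minimal sketch of what a correct answer would look like (my own code, not model output):

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius to Kelvin: K = C + 273.15."""
    return c + 273.15

def fahrenheit_to_kelvin(f: float) -> float:
    """Convert Fahrenheit to Kelvin: K = (F - 32) * 5/9 + 273.15."""
    return (f - 32) * 5 / 9 + 273.15

print(fahrenheit_to_kelvin(98))  # ~309.82 K, not -54 degrees Celsius
```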

Codex can solve some simple non-linear control problems.
It is a matter of how you script it, I guess.
Look at Wolfram Alpha; they use an AI model too.

Reality is stochastic; thus we can solve anything with math.

Try playing with GPS coordinates and more elaborate geometry problems. It doesn’t grasp the concepts, unless you use matrices. But it can’t convert back and forth; it has no reference, thus it has no knowledge.
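
To make the claim concrete, here is what “convert back and forth” might mean (my assumed reading, using a spherical-Earth approximation; not anyone’s actual test):

```python
import math

R = 6371.0  # mean Earth radius in km (spherical approximation)

def to_cartesian(lat_deg: float, lon_deg: float):
    """GPS latitude/longitude (degrees) -> 3-D Cartesian point."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (R * math.cos(lat) * math.cos(lon),
            R * math.cos(lat) * math.sin(lon),
            R * math.sin(lat))

def to_latlon(x: float, y: float, z: float):
    """Inverse mapping: Cartesian point -> latitude/longitude (degrees)."""
    return math.degrees(math.asin(z / R)), math.degrees(math.atan2(y, x))

# The round trip should return the input (up to floating-point error).
print(to_latlon(*to_cartesian(48.8566, 2.3522)))  # ~(48.8566, 2.3522)
```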

Addition, or any other well-known function or operator with a FIXED number of arguments, is easy to construct from any simple grammar!

You do not need Codex.
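
As a minimal illustration of the easy case (my sketch, ordinary Python and its own grammar, no Codex involved):

```python
import ast
import operator

# Fixed-arity operators map one-to-one onto grammar productions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def evaluate(expr: str) -> float:
    """Evaluate arithmetic parsed by an existing simple grammar (Python's)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):    # always exactly two arguments
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):  # always exactly one argument
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported node: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))

print(evaluate("2 + 3 * (4 - 1)"))  # 11
```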

The problem is when the operator or the function is either not fully defined or takes a variable number of arguments.

Example: Boolean, implicit, or parametric regions.

Boolean:
x>y&&x^2+y^2<1

x>y&&x^2+y^2<1||(x-0.5)^2+y^2<=1

As you can see, there is no control over the number of arguments, and in the examples I provided earlier you might have a NAMED REGION, e.g. an annulus, which is subject to definition or might even be undefined at the time of parsing.
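
A sketch of why this is harder (illustrative Python, all names hypothetical): the combinators take a variable number of arguments, and a named region can be bound after the expression is built.

```python
from typing import Callable, Dict

Region = Callable[[float, float], bool]
NAMED: Dict[str, Region] = {}  # definitions may arrive after parsing

def conj(*parts: Region) -> Region:  # &&: no fixed number of arguments
    return lambda x, y: all(p(x, y) for p in parts)

def disj(*parts: Region) -> Region:  # ||: likewise variable arity
    return lambda x, y: any(p(x, y) for p in parts)

def named(name: str) -> Region:
    # Lookup is deferred to evaluation, so "annulus" may still be
    # undefined at the time this expression is parsed.
    return lambda x, y: NAMED[name](x, y)

# x>y&&x^2+y^2<1||(x-0.5)^2+y^2<=1
region = disj(conj(lambda x, y: x > y,
                   lambda x, y: x**2 + y**2 < 1),
              lambda x, y: (x - 0.5)**2 + y**2 <= 1)

NAMED["annulus"] = lambda x, y: 0.25 <= x**2 + y**2 <= 1.0  # defined late
print(region(0.0, -0.5), named("annulus")(0.8, 0.0))  # True True
```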

The arithmetic examples are well understood; there are many parsers of different types that can drive code generators as their semantics, and there are even no-code generators.

If you want to pursue the vision of OpenAI and Codex, then, in my humble opinion (and not as an instruction to any of you), please move away from arithmetic operations.

Dara

I wonder if GPT-3 is any better at proof-writing, which is more English-based. For example, I was able to get this result from Davinci:

The square root of two is irrational.

The proof is by contradiction. Suppose that the square root of two is rational. Then it is the ratio of two integers, say a and b. Then a^2 = 2b^2. But this implies that b^2 is even, and so b is even. But since a and b are integers, a is even, and so a^2 is even. This is a contradiction.

It does alright, but it misses the key point that a and b can be taken in lowest terms, so they can’t both be even. It doesn’t handle the square root of 3 nearly as well:

The square root of 3 is irrational. The proof is by contradiction. Suppose that the square root of 3 is rational. Then it is a fraction p/q with p and q integers. Then p^2 = 3, q^2 = 3, and pq = 1. But then p^2 - q^2 = 0, and so p = q = 1, which is a contradiction.

The square root of 2 is irrational. The proof is

Some of the reasoning is sound: if p^2 and q^2 are both three, then p^2 - q^2 = 0. Likewise, if p = q and pq = 1, then p = q = 1. It also found the contradiction correctly. However, that proof includes assumptions that are untrue.
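
For contrast, the standard textbook argument, written out by hand (my own sketch, not model output):

```latex
Suppose $\sqrt{3} = p/q$ with $p, q$ integers, $q \neq 0$, and $p/q$ in
lowest terms. Then $p^2 = 3q^2$, so $3 \mid p^2$; since $3$ is prime,
$3 \mid p$. Write $p = 3k$: then $9k^2 = 3q^2$, so $q^2 = 3k^2$ and
$3 \mid q$. Now $3$ divides both $p$ and $q$, contradicting lowest terms.
```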

I’ve also done very minimal prompt design; there might be a way to give it certain assumptions or types of proof and let it work through the way to use them.

Of course, it still does better with proofs than with raw calculations:

sqrt(1.202030) = 1.0390625

sqrt(1.23040) = 1.1092339699
sqrt(1.202030) =

1.2025482786
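
For what it’s worth, the true values are a one-liner to check (my sanity check, not part of the transcript):

```python
import math

print(math.sqrt(1.202030))  # ~1.0964, not 1.0390625 or 1.2025482786
print(math.sqrt(1.23040))   # ~1.1092, so the prompt's second example was right
```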

I do think there’s more performance to be wrung out of GPT-3 when it comes to mathematical reasoning, especially if one can strip out numbers from the inputs.

Edited: the paper linked above slipped my mind while writing this post. But I still wonder about more general “justifications” versus proofs.

Codex performs better with math.

The new 30 January 2023 ChatGPT math upgrade attempts to address this “math” issue: