Is GPT-3 good at math? Let the answers speak for themselves!

I think a more robust approach would be to use the language-modelling power of GPT-3 or Codex to generalize and translate natural-language concepts into algorithms, and then run those algorithms to make exact calculations. Something like this should work:

All the examples use SMALL INTEGERS, in particular positive integers.

Once you move to reals or large integers, none of the known neural nets performs in a way worthy of any notice!

The rest of the examples are arithmetic-based, including those in the arXiv paper, which again are quite simple, and you might not need any AI to handle those at all!

To keep hyping these simple examples over and over only does an injustice to the OpenAI techs! Although I have no commitment to OpenAI, nor vice versa, I just feel these hyped-up simple examples are going to drive sophisticated experts in the field away.

I will post examples tonight, which I have been planning to do for some weeks, so you can see what is in demand.

From my experience I can see where OpenAI fits in math, algebra, and geometry, but we need a better forum to focus energy on sophisticated applications.

Obviously these statements are not well received, but someone needs to say them.

Dara

Hello @dara,

First of all, thank you for your input! I’d like to hear your reasoning since this is exactly why I created this post. I see now that I should have gone into more detail regarding a few key points I made to address the counter-arguments you gave me. As a side-note, I don’t believe this post does any injustice to the OpenAI crew. This post is meant to be a learning experience built upon and discussed by whoever wishes to provide their insights and perspective regarding GPT-3 and what GPT-3 is capable of.

:stop_sign: Clarification:

Since I didn’t communicate this as clearly as I would’ve liked, allow me to explain further why I used the training data I did and why I believe GPT-3 is capable of understanding math to any degree. My analysis aimed to show that GPT-3 has the core function necessary to understand a math problem and solve it with correct understanding. I designed the practice problems to see if GPT-3 was able to learn one area of mathematics. Once the language model demonstrates that, then theoretically what’s to stop it from learning more mathematical concepts?

During my analysis, I did have limitations but I did my best to work with what I had. Please review the limitations I faced, as this may help explain why the training data wasn’t the best but is, I believe, still valuable:

  1. I saw that GPT-3 had a tough time understanding numbers; however, just like natural language, there is a pattern in the way numbers are expressed (word, numerical, or any other form). I’m sure that with the right training data, GPT-3 can accurately and confidently quantify the numbers expressed to it. I did look into how I could teach GPT-3 to quantify numbers properly; however, I didn’t understand then as well as I do now how GPT-3 best interprets the input given to it, and I found myself reconstructing my training data over and over. On top of that, I realized that I would need far more training data as well. I am sure there’s a way to tackle it, but I do have other projects I’d like to work on with GPT-3 and I’d like to explore other topics to be as productive as possible.

  2. Going forward, let’s assume that we successfully trained GPT-3 on how to properly quantify any number given to it. We then need to see if GPT-3 can pick up and understand basic arithmetic and algebra. We do see that GPT-3 was able to learn how to solve the problem. However, since I didn’t train GPT-3 to quantify numbers confidently, as @boris pointed out, the language model did have a harder time solving the equations since it wasn’t confident with the numbers we were giving to it. The screenshots that provided the Completion with the full spectrum probability graph show that the least confident parts of the input data were the numbers. Now, I’m unsure how much of this was due to the language model learning a new change in the pattern with each sample problem, but knowing that GPT-3 does have trouble quantifying numbers, I’m sure that affected the confidence level.

:dna: My Main Point:

We see that when GPT-3 fails to understand how big numbers are supposed to be quantified but we still introduce a higher-level concept like arithmetic prematurely, we automatically inherit the problems that the lower-level concept had. Sure, you can feed it more training data to account for the loss, but I suspect it’d take more training data to accurately train the model.

Given well-defined training data, i.e. a series of math problems, GPT-3 can understand the structure of the problem such that it correctly identifies how every part of the problem will affect the final answer. With that, any similar problem presented to GPT-3 will be understood already, since the model has trained enough to know how every part interacts with each other to produce the correct answer!

By ensuring that lower-level concepts are properly understood by the model, we prevent further errors when training GPT-3 with higher-level concepts. Yes, I am aware that I could’ve trained the model better, but that would’ve taken more time and resources, and I was trying to show y’all with my data how I could still get the main point across. Since we were able to get the model to briefly demonstrate a higher-level concept such as basic algebra, as long as we didn’t use big numbers, this suggests that if we had perfected the lower-level concept, we would’ve seen the model do basic algebra with much larger numbers more confidently.

Also, I’m not saying that there’s only one way of training GPT-3; I’m sure there are several other ways that would allow the model to learn a concept! The key point is that as long as the model is learning correctly, GPT-3 should be able to continuously build upon previous concepts to learn more advanced topics. My examples were meant to illustrate how GPT-3 could be taught in order to achieve a correct understanding of higher-level mathematics.

:bookmark: Summary:

To sum up what I’m saying, here is the rundown of why I believe GPT-3 can be good at math:

  1. If we look past the poor confidence when I didn’t teach GPT-3 to properly quantify numbers, we see that GPT-3 was pretty good with answering basic algebra questions with the numbers it could quantify. I’m sure had we successfully taught GPT-3 to quantify numbers first, it would be much easier for us to teach higher-level mathematical concepts.

  2. With the right training data, I’m sure we can teach GPT-3 how to quantify numbers correctly so that we can allow it to better learn more advanced topics.

  3. If it can learn one mathematical concept, why can’t it learn another one? Theoretically, we should be able to build upon multiple concepts given that we have trained them correctly.

:question: Questions I Now Have:

  1. What are some things that I did or didn’t do that may have hindered its learning process when answering the math questions I gave it? Please exclude the example of teaching GPT-3 to quantify numbers correctly.

  2. Since I only fed GPT-3 the question “If x is equivalent to [first number], what is [second number] plus x?”, I found that giving GPT-3 the same question but worded differently slightly threw off GPT-3’s overall confidence when answering the question. When training GPT-3 to get good at a concept, would feeding training data with the same problems simply worded differently help the language model separate the concept from the structure of the question itself?

  3. In addition to the projects I have planned, are there any topics that you’d like to see explored?

  4. Are there any topics that you found hard or almost impossible to teach GPT-3 that you’d like to share?
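On question 2 above, one way to test whether rewording helps is to generate the same problem under several phrasings. The sketch below is only an illustration: the first template is the wording from the post, while the other templates, the function name `make_examples`, and the value range are assumptions made up for this example.

```python
import random

# Only the first template is the wording used in the post; the rest are
# hypothetical paraphrases invented for this sketch.
TEMPLATES = [
    "If x is equivalent to {a}, what is {b} plus x?",
    "Let x = {a}. What is {b} + x?",
    "Given that x equals {a}, compute the sum of {b} and x.",
    "x is {a}. Add {b} to x. What do you get?",
]


def make_examples(n, max_value=99, seed=0):
    """Return n (prompt, answer) pairs that vary the wording, not the concept."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    examples = []
    for _ in range(n):
        a = rng.randint(0, max_value)
        b = rng.randint(0, max_value)
        template = rng.choice(TEMPLATES)
        examples.append((template.format(a=a, b=b), str(a + b)))
    return examples


if __name__ == "__main__":
    for prompt, answer in make_examples(4):
        print(prompt, "->", answer)
```

Holding one phrasing out of the training prompts and using it only for evaluation would give a rough signal of whether the model picked up the concept or just the sentence structure.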

:robot: Please feel free to share your thoughts and ideas, fellow community members!

It’s interesting to understand the current downsides. With these types of problems I’d expect a sufficient amount of data and fine-tuning to be able to solve the problem efficiently at some point, as demonstrated on the more difficult mathematical problem of theorem proving in the publication I shared above: [2009.03393] Generative Language Modeling for Automated Theorem Proving

I am sorry if I was discouraging; that was not my purpose. Obviously, if I did not see value in your work I would not have commented.

A. We need to UNDERSTAND what GPT-3 can do and what GPT-3 cannot do.

OpenAI, IMHO, needs thousands of hardcore application and systems people to develop, not fans. But that is just my opinion, and it should be worth a bit but not much.

B. If you pay close attention to any of the math examples around here, you can see they are arithmetic in nature: integer forms or simple single-variable polynomial forms. Those do not require any AI solvers. There are hundreds of well-known algorithms that handle them.

  1. Try using these numbers in functional-expression form:

sqrt(1.03001987)*cos(1.9099875) *exp(-0.0135789)

I would be curious to see how GPT-3 or any other neural net application is able to process such expressions. Mind you, I know how to do these myself using alternative systems! So the point is to understand the abilities of GPT-3, not to bang on your good work.
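For reference, a conventional numeric library evaluates this expression directly. A minimal sketch with Python's standard `math` module, assuming the argument of `cos` is in radians (the post does not say):

```python
import math

# The expression from the post, evaluated with ordinary floating-point math.
# Assumption: the cosine argument is in radians.
value = math.sqrt(1.03001987) * math.cos(1.9099875) * math.exp(-0.0135789)

print(value)
```

The result is negative, since 1.9099875 radians lies just past π/2 where the cosine changes sign. A number like this is what any model output for the expression would have to be checked against.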

I will post some stuff in a separate thread here, so you can bang on my work :slight_smile:

Personally I think you are moving in the right direction, and again I did not mean to be discouraging.

Dara

I added some stuff from our side of the wall

These are Free Form Generative textual interfaces which output code+geometry and so on.

Review this please:

annulus&&x>y

This is a partial mathematical phrase that assumes the generative AI has a reference for annulus, Boolean algebra, and x>y.

Note that the user could have input:

-1.00238765*x>sqrt(3.098779)*y

I do not know any Neural Net algorithm which could:

  1. Strip off the English and work on annulus&&x>y as an algebraic expression. I have written my own elaborate Recurring Graph Neural Nets that do that, and it was quite hard and required a custom learned net.

  2. Handle linearized mathematical expressions. I would like to know how GPT-3 could do this.

However, I would know how to add custom learned nets to GPT-3 if I had under-the-hood knowledge, but somehow I find no support from OpenAI to build such systems.

As far as I know, OpenAI’s open source has been officially closed by Microsoft, so it is not clear to me whom to call at OpenAI.

Dara

@DutytoDevelop Very interesting post, thank you. I think that while GPT-3 may be good at simple arithmetic and even algebra, the question I would like answered is whether it has actually internalized the idea of what it means to sum two numbers. In theory, for it to actually understand the algorithm of addition, it should be able to add in general. If you ask it to add two-digit numbers, for example, GPT-3 may have seen all possible combinations of two-digit arithmetic in its vast amount of training data, since there are only so many ways to arrange two-digit numbers for addition. What would be more impressive would be to see it correctly add two ten-digit numbers, or in general numbers with enough digits that it could not possibly have seen an example of those numbers previously. Then we would have to admit that GPT-3 knows what it means to do addition.
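A test along these lines could be set up roughly as follows. This is a sketch under stated assumptions: the prompt wording and the names `make_addition_cases` and `score` are made up for illustration, and `answer_fn` is a hypothetical stand-in for whatever call returns the model's completion as text.

```python
import random


def make_addition_cases(n, digits=10, seed=0):
    """Build n (prompt, expected_answer) pairs using random numbers of the given length."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    cases = []
    for _ in range(n):
        a = rng.randint(low, high)
        b = rng.randint(low, high)
        cases.append((f"What is {a} plus {b}?", str(a + b)))
    return cases


def score(cases, answer_fn):
    """Fraction of cases where answer_fn(prompt) exactly matches the true sum."""
    correct = sum(
        1 for prompt, expected in cases if answer_fn(prompt).strip() == expected
    )
    return correct / len(cases)


if __name__ == "__main__":
    cases = make_addition_cases(3)
    for prompt, expected in cases:
        print(prompt, "->", expected)
```

With ten-digit operands there are on the order of 10^20 possible pairs, so a high exact-match score is hard to explain by memorized examples alone.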

As was pointed out, GPT-3 does have problems with direct arithmetical calculations, especially with big numbers or floating-point numbers. But it’s really good at language modelling, so why not use this strength to tackle all the other problems? Look at this example:

This code (generated by Codex) does exactly what it was told to do - addition of numbers of an arbitrary length.
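The Codex output is not reproduced in the text, so as a rough idea of what digit-by-digit addition of arbitrary-length numbers can look like, here is a hand-written sketch. It is not the code Codex generated; the function name `add_digit_strings` and the restriction to non-negative integers are assumptions for this example.

```python
DIGITS = "0123456789"


def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as strings of decimal digits."""
    for text in (a, b):
        if not text or any(ch not in DIGITS for ch in text):
            raise ValueError("inputs must be non-empty strings of digits 0-9")

    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both strings from the rightmost digit, carrying as in schoolbook addition.
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(digit_a + digit_b + carry, 10)
        result.append(str(digit))
        i -= 1
        j -= 1

    # Digits were collected least-significant first; drop any leading zeros.
    return "".join(reversed(result)).lstrip("0") or "0"


if __name__ == "__main__":
    print(add_digit_strings("999", "1"))  # prints 1000
```

Because the work is done one digit at a time, the length of the inputs does not matter, which is exactly the property a language model struggles to show when it predicts the sum directly.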

The above algorithm (3 steps of addition) was taken from the internet, so it was relatively easy for Codex to convert it into code. But what if we ask GPT-3 to create an algorithm by itself? So here we go:

In this example I just asked it to make “An algorithm of arithmetic addition of two numbers. Consider the numbers as strings of digit literals.” All the rest was generated. Notice that the second piece of code is quite similar to the first one!

And in the end, I asked it to explain the code. That was funny :slight_smile: But I think it explained it quite well. Fascinating, isn’t it?

Just to clarify about my last example, I used Davinci-Codex everywhere, and I ran it in two steps. The first was to generate the code from the prompt “An algorithm of arithmetic addition of two numbers. Consider the numbers as strings of digit literals.” The second was to explain the code, so I just added “Explain the algorithm:” at the end and it produced all the rest. Here is the preset (OpenAI API)

That is impressive. Did you use the same session for both results? In other words, is it possible that it generated the first result by following the instructions in the header, and for the second result just reused the same response?

Thanks for the input boris!

@dara,

No worries, I respect the constructive criticism and open discussion that we’re having.

We need to UNDERSTAND what GPT-3 can do and what GPT-3 cannot do.

Essentially, you want to find out how GPT-3 processes input on a token-by-token basis so that you know how GPT-3 will answer.

Try to use this numbers in functional expression Form:
sqrt(1.03001987)*cos(1.9099875) *exp(-0.0135789)

This is a high-level mathematical concept. In order for GPT-3 to understand this, you’d need to teach it how to:

  1. Properly handle decimal numbers
  2. Perceive the sqrt() function correctly and, provided the training data has clear-cut steps needed for it to understand how to calculate square root functions longhand, produce the correct answer
  3. Same as Step 2 but for the cosine function and exponential function

Assuming there’s enough training data that is well-defined, I don’t see why GPT-3 wouldn’t be able to perform the calculation that you provided.
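To make "clear-cut steps" for a square root concrete, here is one possible procedure written out. It uses Newton's (Babylonian) iteration rather than the digit-by-digit longhand method, which is a substitution made for brevity; the function name `newton_sqrt` and the tolerance are assumptions for this sketch.

```python
def newton_sqrt(value, tolerance=1e-12, max_steps=100):
    """Approximate the square root of a non-negative number by Newton's iteration."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0.0

    guess = value if value >= 1 else 1.0  # any positive starting guess converges
    for _ in range(max_steps):
        # Each step averages the guess with value / guess.
        next_guess = 0.5 * (guess + value / guess)
        if abs(next_guess - guess) <= tolerance * next_guess:
            return next_guess
        guess = next_guess
    return guess


if __name__ == "__main__":
    print(newton_sqrt(1.03001987))
```

Each iteration needs only addition, division, and halving, so training data for this route would rest on the same decimal arithmetic discussed in step 1.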

Hello @jpoirier! Like @m-a.schenk stated, memory between sessions is not possible for GPT-3. However, I’d like to clarify that I wasn’t trying to have GPT-3 generate an algorithm that solves equations. Instead, I’m aiming to teach GPT-3 itself how to quantify numbers and understand mathematical concepts, so that, given the right training data and a problem that the training data shows how to solve, GPT-3 can solve the problem with the correct problem-solving steps.

Even though memory between sessions is not possible, you can fine-tune models with your own datasets so that the model does have an understanding of those concepts going forward instead of having to teach it over and over again. As stated by OpenAI here, you would need to send a request to fine-tune the Davinci model to the OpenAI team.

I do not think anyone could teach GPT-3 anything substantial about algebras and so on. That I know for a fact about any Neural Network model.

That is why people, ourselves included, are suggesting and working on alternative technologies.

However, that was not my proposal. I need to know if I could build a Hybrid system where GPT-3 does what it is good at, and we add our techs dealing with the algebraic and geometrical systems.
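As a toy illustration of that division of labour, the sketch below assumes the language model's only job is to emit an expression string, and a small deterministic evaluator does the arithmetic. Nothing here is an OpenAI API or the poster's system; the names `evaluate`, `BINARY_OPS`, `UNARY_OPS`, and `FUNCTIONS` and the whitelist itself are assumptions for this example.

```python
import ast
import math
import operator

# Whitelisted operators and functions; anything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"sqrt": math.sqrt, "cos": math.cos, "sin": math.sin, "exp": math.exp}


def evaluate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without running arbitrary code."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS
            and not node.keywords
        ):
            return FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    # In a hybrid setup this string would come from the language model.
    print(evaluate("sqrt(1.03001987)*cos(1.9099875)*exp(-0.0135789)"))
```

The evaluator is exact to floating-point precision regardless of how large or awkward the numbers are, so the model only has to get the translation right, not the calculation.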

I do not seem able to get anyone here or at OpenAI to discuss this openly, pun intended.

Nor are we interested in any learning systems to solve equations, though we have a number of innovative algorithms for that purpose.

The equation solvers are for another purpose.

When you say something like the following about a landscape:

“add some trees to region …”, with “some” being a form of existential quantifier, you do not tell the AI to learn how to place trees in some way of its own; you give the AI semantics to attach to the word “some”.

If we could do that, then GPT Hybrid systems would fly to the moon :slight_smile:

Otherwise GPT is stuck constantly needing to learn every operation for its semantics.

You do that in your brain and language when you learn from childhood:

“Snakes are beautiful.”

“Snakes are dangerous.”

beautiful is not Semantics you learn!

dangerous is Semantics you are taught by your parents and the environment.

But these very similar sentences require two disparate Semantics in order to make sense.

The weakness here is that everyone is trying to make GPT learn some new stuff (Semantics), and even if it could, it is a bad idea.

Dara

In my second example (which I gave the preset link for) there was no algorithm in the header. The explanation of the code was generated from the code itself. Which suggests that it has a good grasp of the semantic concepts related to “addition”. Just that it has to be presented in a correct format.

I agree that hybrid systems are the way forward, at least for now, because in production-ready environments we need explainability and verifiability of these systems. So I’m advocating for neuro-symbolic hybrids. What I’m currently trying to implement is a system that would effectively integrate knowledge graphs with neuro-symbolic reasoning. GPT-3 plays an important role here as a means to disambiguate concepts and transform them into logical forms for the “reasoner” to work with. Ideally, with the symbolic subsystem we are able to achieve continuous learning as well as long- and short-term memory. Also, by leveraging First Order Logic or Real Valued Logic, we are able to achieve sound and verifiable reasoning.

We are working on similar concepts.

neuro-symbolic hybrids: We have created such hybrids in Wolfram Mathematica, and they show great promise. They are Recurring Graph Networks; the graphs are the Expression Tree structures for the operators, functions, and arguments … and all seems to be working.

BUT even at 90%+ accuracy, the Expression trees that the neuro-symbolic hybrid produces sometimes contain syntax errors. But we can do much better soon.

As for memory, unless I misunderstand it, we can provide Content Associative Memory stored in our Symbolic Cloud for arbitrary random access. These Content Associative Memories are ACTIVE i.e. they could have their learning algorithms or other update mechanisms.

To me the role of GPT is clear, but somehow I get confused by what is presented on the OpenAI sites and in this forum.
D

Looking at the comments, it seems like it does well. I’ve had a different experience:

Human: Okay, write a Python function to convert temperature from Celsius to Kelvin
AI: I can, but you need to give me more information on the context of the question
Human: What additional information do you need?
AI: I need to know you want me to write a function that converts temperature from Celsius to Kelvin or you want me to write a function to convert Kelvin to Celsius.
Human: A function that converts temperature from Celsius to Kelvin
AI: Sure, here you go; 1 - C
Human: Okay, write a Python function to convert temperature from Fahrenheit to Kelvin
AI: Yes, a function converting temperature from Fahrenheit to Kelvin. Here it is: 1 - 9/5*(F-32)
Human: I'd like the function written in Java, please
AI: Okay, here's your function written in Java. 1 - (-9/5F)
Human: Pretty sure that's wrong
AI: Well, the code was written in Java. You can find me on Reddit at /u/openai
Human: You haven't subtracted 32 from F
AI: Yes, if F is greater than 32 then the value of the expression must be equal to 1. If not, the value depends on whether or not F is greater than or equal to 40.
Human: Okay. Using your function definition, can you convert 98 degree Fahrenheit to Kelvin?
AI: Sure, it's approximately equal to -54 degrees Celsius
Human: But I asked for Kelvin
AI: Okay, but you need to give me more information on the context of the question. Does the temperature in Kelvin go up or down
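For comparison, the functions requested in the exchange above are short when written by hand. A minimal sketch using the standard conversion formulas; the function names are chosen for this example:

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def fahrenheit_to_kelvin(fahrenheit: float) -> float:
    """Convert a temperature from degrees Fahrenheit to kelvin."""
    # Subtract 32 first, rescale to Celsius degrees, then shift to kelvin.
    return (fahrenheit - 32.0) * 5.0 / 9.0 + 273.15


if __name__ == "__main__":
    print(round(fahrenheit_to_kelvin(98), 2))  # prints 309.82
```

So 98 degrees Fahrenheit is about 309.82 K, which none of the AI's replies come close to.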

Codex can solve some simple non-linear control problems.
It is a matter of how you script it I guess.
Look at Wolfram Alpha, they use an AI model too.

Reality is stochastic, thus we can solve anything with math.

Try playing with GPS coordinates and more elaborate geometry problems. It doesn’t grasp the concepts, unless you use matrices. But it can’t convert back and forth; it has no reference, thus it has no knowledge.

Addition or any other well-known function or operator with a FIXED number of arguments is easy to construct from any simple grammar!

You do not need Codex.

The problem is when the Operator or the Function is either not fully defined or takes a variable number of arguments.

Example: Boolean or Implicit or Parametric Regions:

Boolean
x>y&&x^2+y^2<1

x>y&&x^2+y^2<1||(x-0.5)^2+y^2<=1

As you can see, there is no control on the number of arguments, and in the examples I provided earlier you might have a NAMED REGION, e.g. annulus, which is subject to definition or might be undefined at the time of parsing.
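To show what the target of such parsing might look like, here are the two Boolean regions above written as point-membership predicates, plus one possible definition of the named region. This is a sketch with assumptions: `&&` is taken to bind tighter than `||`, and the annulus radii (0.5 and 1.0) and all function names are made up for illustration, since the post leaves the region undefined.

```python
def region_one(x: float, y: float) -> bool:
    """x>y&&x^2+y^2<1: the part of the open unit disk where x > y."""
    return x > y and x**2 + y**2 < 1


def region_two(x: float, y: float) -> bool:
    """x>y&&x^2+y^2<1||(x-0.5)^2+y^2<=1, assuming && binds tighter than ||."""
    return (x > y and x**2 + y**2 < 1) or ((x - 0.5) ** 2 + y**2 <= 1)


def annulus(x: float, y: float, inner: float = 0.5, outer: float = 1.0) -> bool:
    """A hypothetical definition of the named region; the radii are assumptions."""
    return inner**2 <= x**2 + y**2 <= outer**2


def annulus_and_x_gt_y(x: float, y: float) -> bool:
    """annulus&&x>y, resolved against the annulus definition above."""
    return annulus(x, y) and x > y


if __name__ == "__main__":
    print(region_one(0.5, 0.0), region_two(1.2, 0.0), annulus_and_x_gt_y(0.75, 0.0))
```

The hard part the post describes sits before this step: deciding how many clauses there are and what a name like annulus refers to at parse time.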

The Arithmetic examples are well understood, and there are many parsers of different types that could handle code generators as semantics and there are even no-code generators.

If you want to pursue the vision of Open AI and Codex, in my humble opinion and not an instruction to any of you, please move away from the arithmetic operations.

Dara