Much press has been devoted to discussing GPT-3's limitations in doing arithmetic. Some ascribe the problem to how numbers are tokenized; the short tokenizer sketch below shows how digit strings get split. I found it interesting to ask it to do arithmetic as if one "read out loud" the numbers; the exchanges follow the sketch.
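As a quick way to see the tokenization point, here is a minimal sketch using the tiktoken package, which exposes the r50k_base encoding used by the GPT-3 base models; the example strings are my own:

```python
# Inspect how GPT-3's tokenizer splits digit strings versus spelled-out numbers.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # encoding used by GPT-3 base models

for text in ["4396", "5947", "four thousand three hundred ninety-six"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]  # decode each token id separately
    print(f"{text!r} -> {pieces}")
```

Digit strings come back split at fairly arbitrary boundaries, which is exactly why the "read out loud" phrasing is an interesting probe.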
What is the sum of 4 thousand 3 hundred 9 tens and 6 with 5 thousand 9 hundred 4 tens and 7?
The sum of 4 thousand 3 hundred 9 tens and 6 with 5 thousand 9 hundred 4 tens and 7 is 10 thousand 3 hundred 13 tens and 13. (Should be 10,343.)
And completely using words: What is the sum of four thousand three hundred nine tens and six with five thousand nine hundred four tens and seven?
The sum of four thousand three hundred nine tens and six with five thousand nine hundred four tens and seven is ten thousand three hundred thirteen tens and thirteen.
What is the product of 4 thousand 3 hundred 9 tens and 6 with 5 thousand 9 hundred 4 tens and 7?
The product is 26,183,895. (4396 × 5947 = 26,143,012 – pretty close!)
What is the difference between 4 thousand 3 hundred 9 tens and 6 with 5 thousand 9 hundred 4 tens and 7?
The difference between 4 thousand 3 hundred 9 tens and 6 with 5 thousand 9 hundred 4 tens and 7 is that the first number has 4 in the thousands place, 3 in the hundreds place, 9 in the tens place, and 6 in the ones place, while the second number has 5 in the thousands place, 9 in the hundreds place, 4 in the tens place, and 7 in the ones place.
Subtract 4 thousand 3 hundred 9 tens and 6 from 5 thousand 9 hundred 4 tens and 7
5 thousand 9 hundred 4 tens and 7
-4 thousand 3 hundred 9 tens and 6
5 hundred 94 tens and 1 (Should be 1,551.)
What is the sum of 3 million 7 hundred thousand 4 thousand 3 hundred 9 tens and 6 with 4 million and 6 ten thousand 5 thousand 9 hundred 4 tens and 7?
The sum of the two numbers is 7 million 7 hundred thousand 9 thousand 4 tens and 3. (3,704,396 + 4,065,947 = 7,770,343.)
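For anyone who wants to reproduce these exchanges, here is a minimal sketch using the legacy (pre-1.0) openai Python SDK; the engine name and sampling settings are assumptions, since the post does not say which DaVinci variant or parameters were used:

```python
# Hedged sketch: replay one of the prompts above against the Completions API.
import openai

openai.api_key = "sk-..."  # your API key goes here

prompt = ("What is the sum of 4 thousand 3 hundred 9 tens and 6 "
          "with 5 thousand 9 hundred 4 tens and 7?")

response = openai.Completion.create(
    engine="text-davinci-002",  # assumed engine, not stated in the post
    prompt=prompt,
    max_tokens=64,
    temperature=0,  # near-deterministic decoding for arithmetic probes
)
print(response["choices"][0]["text"].strip())
```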
Language models do not have implicit knowledge of math. The attention matrix generated by the Query and Key projections creates relations between tokens, and those relations alone do not give the model enough structure to do math with high certainty.
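To make that concrete, here is a minimal NumPy sketch of scaled dot-product attention; the Query-Key product below is exactly the relation matrix described above, and nothing in it encodes arithmetic:

```python
# Toy scaled dot-product attention: the scores matrix holds pairwise
# query-key relations between tokens, nothing more.
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token relations
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, dim 8
print(attention(Q, K, V).shape)  # (4, 8)
```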
However, if we train it on code and fine-tune it, it performs like a charm.
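Here is a sketch of what that fine-tuning flow looks like with the legacy (pre-1.0) openai SDK; the file name and the toy prompt/completion pairs are illustrative, not from any real training set:

```python
# Hedged sketch: upload toy prompt/completion pairs and start a fine-tune job.
import json
import openai

pairs = [
    {"prompt": "4396 + 5947 =", "completion": " 10343"},
    {"prompt": "5947 - 4396 =", "completion": " 1551"},
]
with open("math_pairs.jsonl", "w") as f:  # illustrative file name
    for p in pairs:
        f.write(json.dumps(p) + "\n")

uploaded = openai.File.create(file=open("math_pairs.jsonl", "rb"),
                              purpose="fine-tune")
job = openai.FineTune.create(training_file=uploaded["id"], model="davinci")
print(job["id"])
```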
Other studies have shown that multi-stream Transformers can combine different domains of knowledge.
Naturally, some researchers have already demonstrated this, as you can see with MathBERT.
Also, as I mentioned, there is a more refined version of MathBERT fine-tuned on code.
GPT-3's raw ability to do arithmetic is indeed interesting. It could make a great publication for a data scientist.
Note: most articles that claim GPT-3 (and similar large language models) are incapable of arithmetic are wrong; their authors do not understand the fundamental loss function and how it relates to prompts.
GPT-3 and similar autoregressive LMs are not oracles or agents; they act as simulators that can instantiate virtual oracles, agents, or beings specified by the prompt. The prompt is critical to allowing LLMs to reach their full potential.
Most articles testing GPT-3's arithmetic capabilities use zero-shot inference, which only demonstrates the authors' lack of understanding. Even with a natural-language description of a task, there is still a huge domain of outputs that could be classified as "correct" for the given question. Thus, to get GPT-3 to simulate an arithmetic oracle, either few-shot inference or fine-tuning is required.
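To illustrate, here is a minimal sketch of how a few-shot arithmetic prompt can be assembled; the Q/A format and the worked examples are my own, chosen only to pin the completion down to a bare numeric answer:

```python
# Build a few-shot prompt that specifies an "arithmetic oracle" by example.
examples = [("312 + 457", "769"), ("1204 + 88", "1292")]
query = "4396 + 5947"

prompt = "".join(f"Q: What is {q}?\nA: {a}\n\n" for q, a in examples)
prompt += f"Q: What is {query}?\nA:"
print(prompt)
```

Sent with temperature 0 and a newline stop sequence, a prompt like this leaves a much narrower space of "correct" completions than a zero-shot question does.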
I actually dove into a research study on GPT-3's ability to do math and analyzed why the study's setup and execution were rather flawed, leading to poor results. If you'd like to read my analysis, please go here: Is GPT-3 good at math? Let the answers speak for themselves!
Thanks @DutytoDevelop - interesting post and discussion. It inspired me to try a couple of zero-shot experiments asking DaVinci to explain how to do multiplication. Pretty good and interesting, but sloppy, like many elementary school students.