“ChatGPT can make mistakes”, says the text at the bottom.
That is because it is a language-prediction AI: it writes chats by reproducing trained language and knowledge in a convincing manner.
Let’s ask ChatGPT a relevant question about this, and judge whether it makes mistakes in its answer too, or whether it produces a meaningful one…
Question
How come ChatGPT cannot do simple math correctly (adding a bunch of numbers)?
How can a computer get addition wrong??
Answer in terms of the nature of how LLM AI works, how it produces responses.
ChatGPT gpt-4o-mini response
Great question! To understand why ChatGPT (and other large language models) might make errors in simple arithmetic like addition, it’s important to understand how these models work and why they sometimes struggle with tasks like math.
How ChatGPT Works
ChatGPT is a type of large language model (LLM) that has been trained on vast amounts of text data. It doesn’t “think” in the same way a human or a specialized calculator does. Instead, it generates responses based on patterns in language, drawing from its training data, which includes a wide variety of written text—books, articles, websites, and other forms of written human expression.
When you ask ChatGPT to do math, it doesn’t actually perform calculations like a traditional computer or a calculator. Instead, it generates answers based on patterns it has learned from the text, where similar mathematical problems have been discussed.
For example, if it has seen many instances where addition problems are solved correctly in its training data, it can likely produce the right answers. However, if it encounters an unfamiliar or ambiguous problem, or if it hasn’t seen enough examples of a particular kind of math question, it may not get the right answer.
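The “patterns, not calculation” point can be made concrete with a toy sketch. This is not how an LLM is actually implemented; it is a hypothetical lookup-based “model” that only answers problems it has effectively memorized, and emits a plausible-looking guess for anything unseen:

```python
# Toy illustration (NOT a real LLM): a "model" that can only answer
# arithmetic it has memorized from training text, and falls back to a
# plausible-shaped guess for unseen problems.
seen_in_training = {
    "2 + 2": "4",       # very common problems appear often in text
    "10 + 5": "15",
    "12 + 17": "29",
}

def pattern_based_answer(problem: str) -> str:
    if problem in seen_in_training:
        return seen_in_training[problem]  # looks correct, but is recall
    # Unseen problem: emit something *shaped* like an answer.
    # Here we fake it by reusing a memorized answer - plausible, not computed.
    return max(seen_in_training.values(), key=len)

print(pattern_based_answer("12 + 17"))    # "29" - seen before, so correct
print(pattern_based_answer("234 + 567"))  # a guess, not a calculation
```

Real models generalize far better than a lookup table, but the failure mode is analogous: the answer is produced by pattern-fitting, not by running an adder.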
Why Can It Get Math Wrong?
- Lack of True Calculation Ability: ChatGPT is not a computational tool like a calculator, which is specifically designed to perform mathematical operations. It does not have a built-in mechanism for arithmetic computation. Instead, it predicts the next most likely word or phrase in a sequence based on patterns it learned during training. So, if it’s asked to solve a problem like “12 + 17,” it may give a plausible-sounding answer, but it isn’t using an actual “internal calculator” to verify the sum.
- Reliance on Statistical Patterns: When ChatGPT responds to math problems, it’s drawing from past instances in the text data it was trained on. In other words, it tries to “guess” the correct answer based on how similar problems have been answered in the data. This can lead to mistakes, especially for more complex or less commonly seen problems.
- Context and Ambiguity: In natural language processing (NLP), context plays a huge role. If the math problem is framed within a more complex context (e.g., as part of a conversation), the model may get distracted by the surrounding words and focus more on generating a grammatically correct response than accurately solving the math problem. Ambiguity in the phrasing of a question might also lead to errors.
- Error Propagation in Longer Calculations: When adding or performing other operations over a long series of numbers, small errors in earlier predictions can compound. If a wrong step is taken early on, the model might continue with incorrect logic, resulting in an ultimately wrong answer.
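The error-propagation point can be illustrated with a toy running-sum sketch (the numbers and the injected error are hypothetical; this is only an analogy for how one wrong early step poisons everything downstream):

```python
# Toy analogy for error propagation: a running sum where one early
# "prediction" is slightly off. Every later partial result inherits it.
numbers = [234, 567, 89, 1023, 7]

def running_sum(values, bad_step=None, error=10):
    total = 0
    partials = []
    for i, v in enumerate(values):
        total += v
        if i == bad_step:
            total += error  # one wrong step early on...
        partials.append(total)
    return partials

correct = running_sum(numbers)
flawed = running_sum(numbers, bad_step=1)  # error injected at step 1
print(correct[-1])  # 1920 - the true sum
print(flawed[-1])   # 1930 - every partial after step 1 carries the error
```

A deterministic adder has no equivalent failure mode; an autoregressive text generator, which commits to each token as it goes, does.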
Example:
If you ask ChatGPT “What’s 234 + 567 + 89?”, the model may try to find patterns from similar problems, but it’s not actually adding those numbers together in a mathematical sense. It might recall or predict a similar type of response from the training data, but the final result could be flawed.
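For reference, that same example is trivial for ordinary code, which computes the sum deterministically rather than predicting it. That is exactly the gap being described:

```python
# Ordinary code performs the arithmetic itself; no pattern-matching,
# no guessing, the same answer every time.
result = sum([234, 567, 89])
print(result)  # 890
```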
Conclusion:
In short, ChatGPT doesn’t “do math” the way traditional calculators or computation-based models do. It generates text based on learned patterns, which is why it can produce errors, particularly with tasks that require precise, deterministic calculations. For arithmetic, a specialized tool or algorithm that explicitly performs calculations would be more reliable.
Seems a pretty good answer, accelerating my typing work here. The only other aspect to discuss is that the AI also has to think in tokens, which for numbers have their own token set covering 000-999. Your math problem is encoded into a sequence of such tokens (the original post shows this as a color-coded breakdown, with each color a new token; image omitted here). So the AI doesn’t even see individual digits - it sees a token number sequence as input that looks like:
3, 702, 11, 1302, 659, 548, 702, 11, 1302, 659, 548, 1130, 11, 1302, 659, 548, 1125, 11, 46254, 13, 1910, 659, 548, 1055, 11, 1302, 659, 548, 22, 11, 1302, 659, 548, 24, 11, 1302, 659, 548, 2092, 11, 6718, 659, 548, 21, 11, 1302, 659, 548, 455, 11, 1302, 659, 548, 19, 11, 1302, 659, 548, 21, 11, 1302
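The digit-chunking idea can be sketched in a few lines. Note this is a simplification: real BPE tokenizers decide boundaries by learned merges, so actual chunking can differ from this naive left-to-right split into groups of up to three digits:

```python
import re

def chunk_digits(text: str) -> list:
    """Simplified sketch: split runs of digits into chunks of up to three
    digits, left to right. Real tokenizers use learned BPE merges, so the
    actual boundaries may differ; this only illustrates the idea that the
    model never sees individual digits."""
    pieces = []
    for part in re.findall(r"\d+|\D+", text):
        if part.isdigit():
            pieces.extend(part[i:i + 3] for i in range(0, len(part), 3))
        else:
            pieces.append(part)
    return pieces

print(chunk_digits("234 + 567 + 89 = 890"))
# ['234', ' + ', '567', ' + ', '89', ' = ', '890']
```

So “adding” for the model means predicting which multi-digit chunk token comes next, not carrying digits column by column.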
Pretty amazing that anything even resembling an answer can be predicted by a language model, especially when the very first token of the answer already requires the calculation to have been done.
On the API, we can just give the AI an “addition calculator” tool to send the numbers to. In ChatGPT, the model can instead send the numbers to the Python code interpreter for addition.
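A minimal sketch of such a tool, assuming the OpenAI-style function-calling schema (the tool name `add_numbers` and the handler are my own hypothetical choices; the actual API round-trip is omitted, only the local side is shown):

```python
# Sketch of an "addition calculator" tool in OpenAI function-calling format.
# The model emits a tool call with JSON arguments; our own code does the
# real, deterministic math and returns the result to the model.
import json

add_tool = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add a list of numbers and return the exact sum.",
        "parameters": {
            "type": "object",
            "properties": {
                "numbers": {"type": "array", "items": {"type": "number"}},
            },
            "required": ["numbers"],
        },
    },
}

def handle_tool_call(arguments_json: str) -> str:
    """Runs locally when the model emits a tool call: exact addition."""
    args = json.loads(arguments_json)
    return str(sum(args["numbers"]))

# Simulated arguments, as the model would produce them:
print(handle_tool_call('{"numbers": [234, 567, 89]}'))  # 890
```

The division of labor is the point: the language model decides *when* to add and *which* numbers, while plain code performs the arithmetic it cannot reliably do itself.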