My experience: ChatGPT really sucks at math

I don’t care what the benchmarks say. I have been testing ChatGPT with math questions since day one, and I can confidently say that it is (and has always been) extremely bad at math.

Since ChatGPT was first released, I have been asking it dozens of math questions every day, and I can’t remember a single correct answer to an even slightly advanced math question. This is made worse by the fact that it does not follow instructions, no matter how detailed the prompt is, so it’s impossible to walk it through the steps to get to the right answer.

The questions I am asking are graduate-level questions that really test the mathematical intelligence of the model. They are not questions like solving integrals, where the only difference from what was (likely) used in training is the naming of the variables. They are questions like specific optimal control problems or short mathematical derivations.

I understand that these models are not designed (for now) to be good at anything factual, but it’s unbelievable how ChatGPT appears to be top of the class on all math benchmarks and then absolutely sucks in practice. I have been running the same tests with Gemini 1.5 since it was released to the public: it beats ChatGPT 99% of the time, and when it doesn’t, it’s straightforward to tell it where it made a mistake and how to fix it.

Are you having a similar experience? If not, can you share the type of questions that ChatGPT is getting right?

I just tried to give it some bond math stuff for a grad course I’m doing, and yes, it sucks. It lays out all the calculation steps pretty well, but then makes very basic math errors (e.g., it just told me .75 * 2 = 3, which threw off the whole calculation).

That’s because humans have never defined math well. I get what you might be thinking reading this, but even an AI chatbot like ChatGPT finds it difficult to understand notations and expressions because of the way humans have defined math. So until it uses a “calculator” plugin or some sort of “fact-checking” method, we probably won’t stop seeing inaccurate answers.
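
To make the “calculator” idea concrete, here is a minimal sketch in Python of what such a check could look like. It is not how ChatGPT or any real plugin works; `calc` and `check_step` are hypothetical names made up for this example. The idea is just that the arithmetic in each step gets evaluated exactly by code instead of being trusted as generated text.

```python
import ast
import operator
from fractions import Fraction

# Hypothetical helper for illustration only; not part of any ChatGPT plugin API.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calc(expr: str) -> Fraction:
    """Evaluate a plain +, -, *, / expression exactly, without using eval()."""

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            # Go through str() so that 0.75 becomes exactly 3/4.
            return Fraction(str(node.value))
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expr, mode="eval"))


def check_step(expr: str, claimed: str) -> bool:
    """Return True if the claimed result matches the exact value of expr."""
    return calc(expr) == Fraction(claimed)


# The step reported in the reply above: ".75 * 2 = 3"
print(float(calc(".75 * 2")))      # 1.5
print(check_step(".75 * 2", "3"))  # False
```

A check like this only catches arithmetic slips such as the one in the bond calculation. It would not help with the graduate-level derivations from the original post, where the error is in the reasoning rather than in the numbers.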