I don’t care what the benchmarks say. I have been testing ChatGPT with math questions since day one, and I can confidently say that it is (and has always been) extremely bad at math.

Since ChatGPT was first released, I have been asking it dozens of math questions every day, and I can’t remember a single instance of a correct answer to even a slightly advanced math question. This is made worse by the fact that it does not follow instructions, no matter how detailed the prompt is, so it’s impossible to walk it through the steps to get to the right answer.

The questions I am asking are graduate-level questions that genuinely test the mathematical intelligence of the model. They are not questions like solving standard integrals, where the only difference from what was (likely) seen in training is the naming of the variables. They are questions like particular optimal control problems, or short mathematical derivations.

I understand that these models are not (for now) designed to be reliable on factual tasks, but it’s unbelievable how ChatGPT appears to be top of the class on all math benchmarks and then absolutely sucks in practice. I have been running the same tests on Gemini 1.5 since it was released to the public: it beats ChatGPT 99% of the time, and when it doesn’t, it’s straightforward to tell it where it made a mistake and how to fix it.

Are you having a similar experience? If not, can you share the type of questions that ChatGPT is getting right?