I don’t care what the benchmarks say. I have been testing ChatGPT with math questions since day one, and I can confidently say that it is (and has always been) extremely bad at math.
Since ChatGPT was first released, I have been asking it dozens of math questions every day, and I can’t remember a single instance of a correct answer to even a slightly advanced math question. This is made even worse by the fact that it does not follow instructions, no matter how detailed the prompt is, so it’s impossible to walk it through the steps to get to the right answer.
The questions I am asking are graduate-level questions that really test the mathematical intelligence of the model. They are not questions like solving integrals, where the only difference from what was (likely) seen in training is the naming of the variables. They are questions like particular optimal control problems, or short mathematical derivations; a generic example of the flavor is sketched below.
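To be concrete about the kind of formulation I mean, here is the standard finite-horizon optimal control template (a generic illustration, not one of my actual test questions; here $\ell$, $\phi$, and $f$ stand for an arbitrary running cost, terminal cost, and dynamics):

$$
\min_{u(\cdot)} \; \int_0^T \ell\bigl(x(t), u(t)\bigr)\,dt + \phi\bigl(x(T)\bigr)
\quad \text{s.t.} \quad \dot{x}(t) = f\bigl(x(t), u(t)\bigr), \quad x(0) = x_0 .
$$

My test questions are specific instances of this template whose solutions require an actual derivation (e.g., via Pontryagin’s maximum principle) rather than pattern-matching a memorized answer.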
I understand that these models are not designed (for now) to be reliable at anything factual, but it’s unbelievable how ChatGPT appears to be top of the class on all the math benchmarks and then absolutely sucks in practice. I have been running the same tests on Gemini 1.5 since it was released to the public: it beats ChatGPT 99% of the time, and when it doesn’t, it’s straightforward to tell it where it made a mistake and how to fix it.
Are you having a similar experience? If not, can you share the type of questions that ChatGPT is getting right?
I just tried giving it some bond math from a grad course I’m taking, and yes, it sucks. It lays out all the calculation steps pretty well, but then makes very basic arithmetic errors (e.g., it just told me 0.75 * 2 = 3, when it’s obviously 1.5, which threw off the whole calculation).
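Re-running the steps outside the chat is the only reliable check I’ve found. Here’s a minimal sketch of the kind of calculation involved, pricing a plain coupon bond in Python (the face value, coupon rate, yield, and maturity are made-up illustration numbers, not from my course):

```python
def bond_price(face: float, coupon_rate: float, ytm: float, years: int,
               freq: int = 2) -> float:
    """Present value of a coupon bond: discounted coupons plus
    discounted face value, with freq coupon payments per year."""
    coupon = face * coupon_rate / freq   # cash flow per coupon period
    r = ytm / freq                       # per-period discount rate
    periods = years * freq
    pv_coupons = sum(coupon / (1 + r) ** t for t in range(1, periods + 1))
    pv_face = face / (1 + r) ** periods
    return pv_coupons + pv_face

# Hypothetical inputs: $1,000 face, 5% coupon, 4% yield, 10 years.
print(round(bond_price(1000, 0.05, 0.04, 10), 2))  # ~1081.76, above par
```

Having the model narrate the steps and then checking every number this way is tedious, but it catches slips like the 0.75 * 2 = 3 one immediately.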
That’s partly because humans have never defined mathematical notation in a fully unambiguous way. I get what you might be thinking reading this, but even an AI chatbot like ChatGPT finds it difficult to parse notations and expressions, precisely because of how loosely math is written. So until it routes calculations through a “calculator” plugin or some sort of fact-checking step, we probably won’t stop seeing these accuracy issues.
It’s a large LANGUAGE model, not a calculator. You can ask it to lay out the calculation steps and do them yourself, or to explain the formulas and how to use them, etc., but it’s not designed to solve complex equations, as has been pointed out in thousands of places. If you want to do math with GPT’s help, ask it to define the variables and give you a formula, then type it into a calculator to get the result, as in the example below.
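For instance (a hypothetical exchange; the loan numbers are made up), you might ask ChatGPT for the standard loan-payment formula and then evaluate it yourself:

```python
# Formula the model would supply: M = P * r * (1 + r)^n / ((1 + r)^n - 1),
# where P is the principal, r the per-period rate, n the number of payments.
P = 250_000              # principal (hypothetical)
annual_rate = 0.06       # nominal annual rate (hypothetical)
n = 30 * 12              # monthly payments over 30 years
r = annual_rate / 12     # monthly rate

M = P * r * (1 + r) ** n / ((1 + r) ** n - 1)
print(f"monthly payment: {M:.2f}")  # ~1498.88
```

The model is useful for naming the formula and defining the variables; the arithmetic itself belongs in the calculator.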
I partially disagree. GPT is also bad at understanding problems and converting them into formulas. I’ve tried it, and even on basic school problems it often fails to understand what has to be calculated and how.
I believe OpenAI just didn’t train the GPT models on enough math problems.
Even the Wolfram plugin doesn’t help if GPT fails to invoke it.
I can confirm this as a cryptologist. I specialize in gematria, and ChatGPT is incompetent at it. It knows what Sumerian gematria is and that Jesus equals 888 in Greek isopsephy, but when I asked it for what it personally found to be the most notable gematria phenomenon, it pumped out a horrible, factually wrong list: not just garbage numbers with wrong calculations, but pairings of opposite subjects, as if that were what I meant by “matching”. Sure, hate and love match as emotions, but it was clearly making category errors, grouping subjects by top-level category rather than by specific alignments like love and peace (not that those align numerically; I am simply giving an example of the kind of pairing I mean). It is so dumb, in other words, that it sees good and evil as a subject match and, worse, just makes up a number.

It seems whoever programmed OpenAI’s model made the priorities speed and CHATTING, as in mimicking chatting, with accuracy simply being a bonus. It is good to know if whoever programmed Gemini corrected this. But then why haven’t the ChatGPT programmers? Are they lazy, grifting by pretending to be making improvements? Maybe they are woke morons and this is their way of getting revenge on those who won’t accept a Wokebot, a propaganda robot for these New Age sex-addicted narcissists. What other explanation is there? Same with Twitter: despite Elon claiming to have cleaned house, there were still woke moderators or programmers, and I kept getting flagged to no end. The programmers are clearly the problem, or you could say it’s whoever is in charge of them; perhaps it’s a guy who thinks he’s in control and wise because of his wealth, and can’t imagine that the programmers he is paying are playing him.

ChatGPT is mostly good with translation. But again, it messes up when math instructions are mixed with language instructions. For example, I repeatedly asked it to tell me stroke counts for Chinese characters, and it kept giving wrong answers, or answers I couldn’t verify. When I asked Bing to show me which characters’ six strokes would align with three repeating sixes, regardless of whether or not it was a name (and I never mentioned names), it refused and just repeated the same “I can’t help you with that” answer. These AIs are shockingly stupid, woke, and communistic.
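For reference, the isopsephy sum itself is trivial for a machine to get right, which makes the made-up numbers all the more damning. A minimal Python sketch using the standard Greek letter values (the archaic numerals stigma, koppa, and sampi are omitted for brevity):

```python
# Standard Greek isopsephy values; both medial and final sigma count 200.
GREEK_VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70,
    "π": 80, "ρ": 100, "σ": 200, "ς": 200, "τ": 300, "υ": 400,
    "φ": 500, "χ": 600, "ψ": 700, "ω": 800,
}

def isopsephy(word: str) -> int:
    """Sum the numeric values of the Greek letters in a word."""
    return sum(GREEK_VALUES.get(ch, 0) for ch in word.lower())

print(isopsephy("Ιησους"))  # 888, the value for "Jesus" cited above
```

Note this only handles unaccented text; a real tool would strip diacritics first.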