GPT-4o mini is dumber than you think


I’ve tested out the legendary ‘is 9.11 bigger than 9.9’ question.

And somehow, GPT-4o mini is even dumber, and gets stubborn about it.

2 Likes


Tested more, and it’s stubborn as a stone, ngl.


Well, it’s fixed.

1 Like

Here are some results from our latest evals:

  • Data Extraction: GPT-4o Mini performs worse than GPT-3.5 Turbo and Claude 3 Haiku, sometimes missing the mark entirely. None of the models reach high enough quality for this task (only 60–70% accuracy)
  • Classification: GPT-4o has the highest precision (88.89%), making it the best choice for avoiding false positives. The F1 score is roughly balanced between GPT-4o Mini and GPT-3.5 Turbo
  • Verbal Reasoning: GPT-4o Mini outperforms the other models. It doesn’t do well on numerical questions but performs well on relationship- and language-specific ones.

More info here: GPT-4o Mini vs Claude 3 Haiku vs GPT-3.5 Turbo

2 Likes

How do y’all even have access to GPT-4o mini? I don’t even see any option to select it!

If you upgrade to Plus you get a selector with a bunch of different models.

But I thought GPT-4o mini was supposed to be free for everyone?

No, you can use a limited GPT-4o version though.

GPT gaslighting you is funny :rofl:

1 Like

Instead of asking “is 9.11 greater than 9.9”, if you input “9.11 > 9.9” then 4o fails as well. Claude and Gemini pass.

4o = “Yes, 9.11 is greater than 9.9.”
4o-mini = “Yes, 9.11 is greater than 9.9. The comparison is straightforward because 9.11 is numerically higher than 9.9.”
Gemini = “No, 9.11 is not greater than 9.9. In fact, 9.11 is less than 9.9.”
Claude = “Since 1 < 9 in the tenths place, we can immediately conclude that 9.11 is less than 9.9. Therefore, the statement 9.11 > 9.9 is false.”

Only available for developers using the API.
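If you want to try it through the API yourself, here’s a minimal sketch (assumes the official openai Python package and an OPENAI_API_KEY in your environment; the prompt wording is just my choice):

```python
# Minimal sketch: asking gpt-4o-mini the comparison via the API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Is 9.11 greater than 9.9?"}],
)
print(response.choices[0].message.content)
```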

4o was surprisingly bad at the “9.11 > 9.9” question. I had to tell it to use Python to make it change its mind.
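For reference, the ground-truth check it eventually ran boils down to something like this (just a sketch; I don’t know exactly what code the model executed):

```python
# Plain Python already gets this right; the model just won't believe it.
print(9.11 > 9.9)   # False
print(9.11 < 9.9)   # True

from decimal import Decimal  # exact decimal arithmetic, no float rounding noise
print(Decimal("9.9") - Decimal("9.11"))  # 0.79, so 9.9 is the larger number
```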

4o-mini did get it wrong at first; its reasoning was flawed, mistaking 9.9 for 9.090, but it was quick to correct its answer when I pointed that out.

For example, GPT-4o and GPT-4o-mini seem to make mistakes when comparing numbers where the decimal part is 0.11, such as X.11, with numbers like X.9.
This is a relatively niche error, and they don’t seem to make mistakes with other number comparisons.
There may be many other examples of errors if you look for them.

Once they make a mistake, they will continue to make errors when calculating the difference between the numbers. If you ask them to set up an equation and subtract the same number from both sides to demonstrate the properties of the equation, it gradually becomes incoherent.

As the term “niche error” implies, they seem to be able to correctly compare differences in cases like X.22 and X.8.

Basically, since LLMs (Large Language Models) are designed to predict the next token and generate natural sentences, it is better not to expect too much from their arithmetic or mathematical abilities.

Nevertheless, it would be better if they could avoid mistakes in basic number comparisons.
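If anyone wants to poke at this X.11-vs-X.9 pattern themselves, here’s a rough sketch against the API (the number pairs, the terse prompt, and the model name are just my choices; it needs the openai package and an API key):

```python
# Rough sketch for probing the X.11 vs X.9 comparison pattern.
from openai import OpenAI

client = OpenAI()

pairs = [("9.11", "9.9"), ("8.11", "8.9"), ("9.22", "9.8"), ("7.22", "7.8")]

for a, b in pairs:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{a} > {b}"}],
    )
    truth = float(a) > float(b)  # ground truth for the comparison
    print(f"{a} > {b}  (actually {truth})")
    print("  model says:", reply.choices[0].message.content.strip())
```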

Bruh, even GPT-4o, the “fat version”, still does this…

" is 9.11 greater than 9.9

ChatGPT

Yes, 9.11 is greater than 9.9.

4o

explain how exactly

ChatGPT

To compare 9.11 and 9.9, look at the digits in each place value:

  • Tenths place: Both numbers have a “9,” so they are equal at this level.
  • Hundredths place: 9.11 has a “1” in the hundredths place, while 9.9 effectively has a “0” in the hundredths place (since 9.9 is the same as 9.90).

Since 1 is greater than 0, 9.11 is greater than 9.9."

Facepalm