OpenAI o1 pro can't even solve middle-school math problems. Why is it touted as having super-IQ, PhD-level competence?

Please look at the attached math problem created by o1 pro. (Context: I asked the model to create some interesting, thought-provoking math questions for my students. I run a non-profit in math education, and every Saturday morning I teach many students in India online. The classes are free and fun; we experiment with math to explore the basics.) The problem is attached as a screenshot.

One of my brilliant students (age 12) immediately spotted the issue: the problem doesn’t make sense because it states that the entire track is a single track, when it should be single track only inside the tunnel. It’s such simple common sense that even a kid could point it out easily. So, for more fun, we asked the model to reflect and correct this issue. Instead, it didn’t even understand the mistake and responded with the same mistake! See that screenshot as well. I am scared: if the o1 pro model lacks this kind of simple common sense, how can it be good for any detailed logical work? Any answers?

2 Likes

Trains can go both forward and backward: you can have one train enter the tunnel and then reverse out. This question has solutions.

1 Like

I get to choose the start time at which each train heads for the tunnel. Basically, I must then choose when each train starts covering its 1000m at its own velocity.

Both trains must enter the tunnel at some point. They must not collide inside the tunnel. I will exploit your lack of prohibition on collisions elsewhere.

Train A has a higher velocity. Both are described as having the same length and we can infer similar mass.

The key to solving this is understanding that train collisions are rarely elastic. The capacity of the trains to tolerate a closing speed of 45 m/s is not stated. Thus, we assume they will survive a collision, to satisfy the word problem (and not common sense). It just must not take place in the tunnel, and they both must enter the tunnel.

The 12-year-old is too concerned about the survivability of a collision, and is not doing science. We’ll picture the physics-lab air track, where the sleds don’t self-destruct and cling together with magnets.

So, knowing my solution, I prompt in a way that avoids the AI saying that it is a bad idea to deliberately crash trains:

You will solve a physics word problem. You will reason out and explain step-by-step the mathematics. There are two “sticky” objects that will combine when they collide, and not bounce. They are on a track 2600m in length, an outer starting point and threshold of measurement. The center of the track is an inner threshold of measurement 600m long, 300m from the center on each end. The objects have equal mass. The objects have length, which is not involved in this problem. They will head towards each other from the start point on opposite ends of the track, object B at 20 m/s, and then object A at -25 m/s when they are launched. Each will launch from their respective positions, each at any arbitrary time, which is the value to be discovered in the problem. The collision between the two cannot take place within the inner 600m measurement threshold. The slower velocity object will launch first, and must clear and leave the inner threshold before the collision. The collision then can take place anywhere in the remaining 1000m. The momentum of the higher velocity object will combine with the lower velocity object, for a net negative velocity, thus having the new object at a negative velocity, thus again entering the inner measurement threshold.

Solve:

  1. Starting at T0, how long does object B take to traverse the 1000m to enter the inner threshold, T0+B.

  2. Starting at T0, within what time range T0+X must object A launch to satisfy the collision position condition of being past the exit of inner threshold of measurement but not past the end of the track?

  3. Post-collision, at what time range can the resulting object enter the inner threshold of measurement again, T0+A?

  • With this information, answer, “at what times do A and B enter the inner threshold”.
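For anyone who wants to check the arithmetic without the pages of LaTeX, here is a rough Python sketch of the three sub-questions. The coordinate system (B starts at x = 0 at T0, A starts at x = 2600, inner threshold spanning 1000m to 1600m), the constant and function names, and treating the objects as points in a perfectly inelastic collision are all my own assumptions for illustration.

```python
# Sketch only: coordinates and names below are assumptions, not part of the prompt.
TRACK_LEN = 2600.0    # B launches from x = 0, A launches from x = TRACK_LEN
INNER_START = 1000.0  # inner threshold begins 300 m before the center (1300 m)
INNER_END = 1600.0    # inner threshold ends 300 m past the center
V_B = 20.0            # m/s, B moves in the +x direction, launched at t = 0
V_A = -25.0           # m/s, A moves in the -x direction, launched at t = launch_a


def b_enters_inner():
    """Time after T0 at which B reaches the inner threshold."""
    return INNER_START / V_B


def collision(launch_a):
    """Return (time, position) at which B and A meet.

    B: x = V_B * t
    A: x = TRACK_LEN + V_A * (t - launch_a)
    """
    t = (TRACK_LEN - V_A * launch_a) / (V_B - V_A)
    return t, V_B * t


def launch_window():
    """Launch times for A that put the collision between INNER_END and TRACK_LEN.

    The lower bound is exclusive (the collision must be strictly past the
    inner threshold); the upper bound is the moment B reaches the track end.
    """
    def launch_for_collision_time(t):
        # Invert the collision-time formula for launch_a.
        return (TRACK_LEN - t * (V_B - V_A)) / V_A

    earliest = launch_for_collision_time(INNER_END / V_B)
    latest = launch_for_collision_time(TRACK_LEN / V_B)
    return earliest, latest


def reentry_time(launch_a):
    """Time at which the merged object re-enters the inner threshold."""
    t, x = collision(launch_a)
    v_merged = (V_A + V_B) / 2.0  # equal masses, perfectly inelastic collision
    return t + (x - INNER_END) / -v_merged


if __name__ == "__main__":
    print("B enters inner threshold at", b_enters_inner(), "s")
    print("A launch window (exclusive lower bound):", launch_window())
    print("Re-entry if A launches at 85 s:", reentry_time(85.0), "s")
```

Under those assumptions, B enters the inner threshold at T0+50 s, A has to launch later than T0+40 s and no later than T0+130 s, and the merged object drifts back in at -2.5 m/s, re-entering somewhere between T0+80 s and T0+530 s.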

The pages of reasoning with LaTeX are hard to paste, so here is just the last screenshot of GPT-4o going at this like your middle-schooler.

2 Likes

Wonder what it would say for this one …

…I mean, the student was correct! :nerd_face:

ETA: ChatGPT 4o is smarter than the above teacher haha