Today, I have been testing o1 and o1 pro mode’s performance on the Japanese Language Proficiency Test (JLPT) at N1 level (C1 level CEFR), particularly on Question 6 (Sentence Composition).
There are 5 questions with four blanks each, and one has to choose the correct expression for the third blank. It is one of the trickier parts of the test because one has to build each sentence mentally and make sure it makes sense both logically and grammatically. They are like small linguistic puzzles.
Here are the results I got for 5 trials:
- o1 failed / o1 pro mode correct
- o1 correct / o1 pro mode correct
- o1 failed / o1 pro mode correct
- o1 correct / o1 pro mode correct
- o1 correct / o1 pro mode correct
o1 got 3 out of 5 while o1 pro mode got all answers correct. I think this is another indicator of pro mode’s superiority over “normal” o1.
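In case anyone wants to keep a running tally over more trials, here is a minimal Python sketch of the count above. The `trials` list and the `tally` helper are just illustrative names of mine, and the data is simply the five results listed above (True = correct).

```python
# Per-trial outcomes transcribed from the results list (True = correct).
# The structure and names here are illustrative, not from any tool.
trials = [
    {"o1": False, "o1 pro mode": True},
    {"o1": True, "o1 pro mode": True},
    {"o1": False, "o1 pro mode": True},
    {"o1": True, "o1 pro mode": True},
    {"o1": True, "o1 pro mode": True},
]


def tally(trials):
    """Count correct answers per model across all trials."""
    scores = {}
    for trial in trials:
        for model, correct in trial.items():
            scores[model] = scores.get(model, 0) + int(correct)
    return scores


scores = tally(trials)
for model, score in scores.items():
    print(f"{model}: {score}/{len(trials)}")
```

Adding more trials is just a matter of appending more entries to the list.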
One interesting thing I noticed is that thinking time in pro mode varied widely, from 4 seconds to more than 3 minutes.
As an additional note, the 4o model fails badly at this question almost every time (except in rare cases).
Just posting here in case it sparks any interest.