Comparing 15 SAT Math Questions between GPT-4-turbo and GPT-4-vision-preview

Here’s my personal attempt to compare the newest model using 15 SAT problems from The 15 Hardest SAT Math Questions Ever.

I’m using the vision capability with an individual screenshot of each problem, asking the model to transcribe the screenshot and then answer the question.
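The setup can be sketched with the OpenAI Python SDK. The helper below builds the chat message payload that pairs the transcription prompt with an inline screenshot; the helper name and prompt wording are my own illustration, not the original poster's exact code:

```python
import base64


def build_vision_messages(image_bytes: bytes, prompt: str) -> list:
    """Build a chat message list pairing a text prompt with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ]


# Sending it would look roughly like this (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4-vision-preview",
#     messages=build_vision_messages(
#         screenshot_bytes,
#         "Transcribe this screenshot, then answer the question.",
#     ),
# )
```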

Result:

| Question No | GPT-4-turbo | GPT-4-vision-preview |
|---|---|---|
| 1 | Pass | Pass |
| 2 | Pass | Pass |
| 3 | Pass | Pass |
| 4 | Pass | Pass |
| 5 | Pass | Pass |
| 6 | Pass | Fail |
| 7 | Pass | Pass |
| 8 | Pass | Pass |
| 9 | Pass | Pass |
| 10 | Pass | Pass |
| 11 | Pass | Fail |
| 12 | Pass | Pass |
| 13 | Fail | Fail |
| 14 | Pass | Pass |
| 15 | Pass | Pass |

That’s roughly an improvement from 80% (gpt-4-vision-preview) to 93% (gpt-4-turbo), and the inference speed is significantly faster.
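A quick sanity check of those percentages, counting the Pass rows in the table above:

```python
# Pass counts read off the table: turbo fails only Q13;
# vision-preview fails Q6, Q11, and Q13.
passes = {"gpt-4-turbo": 14, "gpt-4-vision-preview": 12}
TOTAL = 15

for model, n in passes.items():
    print(f"{model}: {n}/{TOTAL} = {n / TOTAL:.0%}")
# → gpt-4-turbo: 14/15 = 93%
# → gpt-4-vision-preview: 12/15 = 80%
```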

Edit: corrected the old GPT vision model name.


As gpt-4-turbo-preview is not a vision model, the alternative you should be comparing is gpt-4-1106-vision-preview, a new alias for gpt-4-vision-preview.
Maybe you just have a typo in your report?


Ah, my bad, indeed I’m using gpt-4-vision-preview. Thank you.


So you’re using gpt-4-turbo with function calling, using vision to transcribe the question from the photo, and then having turbo solve the problem?

gpt-4-turbo can see.

    flowchart
         image --> gpt-4-turbo

Yeah, my mistake. I understand what he’s done now.

It is a vanilla test, without external function calling / tools.