Comparing 15 SAT Math Questions between GPT-4-turbo and GPT-4-vision-preview

Here’s my personal attempt to compare the newest model using 15 SAT problems from The 15 Hardest SAT Math Questions Ever.

I’m using the vision capability with an individual screenshot of each problem, asking the model to transcribe the screenshot and then answer the question.
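The setup can be sketched with the OpenAI Python SDK. The helper below builds the chat message payload that pairs the transcription prompt with an inline screenshot; the helper name and prompt wording are my own illustration, not the original poster's exact code:

```python
import base64


def build_vision_messages(image_bytes: bytes, prompt: str) -> list:
    """Build a chat message list pairing a text prompt with an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
            ],
        }
    ]


# Sending it would look roughly like this (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(
#     model="gpt-4-vision-preview",
#     messages=build_vision_messages(
#         screenshot_bytes,
#         "Transcribe this screenshot, then answer the question.",
#     ),
# )
```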

Result:

| Question No | GPT-4-turbo | GPT-4-vision-preview |
|---|---|---|
| 1 | Pass | Pass |
| 2 | Pass | Pass |
| 3 | Pass | Pass |
| 4 | Pass | Pass |
| 5 | Pass | Pass |
| 6 | Pass | Fail |
| 7 | Pass | Pass |
| 8 | Pass | Pass |
| 9 | Pass | Pass |
| 10 | Pass | Pass |
| 11 | Pass | Fail |
| 12 | Pass | Pass |
| 13 | Fail | Fail |
| 14 | Pass | Pass |
| 15 | Pass | Pass |

That’s roughly an improvement from 80% (gpt-4-vision-preview) to 93% (gpt-4-turbo), and the inference speed is significantly faster.
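A quick sanity check of those percentages, counting the Pass rows in the table above:

```python
# Pass counts read off the table: turbo fails only Q13;
# vision-preview fails Q6, Q11, and Q13.
passes = {"gpt-4-turbo": 14, "gpt-4-vision-preview": 12}
TOTAL = 15

for model, n in passes.items():
    print(f"{model}: {n}/{TOTAL} = {n / TOTAL:.0%}")
# → gpt-4-turbo: 14/15 = 93%
# → gpt-4-vision-preview: 12/15 = 80%
```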

Edit: corrected the old GPT vision model name.


As gpt-4-turbo-preview is not a vision model, the alternative you should be comparing is gpt-4-1106-vision-preview, a new alias for gpt-4-vision-preview.
Maybe you just have a typo in your report?


Ah, my bad, indeed I’m using gpt-4-vision-preview. Thank you.


So you’re using gpt-4-turbo with function calling, using vision to transcribe the question from the photo, and then having turbo solve the problem?

gpt-4-turbo can see.

    flowchart
         image --> gpt-4-turbo

Yeah, my mistake. I understand what he’s done now.

It is a vanilla test, without external function calling / tools.