Seeking Advice: Enhancing Accuracy of GPT-4 with Vision

HenriqueMelo · November 16, 2023, 4:31pm

Hello everyone,

I’ve been using the OpenAI API for some basic testing until now, and I’m planning to integrate it into a project of mine. Here’s the situation: I have a large set of multiple-choice questions, each accompanied by an image, the question itself, and possible answers.

My initial idea was to use the ‘gpt-4-vision-preview’ to analyze the image and explain why a certain answer, let’s say ‘X’, is correct. But then I thought of a different approach. Instead of giving it the correct answer, I could let the model identify the correct answer itself and then provide an explanation. This method, I believe, could add more credibility to its explanations since the model would be independently identifying the correct answer.

In my initial tests yesterday, the model performed flawlessly, not missing a single question. However, in today’s tests, I noticed some errors. The model was confusing certain questions due to similar terms used in the answers. In these cases, the specific terminology is crucial, even though the terms might generally refer to the same thing.

To tackle this, I’m thinking of using ‘gpt-4-vision-preview’ to describe the images in detail and then fine-tuning a model with every guideline from a comprehensive document I have. This might lead to more accurate results.

Since I’m relatively new to the OpenAI API, I’m not entirely sure if this is the best solution. Does anyone have any suggestions or know of any articles that might help?

Foxalabs · November 16, 2023, 4:49pm

Hi and welcome to the Developer Forum!

This is all new! The best thing to do is try it and see what results you get.

The reason Ai is generating such a buzz is this is more like the discovery of electricity than it is a progression of computing, it’s all new, everything is up for grabs and there are no dusty old text books full of best practice yet.

Experiment and let us know how you get on.

HenriqueMelo · November 16, 2023, 5:00pm

Thanks for the feedback!

I plan to continue experimenting and will use this thread to share my results.

pzdzxlx · December 14, 2023, 6:27am

Hello，friend. i want to know what kind of picture you want the gpt process? for different kinds of picture it shows different ablity. For example, it can not count the number of anything but can recognize a location just through some building. do you have any ways to solve this problem?

HenriqueMelo · December 15, 2023, 11:07am

Hi @pzdzxlx, I required assistance in analysing images and providing accurate responses to associated questions. The process is akin to a quiz, where each question is accompanied by an image, the question itself, and multiple possible answers. To enhance accuracy, I’ve addressed the model’s limitations in specific terminology by constructing a comprehensive vector database encompassing all relevant knowledge. I then embed the top query result from the vector database into the prompt, thereby achieving more precise and reliable outcomes.

skarjigi98 · May 15, 2024, 7:32am

Hi,
I did try a different approach for my use case.
I had to derive data from image and I used to get lot of hit - misses in output.

Reading the doc, I came to know that vision breaks images into 512 * 512 format.

So if your input image is greater than 512 by 512, it will convert it to 512 by 512 (also noticed if image was originally high detailed, the accuracy was good) but problem arises if image is below 512 by 512, in this case, it will stretch the image to match the desired format.

I converted all my images to 512 by 512 by default in combination with OCR enhancements set to true and details set to High in payload and voila, most of my images got correct outputs.

Also preprocessing the image , adding grayscale, sharpening it etc might help.

Topic		Replies	Views
Make OpenAI Vision API Match GPT4 Vision API chatgpt	4	3916	December 6, 2023
Question about GPT-4 Vision API and Limits in Image Analysis API api	4	370	January 27, 2025
OpenAI API OCR isn't as successful as chatGPT API gpt-4 , api , ocr	10	778	May 13, 2025
Better Understand Images / Train On Annotated Images API gpt-4 , api	22	1716	April 2, 2024
Can GPT -vision models be accessed using API? API	15	1562	January 7, 2025

Seeking Advice: Enhancing Accuracy of GPT-4 with Vision

Related topics