Better Understand Images / Train On Annotated Images

Hi, I’m experimenting with the new API and building a project that uses GPT-4 vision to assess damage done to the surface of an object. So far it’s not bad; however, it seems to struggle with certain images where the damage is really hard to see. Is there any way I could improve this by giving it images that have the damaged sections highlighted?

1 Like

Hi and welcome to the Developer Forum!

Sounds like it might benefit from some edge-detection pre-processing, or some way to highlight minor blemishes. It could just be a lack of contrast that a simple OpenCV edge-detect pass would enhance, maybe along with some contrast enhancement. I’d experiment with the simple-to-implement standard image-processing methods out there and perhaps make a few calls to the API with various levels of enhancement.
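Something along these lines, as a rough starting point in Python with OpenCV; the file names and threshold values are placeholders to tune per image set:

```python
# Untested sketch: contrast enhancement + edge detection as a pre-processing pass.
import cv2

img = cv2.imread("car_panel.jpg")  # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# CLAHE (adaptive histogram equalization) often brings out faint surface
# blemishes better than a global contrast stretch.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

# Simple Canny edge pass; the low/high thresholds are guesses to experiment with.
edges = cv2.Canny(enhanced, 50, 150)

cv2.imwrite("car_panel_enhanced.jpg", enhanced)
cv2.imwrite("car_panel_edges.jpg", edges)
```

You could send the enhanced version, the edge map, or both alongside the original and see which the model does best with.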

2 Likes

Different types of lighting, if you have control over where you get your images, can produce a lot more detail. Multispectral photography, for example. I used to use it for analysis of crop damage etc.

If you put a marker on the damage, it gets the model to focus better. That has been my limited experience.

Here is the theory on why this works with GPT-4V

https://community.openai.com/t/paper-set-of-mark-prompting-unleashes-extraordinary-visual-grounding-in-gpt-4v/442261?u=curt.kennedy
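To illustrate the idea, here’s a rough sketch of overlaying numbered marks with OpenCV. It assumes you already have approximate damage locations (e.g. from an inspector tapping the screen, or some detector), which is the part Set-of-Mark prompting takes as given:

```python
import cv2

# Hypothetical candidate regions (x, y) in pixel coordinates.
points = [(412, 305), (980, 644)]

img = cv2.imread("car_side.jpg")  # placeholder file name
for i, (x, y) in enumerate(points, start=1):
    cv2.circle(img, (x, y), 30, (0, 0, 255), 3)  # red ring around the spot
    cv2.putText(img, str(i), (x - 10, y - 40),   # numeric mark, SoM-style
                cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)

cv2.imwrite("car_side_marked.jpg", img)
```

The numbered marks also give you something to reference in the prompt (“describe what you see at mark 1”).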

3 Likes

[images of the car attached]
Take a look at these. I have a script that adjusts the contrast, and the damage is somewhat visible, but whatever I do, the model can never see the damage.

I would use stickers but that will take too much time for inspectors.

Looks new to me. Is it that small scuffing on the front bumper?

Otherwise, perfect car! :100:

PS If a “median human” can’t see it, then neither can the AI, without a lot of coaching/guiding/handholding. So markers are obviously going to be needed here IMO.

1 Like

Perhaps it’s just me; I’ve been looking at the cars for a while, so I can spot it quickly, but the black mark just above the wheel is the damage.

The whole idea is to use the images we have of each side of the car and then return a report of the damage. I personally think the mark is easy to see, but that’s just me. I guess I could try to add some sort of markers.

1 Like

Yeah I see it :laughing:

It’s pretty hard to spot if you don’t have an exact idea of how it’s supposed to look.

Is there an option to tell the inspectors to just take pictures that show the damage up close?

They can’t really take the time to mark it or anything and the app is designed to force them to take pictures in exact positions.

Maybe take the photo, cut it into smaller tiles, and have the AI examine each tile for damage? Without getting into markers, this might be a way to move forward.
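A minimal sketch of that tiling step, assuming Pillow is available; the tile size and overlap are arbitrary starting values:

```python
from PIL import Image

def tile_image(path, tile=512, overlap=64):
    """Cut an image into overlapping square tiles for per-tile inspection."""
    img = Image.open(path)
    w, h = img.size
    step = tile - overlap
    tiles = []
    for top in range(0, max(h - overlap, 1), step):
        for left in range(0, max(w - overlap, 1), step):
            box = (left, top, min(left + tile, w), min(top + tile, h))
            tiles.append(img.crop(box))
    return tiles

# Each tile can then be sent to the vision API with a
# "does this tile show surface damage?" style prompt.
for i, t in enumerate(tile_image("car_side.jpg")):  # placeholder file name
    t.save(f"tile_{i}.jpg")
```

The overlap is there so damage sitting on a tile boundary still shows up whole in at least one tile.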

PS It also goes without saying: work on your prompting. I had a non-working GPT-4V setup initially and had to lay out all sorts of parameters and expectations in the prompt for it to work.

2 Likes

Yeah I agree with Curt, that’s what I think you should try :laughing:

Are you the guy making the app?

1 Like

Indeed I am. I really like Curt’s idea, I’ll give it a shot!

2 Likes

Awesome!

I know a few car inspectors, and have a thing for classic vehicles myself :laughing:

What both they and I want is an app that lets you walk around the car while talking about the damage, then press a button to produce a vehicle report :wink:

1 Like

So I’m doing exactly that. I mean exactly what you described. But for Cox.

1 Like

I think another thing to keep in mind is that GPT-4V will slice your image into 512x512 areas.

For your photo GPT-4V will look at a down-scaled but full image (low res @ 512x512), and then identify the finer details by looking at the sliced high-res images.

high will enable “high res” mode, which first allows the model to see the low res image and then creates detailed crops of input images as 512px squares based on the input image size. Each of the detailed crops uses twice the token budget (65 tokens) for a total of 129 tokens.

So I believe the best value would be found by adjusting contrast for different types of damage, and then also slicing it strategically (somehow) and possibly zooming in on certain areas. AKA don’t let GPT slice and scale the image for you.
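For reference, sending a tile you cropped yourself looks roughly like this with the Python client; the model name reflects the current vision preview, and the prompt is just an example:

```python
import base64
from openai import OpenAI

client = OpenAI()

with open("tile_3.jpg", "rb") as f:  # a 512x512 crop you made yourself
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Inspect this section of a car body panel. "
                     "Report any scratches, dents, or scuffs you can see."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}",
                           "detail": "high"}},
        ],
    }],
    max_tokens=300,
)
print(resp.choices[0].message.content)
```

Side note: if the tile is already exactly 512x512, "detail": "low" should show the model the same pixels for fewer tokens; "high" only pays off when the input is larger than one tile.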

Now I’m interested to see how well GPT can identify lines running underneath a car. Almost a nightmare for a human to do… but maybe?

2 Likes

Great idea, by the way, if this is what you’re using it for. My mind is exploding with the potential for all sorts of applications.

1 Like

I’ll update the thread after I do some testing and see if I can get a PoC set up.

1 Like

I find this incredibly intriguing. I work at one of the largest collision repair groups in the UK, and I’ve been testing out the capabilities of GPT-4V myself.

So far I’ve had some successful small tests, with it correctly identifying the complexity of the damage and which type of repair location it should go to (we operate on a hub-and-spoke model, with the spokes only doing non-structural repairs).

I was obviously ecstatic when they announced the Vision API! Now just need to figure out how to actually make something with it…

I work in a body shop and write estimates when cars come in. I would love an update if you get something created with this that works… It would be highly valuable. The possibilities that are emerging from this software are insane.

1 Like

[example screenshots attached]
Here are two examples from testing I did a couple of weeks ago. It’s not production-ready, obviously, but it’s getting there.