Evaluating the Impact of File Formats on Image Description Quality: PNG vs. JPG

I want to determine if there are differences in the quality of image descriptions when using PNG vs. JPG files. Here are my thoughts:

  1. A file size of up to 20MB is allowed. JPG files use much less storage for the same picture. However, if you need a really good description of something complex, you might prefer a PNG file as a human because you can see more details when zooming in closely.
  2. If up to 20MB is allowed, will GPT-4o use the entire size? Or is this limit just for compatibility? Is the file being downsized to the maximum size it can use?
  3. Are there differences in the capability to analyze files between JPG and PNG? Could JPG compression confuse GPT-4o? Or will it be able to see more because of the compression?

These considerations also apply to TIFF and any other supported formats.

1 Like

With respect to your overall question… Almost certainly not.

I would imagine that what is happening under the hood is any image you upload is resized and re-encoded to some form of bitmap data which is basically just a long vector of values representing the RGB pixel values.

I could be wrong—I have no inside information here—but I’ve not seen anything to suggest they are piping raw image data directly into the model.

If anyone has seen anything suggesting otherwise I’d love to read it.

so there is also no difference if i upload 20MB or the recomendet max size?

Again, I can’t definitively say (I’ve not personally done enough exploration into it myself) but I would not expect it to make much, if any, difference.

I would encourage you to pay with it though, see what results you get, and report back.

Results:

I could notice a slight difference, which I attribute more to the generally lower quality of JPG rather than the processing of the images. However, this effect was significant stronger at the low detail level than at the high detail level.

My test involved rendering text that was available in the same version as both JPG and PNG. At the low detail level, there was a stronger tendency to misrecognize or omit words when the image was transmitted as a JPG.

1 Like

Thanks for coming back to share your findings.