Moving from gpt-4-vision-preview to gpt-4o Image URL Base64

I am trying to convert over my API code from using gpt-4-vision-preview to gpt-4o. I am passing a base64 string in as image_url. It works no problem with the model set to gpt-4-vision-preview but changing just the model to gpt-4o gives an error that gpt-4o requires image_url to be a link to an image. But according to the documentation it should work with a base64 string. I have tried gpt-4o-mini also. It only seems to work with gpt-4-vision-preview. Is there something I should be doing differently with gpt-4o?

Image of documentation:

My code is:
const mediaHistory = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: frameInstructions
      },
      {
        type: "image_url",
        image_url: `data:image/jpeg;base64,${imageBase64}`
      }
    ]
  },
];

const messageHistory = [
  {
    role: "system",
    content: [
      {
        type: "text",
        text: instructions
      }
    ]
  },
  {
    role: "user",
    content: [
      {
        type: "text",
        text: chatText
      }
    ]
  }
];
//console.log(mediaHistory);
const opts = {
  model: "gpt-4o",
  max_tokens: 300,
  messages: [...mediaHistory, ...messageHistory]
};
const response = await openai.chat.completions.create(opts);

Correct Format for Base64 Images

The main issue developers face is using the correct structure when sending base64-encoded images to the API. The solution is to structure the image data as follows:
```json
{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,<base64_encoded_image_data>"
  }
}
```

Key points:
- Use `"type": "image_url"` instead of `"type": "image"`
- Pass `image_url` as an object with a `url` key, not as a bare string
- Include the full data URI scheme, including the MIME type (e.g., `data:image/jpeg;base64,`)
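In Python, building that structure from a local file can be sketched like this (the helper names and the file path are my own, not part of the SDK):

```python
import base64

def encode_image(path: str) -> str:
    # Read a local image file and return its base64-encoded contents.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def image_content_part(b64: str, mime: str = "image/jpeg") -> dict:
    # Wrap the base64 data in the nested object the chat completions API expects:
    # image_url must be an object with a "url" key, not a bare string.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}"},
    }
```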

BUT … even with the correct structure, the encoding that gets sent is still frequently misinterpreted.

It makes me think about the importance of dev communities; even the web-based OpenAI community has nothing on this, which is ridiculous. Does anybody know which OpenAI community has the most traffic?

Here is self-documenting code, presented notebook-style so you can copy-paste each block in order.

Use the Python `client` API SDK method, and a system role message:

from openai import OpenAI
client = OpenAI()

system_message = [
  {
    "role": "system",
    "content": [
      {
        "type": "text",
        "text": "You are ImageAI, with built in computer vision."
      }
    ]
  }
]

I’ll give you example base64 images so you can run immediately.


pngpre = 'iVBORw0KGgoAAAANSUhEUgAAAIAAAABACAMAAADlCI9NAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAAZQTFRF////'
example_images = [
'MzMzOFSMkQAAAPJJREFUeNrslm0PwjAIhHv//09rYqZADzOBqMnu+WLTruOGvK0lhBBCCPHH4E7x3pwAfFE4tX9lAUBVwZyAYjwFAeikgH3XYxn88nzKbIZly4/BluUlIG66RVXBcYd9TTQWN+1vWUEqIJQI5nqYP6scl84UqUtEoLNMjoqBzFYrt+IF1FOTfGsqIIlcgAbNZ0Uoxtu6igB+tyBgZhCgAZ8KyI46zYQF/LksQC0L3gigdQBhgGkXou1hF1XebKzKXBxaDsjCOu1Q/LA1U+Joelt/9d2QVm9MjmibO2mGTEy2ZyetsbdLgAQIIYQQQoifcRNgAIfGAzQQHmwIAAAAAElFTkSuQmCC',
'AAAAVcLTfgAAAPRJREFUeNrsllEKwzAMQ+37X3owBm0c2VZCIYXpfXVBTd9qx5uZEEIIIcQr8IHjAgcc/LTBGwSiz5sEoIwTKwuxVCAW5XsxFco3Y63A3BawVWDMiFgiMD5tvELNuh/r5sA9Nu1yiYaXvBBLBawUAGubsZU5UOy8HkNvINoAv27nMVZ1WC1wfwrspPk2FDMiVpYknNu6uIxAVWQsgBoSCCQxI2KEANFdXccXseZzuKMQQDFmt6pPwU9CL+CcADEJr6qFA1aWYIgZEesGEVgmTsGvfYyIdaPYwp6JwBRL5kD4Hs7+VWGSz8aEEEIIIYQQ/8VHgAEAxPsD+SYeZ2QAAAAASUVORK5CYII=',
'AAAAVcLTfgAAAPVJREFUeNrslsEOhCAMRNv//+nNbtYInRELoniYdyJC2hdsATMhhBBCiFfiG4vTT1XIx/LA0wJl0hUCIeU8g2QgSBiFelJOFoCq+I3+H8ox6aN8SeGK7QvW5XfghcA+B0WcFvBDgToWbEmVANvoigBO1AIGY6N9lKuBlgAsClJ0bLME2CKaB1Kx1RcEQmWxHfK7BFhpPyHAOus+AVxW9lG7BqYJ+IHAWRHajCKE+6/YgB6B4TaMBk4EPCPgwwIG5yfEOROIp3XvxU4fRO74UGr/d3J3pt837OqAm6cl0IrQ8zAcOacbERa+s4UQQgghhBBv5iPAAA3BAvjyKYgWAAAAAElFTkSuQmCC',
]
example_images = [pngpre + s for s in example_images]
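If you want to confirm such strings really are images before sending them, here is a quick sanity check on the PNG magic bytes (my own helper, not part of the SDK):

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # every PNG file starts with these 8 bytes

def looks_like_png(b64: str) -> bool:
    # Decode only the first 16 base64 characters (12 bytes) and check the magic.
    return base64.b64decode(b64[:16]).startswith(PNG_SIGNATURE)
```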

Construct a detailed multi-image user message, with a metadata description preceding each image. This is where the challenge lay.

user_tiled_image_message = [
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Produce a per-image report of each image's contents."
      },
      {
        "type": "text",
        "text": "1. image filename example1.png:"
      },
      {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{example_images[0]}", "detail": "low"}
      },
      {
        "type": "text",
        "text": "2. image filename example2.png:"
      },
      {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{example_images[1]}", "detail": "high"}
      }
    ]
  }
]

Then send it off (the two messages are already in lists, so the lists can simply be added):

response = client.chat.completions.with_raw_response.create(
  model="gpt-4o-2024-08-06", max_tokens=500, top_p=0.01,
  messages=system_message + user_tiled_image_message,
)
print(response.http_response.json()["choices"][0]["message"]["content"])
print(response.http_response.json()["usage"])
print(f"time: {response.elapsed.total_seconds():.2f}s")
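For reference, the usage dict that comes back carries the standard chat completions token counts; a small formatter (a hypothetical helper of mine, only the field names come from the API):

```python
def summarize_usage(usage: dict) -> str:
    # Format the token accounting returned with every chat completion response.
    return (
        f"prompt: {usage['prompt_tokens']}, "
        f"completion: {usage['completion_tokens']}, "
        f"total: {usage['total_tokens']}"
    )
```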

This is documented in the API reference, but you’ve got to expand the user message format, and expand, and expand…


gpt-4-1106-vision-preview supports yet another undocumented and useful image method, where an image is neither tiled nor resized down (among other things that its API alone will accept).


Quality difference of AI from same input

=============== gpt-4-1106-vision-preview ===============

Image Content Report

1. Image Filename: example1.png

  • Content Description: The image contains the word “Apple” in a simple, pixelated black font on a white background.
  • Text Analysis: The text is clear and legible, styled in a basic sans-serif typeface.

2. Image Filename: example2.png

  • Content Description: The image displays the word “Banana” in a pixelated black font on a white background.
  • Text Analysis: The text is straightforward and readable, presented in a plain sans-serif font.
  • Resolution: 64x128 pixels

Both images are text-based with no additional graphical elements, focusing solely on the representation of the words “Apple” and “Banana” respectively.
time: 6.43s

=============== gpt-4o-2024-08-06 ===============
Image Report

  1. Image Filename: example1.png

    • Contents: The image contains the text “Apple”.
  2. Image Filename: example2.png

    • Contents: The image contains the text “Banana”.
      time: 3.14s