GPT-4-Turbo not responding to image url input

I am getting an error message from gpt-4-turbo when I ask it to describe an image that I send by URL through the API. The image is 6.9 MB, and the maximum allowed size is 20 MB.

Code -

import requests

imgLink = "https://benny-image-shop.s3.ap-south-1.amazonaws.com/2024-04-10/1712746302215000_103.215.237.66.jpg"

# Chat Completions request with one text part and one image_url part
payload = {
    "model": "gpt-4-turbo",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "describe"},
                {"type": "image_url", "image_url": {"url": imgLink}},
            ],
        }
    ],
    "max_tokens": 500,
}

# api_key must already hold your OpenAI API key
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

Error - {'error': {'message': 'Invalid image.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

The image was taken sideways and carries EXIF metadata that tells the viewer to rotate the JPG. You can see it load from right to left because of this.
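If you want to check for that rotation flag yourself, here is a minimal sketch that reads the EXIF Orientation tag (0x0112) straight from the JPEG bytes using only the standard library. The function names are made up for this example, and the parser only handles the common case of an Exif APP1 segment before the image data. Whether this tag is what makes OpenAI reject the file is not confirmed; it only tells you the flag is there.

```python
import struct


def _orientation_from_tiff(tiff):
    """Look for the Orientation tag (0x0112) in the first IFD of a TIFF block."""
    if tiff[:2] == b"II":
        endian = "<"   # little-endian ("Intel") byte order
    elif tiff[:2] == b"MM":
        endian = ">"   # big-endian ("Motorola") byte order
    else:
        return None
    if len(tiff) < 8:
        return None
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return None
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]   # each IFD entry is 12 bytes
        if len(entry) < 12:
            return None
        (tag,) = struct.unpack(endian + "H", entry[:2])
        if tag == 0x0112:
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return None


def read_exif_orientation(jpeg_bytes):
    """Return the EXIF Orientation value (1-8), or None if it is not found."""
    if jpeg_bytes[:2] != b"\xff\xd8":   # every JPEG starts with the SOI marker
        return None
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            return None
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:              # start of scan: metadata segments are over
            return None
        (seg_len,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        segment = jpeg_bytes[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and segment[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(segment[6:])
        pos += 2 + seg_len
    return None
```

A value of 1 (or None) means no rotation is requested; values 5 through 8 mean the viewer has to turn the picture by 90 degrees. If you already have Pillow installed, ImageOps.exif_transpose(image) applies the rotation to the pixels for you.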

Since OpenAI calls your image “invalid” on both vision models, while other online images work, here is a downloader and resizer that sends the image data itself instead of a URL.

If you specify a file location on your system instead of a URL, it will use that instead.

import base64
import urllib.request
from PIL import Image
from io import BytesIO
from openai import OpenAI

client = OpenAI()

def retrieve_image(input_string):
    if input_string.startswith("http"):
        req = urllib.request.Request(input_string, headers={'User-Agent': 'Mozilla/5.0'})
        with urllib.request.urlopen(req) as response:
            image = Image.open(BytesIO(response.read()))
    else:
        with open(input_string, "rb") as image_file:
            image = Image.open(image_file)
            image.load()  # read the pixels now; PIL loads lazily and the file closes after this block

    max_size = 512
    width, height = image.size
    if max(width, height) > max_size:
        aspect_ratio = float(width) / float(height)
        if width > height:
            new_width = max_size
            new_height = int(new_width / aspect_ratio)
        else:
            new_height = max_size
            new_width = int(new_height * aspect_ratio)
        image = image.resize((new_width, new_height), Image.LANCZOS)

    buffered = BytesIO()
    image.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

base64_image = retrieve_image("https://benny-image-shop.s3.ap-south-1.amazonaws.com/2024-04-10/1712746302215000_103.215.237.66.jpg")
parameters = {
    "model": "gpt-4-turbo",
    "max_tokens": 500,
    "messages": [{"role": "system", "content": "You are Extracto, a computer vision assistant."},
        {
            "role": "user",
            "content": [
                "Image should be attached. Describe the contents.",
                {
                    "image": base64_image
                    # "image_url": "https://benny-image-shop.s3.ap-south-1.amazonaws.com/2024-04-10/1712746302215000_103.215.237.66.jpg"
                    # "image_url": "https://i.imgur.com/C2Bvncv.png"
                }
            ]
        }
    ]
}

cc = client.chat.completions.create(**parameters)
print(cc.choices[0].message.content)

You can use detail:low as an image API parameter to pay less, or increase max_size up to 2048 to pay a lot more. The URL-style images are commented out.
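For a rough sense of what those settings cost, the sketch below follows the token accounting OpenAI documented for GPT-4 Turbo vision with the image_url format: low detail is a flat 85 tokens, while high detail fits the image inside 2048 x 2048, scales the shortest side down to 768, and then charges 170 tokens per 512-pixel tile plus 85. The function name is made up for illustration, it assumes small images are never scaled up, and the constants may differ for other models, so treat the result as an estimate.

```python
import math


def estimate_image_tokens(width, height, detail="high"):
    """Estimate prompt tokens for one image under the documented GPT-4 Turbo rules."""
    if detail == "low":
        return 85  # flat cost, regardless of image size

    # Step 1: shrink to fit inside a 2048 x 2048 square (never enlarge).
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale

    # Step 2: shrink so the shortest side is at most 768 (assumed: never enlarge).
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale

    # Step 3: count the 512-pixel tiles needed to cover the image.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles
```

For example, estimate_image_tokens(1024, 1024) gives 765 and estimate_image_tokens(2048, 4096) gives 1105, which matches the worked examples in OpenAI's vision guide.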

Both the openai library and PIL must be installed in your Python environment, for example with “pip install openai pillow” (PIL is distributed as the pillow package).

Thanks a lot for the reply. Will try this and update.

If I don’t specify any detail parameter to the API request, will it handle it itself in auto mode, according to the image?

This particular alternate “image” message format only accepts base64 that you’ve made a reasonable size yourself, and doesn’t have a detail parameter.

You can read the API reference, vision documents, and change the entire message format to use the more complex image_url format (and its base64 sending alternative) if you want to employ the “tiled” method for more costly high detail.
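As a minimal sketch of that documented format, the helper below builds a user message that carries the image as a base64 data URL inside image_url, along with an explicit detail setting. The helper name and its arguments are invented for this example; the "type", "image_url", "url", and "detail" keys are the ones the API reference describes, and detail defaults to "auto" when you leave it out.

```python
import base64


def build_image_message(image_bytes, prompt, mime_type="image/jpeg", detail="auto"):
    """Build a chat message with a text part and a base64 image_url part."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {
                    # data URL: the image bytes travel inside the request itself
                    "url": f"data:{mime_type};base64,{b64}",
                    # "low", "high", or "auto"
                    "detail": detail,
                },
            },
        ],
    }
```

You would then pass messages=[build_image_message(data, "Describe the contents.", detail="low")] to client.chat.completions.create, where data is the raw bytes of your JPEG or PNG file.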

Thank you for the clarification! This helps.