Different errors when inputting an image into gpt-4-vision-preview

Hello,
I’m trying to build a very simple program where I send gpt-4-vision-preview a photo of a very short text document and ask it to return the transcript. I get <response 400> when I send the original jpg file (1.9 MB) encoded locally and passed as base64, <response 200> when I downsize it to 129 KB, and “TypeError: Object of type set is not JSON serializable” when I pass the image directly via link. My code is below. What am I doing wrong?

image_path = "/Users/myname/Documents/programname/folder1/imagename.jpg"

completion0 = openai.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please create a list of all the items on this document, and their associated categories. Output the list only, no accompanying notes or acknowledgements"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": {image_path},
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(completion0.choices[0].message.content)

It looks like you’re passing the image path directly without any base64 encoding?
I could get your code to work on my PC after a quick change to add that image encoding (used the vision documentation as a guide):

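import base64
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "test.png"
base64_image = encode_image(image_path)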

completion0 = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Please create a list of all the items on this document, and their associated categories. Output the list only, no accompanying notes or acknowledgements",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}"  # MIME type matches test.png
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)

print(completion0.choices[0].message.content)
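
As for the TypeError in the link attempt: in Python, {image_path} with curly braces is a set literal, not a string, and the request body has to be serialized to JSON, which has no set type. Here is a small sketch that reproduces this with only the json module (the path is a made-up placeholder, not your real file):

```python
import json

# Hypothetical placeholder value, used only for illustration.
image_path = "/tmp/example.jpg"

# Curly braces around a single value build a set, not a string.
payload_bad = {"url": {image_path}}
try:
    json.dumps(payload_bad)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the plain string serializes without trouble.
payload_ok = {"url": image_path}
serialized = json.dumps(payload_ok)

print(error_message)
print(serialized)
```

So for the link version, "url" should be a plain string. Note that it needs to be either a web URL the API can reach or a base64 data URL like the one above; a bare local file path will not work.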

Be sure to import the base64 and openai packages.
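
If you switch between jpg and png files, it may be worth building the data URL so that the MIME type follows the file extension instead of being hard-coded. A minimal sketch, assuming the extension is one the standard mimetypes module recognizes; image_to_data_url is a name I made up for illustration, it is not part of the openai package:

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(str(image_path))
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {image_path}")
    # Read the raw bytes and encode them as base64 text.
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

You would then pass image_to_data_url(image_path) as the "url" value in the request.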
