API 4o Assistant thread unable to take base64 image as input

I am having an issue where the assistant is unable to take a base64 image as input. It looks like a syntax error on my side, because I am passing the image with image_url, which seems to be the only method for images held in a buffer. This syntax works with the client but not with the assistant. Could anyone help me out? What is the correct way to do this?

thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the difference between these images?"
        },
        {
          "type": "image_url",
          "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
        },
        {
          "type": "image_url",
          "image_url": {"url": "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"}
        },
      ],
    }
  ]
)

The error is

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'messages[0].content[1].image_url.url'. Expected a valid URL, but got a value with an invalid format.", 'type': 'invalid_request_error', 'param': 'messages[0].content[1].image_url.url', 'code': 'invalid_value'}}

The assistant thread is unable to take base64 as input.

True.

The method to provide an image file is to:

  1. upload it to the files endpoint, with purpose 'assistants', and receive the file ID.

  2. use the file ID with the image_file object within the user content, which can be seen by expanding the API reference.
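
To make the two steps above concrete, here is a minimal sketch, assuming the official openai Python package and a client created with `OpenAI()`. The helper names `build_image_message` and `create_thread_with_images` are made up for illustration. The purpose value follows the reply above; the Files API reference also lists a `vision` purpose, which may be the better fit for image inputs.

```python
def build_image_message(question, file_ids):
    """Build a user message whose images are referenced by uploaded file ID."""
    content = [{"type": "text", "text": question}]
    for file_id in file_ids:
        # image_file replaces the image_url part that was rejected for base64 data
        content.append({"type": "image_file", "image_file": {"file_id": file_id}})
    return {"role": "user", "content": content}


def create_thread_with_images(client, image_paths, question):
    """Upload each image, then create a thread that references the uploads."""
    file_ids = []
    for path in image_paths:
        with open(path, "rb") as f:
            # purpose taken from the reply above (assumption: "vision" may also work)
            uploaded = client.files.create(file=f, purpose="assistants")
        file_ids.append(uploaded.id)
    message = build_image_message(question, file_ids)
    return client.beta.threads.create(messages=[message])


# Usage (needs the openai package and an API key, so it is left commented out):
# from openai import OpenAI
# client = OpenAI()
# thread = create_thread_with_images(client, ["a.jpg", "b.jpg"],
#                                    "What is the difference between these images?")
```

Images that are already reachable at a public URL, like the Google logo in the original post, can stay as image_url parts. Only the base64 data needs to go through the upload.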

I hope this gives a path to success and happiness.

Thank you for putting me on the path to success and happiness.

Do you know if there is a more optimal way of inputting images? I am looking to have the assistant view about 150 images at once. I know it's a lot, but this is something I was able to do with client chat. I am trying the assistant because I want it to have some uploaded knowledge.

There's a more optimal way of inputting knowledge than images: text.

Instead of running the images as input over and over again, just run a one-time task on each image to extract the text it contains. (I assume these are perhaps PDF pages or slides.)

That will give clarity and less distraction from the task at hand. Then you can also use the file search on that data when uploaded as a text file.
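
A rough sketch of that extraction pass, assuming the chat completions endpoint (where base64 data URLs are accepted, as noted in the original post) and the gpt-4o model. The function names, the prompt wording, and the output layout are all placeholders, not anything prescribed by the API.

```python
import base64


def to_data_url(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def extract_text(client, image_path, model="gpt-4o"):
    """Ask a vision-capable chat model to transcribe the text in one image."""
    with open(image_path, "rb") as f:
        data_url = to_data_url(f.read())
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all text in this image."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    )
    return response.choices[0].message.content


def images_to_text_file(client, image_paths, out_path):
    """Run the extraction once per image and collect the results in one text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in image_paths:
            out.write(f"## {path}\n{extract_text(client, path)}\n\n")
```

The resulting text file can then be uploaded once and attached for file search, so the images themselves never need to go into the thread.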


Upload Optimization:

Unfortunately, while the files endpoint uses multipart/form-data, which can carry multiple files in one request to streamline network communications, the API simply ignores any extra files that are sent, and the return object doesn't have a way to present multiple IDs.

You can upload in parallel, though.
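
One way to do the parallel upload is a thread pool from the standard library. This is only a sketch: it assumes a single client object can be shared across threads, and `upload_one`, `upload_many`, and the default of 8 workers are arbitrary choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_one(client, path, purpose="assistants"):
    """Upload a single file and return its file ID."""
    with open(path, "rb") as f:
        return client.files.create(file=f, purpose=purpose).id


def upload_many(client, paths, max_workers=8):
    """Upload files concurrently; IDs come back in the same order as paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first upload error
        return list(pool.map(lambda p: upload_one(client, p), paths))
```

Each file is still its own request, so rate limits apply; lowering max_workers is the simple knob if requests start getting rejected.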