How to send base64 images to Assistant API?

Hi,
When I try to encode an image in base64 as a message for the Assistant API with vision capabilities, I get the following error:

Error code: 400 - {'error': {'message': "Invalid 'messages[4].content[1].image_url.url'. Expected a valid URL, but got a value with an invalid format.", 'type': 'invalid_request_error', 'param': 'messages[4].content[1].image_url.url', 'code': 'invalid_value'}}

I encode the image and create a prompt turn like this:

import base64
import io

image_bytes = io.BytesIO()
image.save(image_bytes, format=format)
base64_image = base64.b64encode(image_bytes.getvalue()).decode("utf-8")

prompt_turns: list[dict] = []
[...]
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": instruction},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low",
            },
        },
    ],
}
prompt_turns.append(turn)
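For reference, here is a minimal sketch of building the data URL so that the declared MIME type always matches the format passed to `image.save`. The helper name `to_data_url` and the small format-to-MIME table are illustrative assumptions and not part of any SDK. Per this thread, a URL like this is accepted by Chat Completions but rejected by the Assistants endpoint.

```python
import base64

# Illustrative mapping from Pillow format names to MIME types
# (assumption: only a few common formats are listed).
_MIME_BY_FORMAT = {
    "PNG": "image/png",
    "JPEG": "image/jpeg",
    "GIF": "image/gif",
    "WEBP": "image/webp",
}


def to_data_url(image_bytes: bytes, fmt: str) -> str:
    """Return a base64 data URL whose MIME type matches the Pillow format `fmt`."""
    mime = _MIME_BY_FORMAT.get(fmt.upper())
    if mime is None:
        raise ValueError(f"unsupported image format: {fmt!r}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Usage would be `to_data_url(image_bytes.getvalue(), format)` in place of the hand-built f-string.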

When I create a Thread for the Assistant:

thread = oai_client.beta.threads.create(
    messages=prompt_turns
)

I get the "Expected a valid URL, but got a value with an invalid format" error shown above. This issue arises with gpt-4-turbo and gpt-4o, which both have vision capabilities. With the regular Chat API, the same base64 images work just fine.
If I use a regular URL it also works as expected.
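Going by the error above, the Assistants endpoint validates `image_url.url` as a fetchable URL and rejects the `data:` URL. A small pre-flight check like the one below can surface that before the request is sent. It is a hypothetical helper, not an SDK function, and it assumes, as this thread reports, that only http(s) URLs are accepted there.

```python
from urllib.parse import urlparse


def check_assistant_image_url(url: str) -> None:
    """Raise ValueError for image URLs the Assistants API is expected to reject.

    Assumption (from this thread): Assistants image_url parts need a URL the
    service can download, so base64 data: URLs are not accepted.
    """
    parsed = urlparse(url)
    if parsed.scheme == "data":
        raise ValueError(
            "data: URLs are rejected by Assistants; host the image or "
            "upload it with purpose='vision' and use an image_file part"
        )
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got {url[:40]!r}")
```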

Does the Assistant API support image input via base64 encoding? If yes, how can one use it? It seems different from the image input for Chat API.

I also want to implement vision capabilities. I got it working from a separate function but not out of the box via the new gpt-4o …

Here are some copy-paste functions you can use for loading and processing an image file. I had to go back two months to find a version that wasn't triple the size with extra ideas. The types and hints were added with AI help.

import base64
import io
from typing import Tuple

from PIL import Image


def process_image(path: str, max_size: int) -> Tuple[str, int]:
    """
    Process an image from a given path, encoding it in base64. If the image is a PNG and
    no larger than max_size in either dimension, it encodes the original file. Otherwise,
    it resizes and converts the image to PNG before encoding.

    Parameters:
        path (str): The file path to the image.
        max_size (int): The maximum width and height allowed for the image.

    Returns:
        Tuple[str, int]: A tuple containing the base64-encoded image and the size of the
        original image's largest dimension.
    """
    with Image.open(path) as image:
        width, height = image.size
        mimetype = image.get_format_mimetype()
        if mimetype == "image/png" and width <= max_size and height <= max_size:
            # Already a small PNG: send the original file bytes unchanged.
            with open(path, "rb") as f:
                encoded_image = base64.b64encode(f.read()).decode("utf-8")
            return (encoded_image, max(width, height))
        resized_image = resize_image(image, max_size)
        png_image = convert_to_png(resized_image)
        return (base64.b64encode(png_image).decode("utf-8"),
                max(width, height)  # largest dimension of the original image
                )


def resize_image(image: Image.Image, max_dimension: int) -> Image.Image:
    """
    Resize a PIL image so that its largest dimension does not exceed max_dimension.

    Parameters:
        image (Image.Image): The PIL image to resize.
        max_dimension (int): The maximum size for the largest dimension.

    Returns:
        Image.Image: The resized image.
    """
    width, height = image.size

    # Convert palette images to true color (keeping transparency if present),
    # and CMYK images to RGB, since PNG cannot store CMYK.
    if image.mode == "P":
        image = image.convert("RGBA" if "transparency" in image.info else "RGB")
    elif image.mode == "CMYK":
        image = image.convert("RGB")

    if width > max_dimension or height > max_dimension:
        # Scale the longer side down to max_dimension, preserving the aspect ratio.
        if width > height:
            new_width = max_dimension
            new_height = max(1, int(height * (max_dimension / width)))
        else:
            new_height = max_dimension
            new_width = max(1, int(width * (max_dimension / height)))
        image = image.resize((new_width, new_height), Image.LANCZOS)

    return image


def convert_to_png(image: Image.Image) -> bytes:
    """
    Convert a PIL Image to PNG format.

    Parameters:
        image (Image.Image): The PIL image to convert.

    Returns:
        bytes: The image in PNG format as a byte array.
    """
    with io.BytesIO() as output:
        image.save(output, format="PNG")
        return output.getvalue()


def create_image_content(image: str, maxdim: int, detail_threshold: int) -> dict:
    """Build a Chat Completions image_url content part from a base64 PNG string."""
    # Images smaller than the threshold get "low" detail; larger ones get "high".
    detail = "low" if maxdim < detail_threshold else "high"
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{image}", "detail": detail},
    }

You can see it does some resizing for you.

The last function is a utility that sets the detail parameter by comparing the image's largest dimension against a threshold, and builds a single image content part for the user message's content list.

In Assistants, there is only a URL method, but you can push the smaller image to your web host so the API can download it from there.
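A minimal sketch of the URL route, assuming the resized image has already been pushed to a publicly reachable HTTPS location on your own host (the URL in the usage comment is a placeholder). The turn has the same shape as in the question; only the `url` changes from a `data:` URL to a normal web address. The function name `url_image_turn` is hypothetical.

```python
def url_image_turn(text: str, image_url: str, detail: str = "low") -> dict:
    """Build an Assistants user message with a text part and a hosted image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                # Must be a URL OpenAI can download, not a base64 data: URL.
                "image_url": {"url": image_url, "detail": detail},
            },
        ],
    }


# Usage (requires the openai package and an API key; not run here):
# thread = oai_client.beta.threads.create(
#     messages=[url_image_turn("Describe this image.",
#                              "https://example.com/uploads/photo.png")]
# )
```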


Here’s the method for uploading to the file store, and then attaching to a message for vision.

You can compare the latency of the two approaches: waiting for OpenAI storage to confirm the file is ready and return its ID before you start a run, versus speculatively uploading to your own web host, which can even overlap with the OpenAI model call.
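The file-store route can be sketched roughly as follows, assuming openai-python v1.x with the Assistants v2 beta (`client.files.create` with `purpose="vision"`, then an `image_file` content part). The function name `add_image_message` is hypothetical, and an existing thread ID is assumed. The extra round trip is the `files.create` call, which has to return a file ID before the message (and then the run) can be created.

```python
from typing import Any


def add_image_message(client: Any, thread_id: str, image_path: str, text: str) -> Any:
    """Upload a local image to OpenAI file storage and post it to a thread.

    `client` is assumed to be an openai.OpenAI() instance (SDK v1.x,
    Assistants v2). The upload must return a file ID before the message
    can reference it.
    """
    with open(image_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="vision")
    return client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=[
            {"type": "text", "text": text},
            {"type": "image_file", "image_file": {"file_id": uploaded.id}},
        ],
    )
```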

Thanks for providing the code snippets!
To summarise your point: it’s recommended to use the file upload and then reference the file_id in the message for the Assistant.
Did you try using your create_image_content method with the Assistant API?
I use similar methods to preprocess and encode the image, but it only works for the Chat API.

Here are 13,000+ lines of API specification and examples (from which the API reference is generated).

A search reveals no such named method on any endpoint, so I don’t know what you could be referring to.

The last little function produces the Chat Completions image part, which can be added to a content list such as this one, for multiple items with images:

"messages": [
    {"role": "system",
     "content": [{"type": "text", "text": "Hello robot"}]
    }
]
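For Chat Completions, a single user message can carry several image parts in one `content` list. A minimal sketch that uses the same part shape as `create_image_content`; the helper name `chat_user_message` is hypothetical, and the base64 strings are assumed to be PNG data.

```python
def chat_user_message(text: str, images_b64: list[str], detail: str = "low") -> dict:
    """Build one Chat Completions user message: a text part plus image parts."""
    content: list[dict] = [{"type": "text", "text": text}]
    for b64 in images_b64:
        content.append({
            "type": "image_url",
            # Chat Completions accepts base64 data URLs here (Assistants does not).
            "image_url": {"url": f"data:image/png;base64,{b64}", "detail": detail},
        })
    return {"role": "user", "content": content}


# Usage (requires the openai package and an API key; not run here):
# messages = [
#     {"role": "system", "content": [{"type": "text", "text": "Hello robot"}]},
#     chat_user_message("Compare these two images.", [b64_a, b64_b]),
# ]
# response = oai_client.chat.completions.create(model="gpt-4o", messages=messages)
```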

One way to send images to the Chat API is by encoding them as base64 and creating a conversation turn as shown below, with the field {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}.

However, that doesn’t seem to work for the Assistant API.

Based on your previous comment (not the reference to the API docs), I would assume that file upload is the way to use images in the Assistant API, like this:

with open(image_path, "rb") as f:
    oai_file = oai_client.files.create(file=f, purpose="vision")
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": prompt.instruction},
        {"type": "image_file", "image_file": {"file_id": oai_file.id}},
    ],
}

I’m having the same issue. In my case, I’m using the Python library. I’m literally replicating the process_image flow posted here, but I still get the same error.

The Chat Completions API reference shows exactly how to send the base64 string when you select ‘vision’ and ‘python’ in the example code window.

Assistants has no method to send base64 with a message: you either provide a URL, or you upload a file with purpose "vision" to storage and then use the correct file attachment method for vision.
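Putting that together, one way to keep base64-producing code such as `process_image` and still use Assistants is to decode each `data:` URL, upload the bytes with `purpose="vision"`, and swap the `image_url` part for an `image_file` part. This is a sketch, not an official conversion: the function name is hypothetical, `client` is assumed to be an openai-python v1.x client, it relies on the SDK accepting a `(filename, bytes)` tuple for `file`, and it assumes the v2 `image_file` part accepts the carried-over `detail` value.

```python
import base64
import binascii
from typing import Any


def data_urls_to_image_files(client: Any, turn: dict) -> dict:
    """Return a copy of a chat-style turn with base64 data: image URLs replaced
    by uploaded image_file parts (assumed usable by the Assistants API)."""
    new_content = []
    for part in turn["content"]:
        url = ""
        if part.get("type") == "image_url":
            url = part["image_url"]["url"]
        if not url.startswith("data:"):
            new_content.append(part)  # text parts and http(s) URLs pass through
            continue
        header, _, payload = url.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("only base64 data URLs are supported")
        try:
            image_bytes = base64.b64decode(payload, validate=True)
        except binascii.Error as exc:
            raise ValueError("invalid base64 payload") from exc
        mime = header[len("data:"):-len(";base64")]  # e.g. "image/png"
        ext = mime.split("/")[-1] or "png"
        uploaded = client.files.create(file=(f"image.{ext}", image_bytes),
                                       purpose="vision")
        image_file = {"file_id": uploaded.id}
        if "detail" in part["image_url"]:
            image_file["detail"] = part["image_url"]["detail"]
        new_content.append({"type": "image_file", "image_file": image_file})
    return {**turn, "content": new_content}
```

The converted turn can then be passed to `oai_client.beta.threads.create(messages=[...])` in place of the original.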