How to send base64 images to Assistant API?

Hi,
When I try to encode an image in base64 as a message for the Assistant API with vision capabilities, I get the following error:

Error code: 400 - {'error': {'message': "Invalid 'messages[4].content[1].image_url.url'. Expected a valid URL, but got a value with an invalid format.", 'type': 'invalid_request_error', 'param': 'messages[4].content[1].image_url.url', 'code': 'invalid_value'}}

I encode the image and create a prompt turn like this:

image_bytes = io.BytesIO()
image.save(image_bytes, format=format)
base64_image = base64.b64encode(image_bytes.getvalue()).decode("utf-8")

prompt_turns: list[dict[str, str]] = []
[...]
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": instruction},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low",
            },
        },
    ],
}
prompt_turns.append(turn)

When I create a Thread for the Assistant

thread = oai_client.beta.threads.create(
            messages=prompt_turns
        )

I get the error from above: Expected a valid URL, but got a value with an invalid format. This issue arises for gpt-4-turbo and gpt-4o, which have image capabilities. With the regular Chat API, I can use the base64 images just fine.
If I use a regular URL it also works as expected.

Does the Assistant API support image input via base64 encoding? If yes, how can one use it? It seems different from the image input for Chat API.

I also want to implement vision capabilities. I got it working from a separate function, but not out of the box via the new gpt-4o …

Here are some copy-paste functions for loading and processing an image file; I had to go back two months to find a version that wasn't triple the size with extra ideas. They include type annotations and hinting.

BASE64 is currently not a supported vision input method for Assistants, though, only for Chat Completions. You would have to save and upload the processed file to file storage and then attach the file ID to a user message.

import base64, io
from PIL import Image
from typing import Tuple

def process_image(path: str, max_size: int) -> Tuple[str, int]:
    """
    Process an image from a given path, encoding it in base64. If the image is a PNG and smaller than max_size,
    it encodes the original. Otherwise, it resizes and converts the image to PNG before encoding.

    Parameters:
        path (str): The file path to the image.
        max_size (int): The maximum width and height allowed for the image.

    Returns:
        Tuple[str, int]: A tuple containing the base64-encoded image and the size of the largest dimension.
    """
    with Image.open(path) as image:
        width, height = image.size
        mimetype = image.get_format_mimetype()
        if mimetype == "image/png" and width <= max_size and height <= max_size:
            with open(path, "rb") as f:
                encoded_image = base64.b64encode(f.read()).decode('utf-8')
                return (encoded_image, max(width, height))
        else:
            resized_image = resize_image(image, max_size)
            png_image = convert_to_png(resized_image)
            return (base64.b64encode(png_image).decode('utf-8'),
                    max(width, height)  # note: the original (pre-resize) largest dimension
                   )

def resize_image(image: Image.Image, max_dimension: int) -> Image.Image:
    """
    Resize a PIL image to ensure that its largest dimension does not exceed max_dimension.

    Parameters:
        image (Image.Image): The PIL image to resize.
        max_dimension (int): The maximum size for the largest dimension.

    Returns:
        Image.Image: The resized image.
    """
    width, height = image.size

    # Check if the image has a palette and convert it to true color mode
    if image.mode == "P":
        if "transparency" in image.info:
            image = image.convert("RGBA")
        else:
            image = image.convert("RGB")

    if width > max_dimension or height > max_dimension:
        if width > height:
            new_width = max_dimension
            new_height = int(height * (max_dimension / width))
        else:
            new_height = max_dimension
            new_width = int(width * (max_dimension / height))
        image = image.resize((new_width, new_height), Image.LANCZOS)

    return image

def convert_to_png(image: Image.Image) -> bytes:
    """
    Convert a PIL Image to PNG format.

    Parameters:
        image (Image.Image): The PIL image to convert.

    Returns:
        bytes: The image in PNG format as a byte array.
    """
    with io.BytesIO() as output:
        image.save(output, format="PNG")
        return output.getvalue()


def create_image_content(image, maxdim, detail_threshold):
    detail = "low" if maxdim < detail_threshold else "high"
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{image}", "detail": detail}
    }

You can see it does some resizing for you.

The last is a utility for setting the detail parameter based on a comparison of two sizes and creating a single image object for the user message list.
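
For example, a minimal Chat Completions sketch of how the two helpers could be combined (the file name, model, and threshold values here are just placeholders, not from the original post):

from openai import OpenAI

client = OpenAI()

# Encode and (if needed) downscale the local image, then build the image content part
base64_image, maxdim = process_image("photo.jpg", max_size=1024)
image_part = create_image_content(base64_image, maxdim, detail_threshold=700)

# Chat Completions accepts base64 data URLs directly in the content list
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                image_part,
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)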

In assistants, there is only a URL method, but you can push the smaller image to your web host for download.
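
As a rough sketch of that URL method for a thread message content part (the host URL is a placeholder for wherever you publish the resized image):

url_part = {
    "type": "image_url",
    "image_url": {
        "url": "https://your-host.example.com/small_image.png",  # placeholder URL on your own host
        "detail": "low",
    },
}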


Here's the method for uploading to the file store, and then attaching to a message for vision.
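
A minimal sketch of that flow (the file path and thread ID are placeholders; the upload uses purpose="vision" and the message then references the returned file ID):

from openai import OpenAI

client = OpenAI()

# Upload the processed image to OpenAI file storage with the vision purpose
file_obj = client.files.create(file=open("processed_image.png", "rb"), purpose="vision")

# Attach the stored file to a user message in an existing thread
message = client.beta.threads.messages.create(
    "thread_abc123",  # placeholder thread ID
    role="user",
    content=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_file", "image_file": {"file_id": file_obj.id, "detail": "low"}},
    ],
)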

You can compare the latency of waiting for OpenAI storage to confirm the file is ready and return its ID before kicking off a run, against speculatively uploading the image to your own web host, which can even overlap with the OpenAI model call.


Thanks for providing the code snippets!
To summarise your point: it's recommended to use the file upload and then reference the file_id in the message for the Assistant.
Did you try using your create_image_content method with the Assistant API?
I use similar methods to preprocess and encode the image, but it only works for the Chat API.

Here's 13000+ lines of API specification and examples (from which the reference is created).

A search reveals no such named method on any endpoint, so I don't know what you could be referring to.

The last little function produces the chat completion image part that can be added to a content list such as this, for multiple items with images:

"messages": [
    {"role": "system",
     "content": [{"type": "text", "text": "Hello robot"}]
    }
]

One way to send images to the Chat API is to encode them in base64 and create a conversation turn as shown below, with the field {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}.

However, that doesn't seem to work for the Assistant API.

Based on your previous comment (not the reference to the API docs), I would assume that file upload is the way to use images in the Assistant API, like this:

oai_file = oai_client.files.create(file=open(image_path, "rb"), purpose="vision")
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": prompt.instruction},
        {"type": "image_file", "image_file": {"file_id": oai_file.id}},
    ],
}
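
If that assumption holds, the turn can then be added when creating the thread, e.g. (a sketch reusing the same oai_client from above):

thread = oai_client.beta.threads.create(messages=[turn])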


I'm having the same issue. In my case, I'm using the Python library. I'm literally replicating the process_image flow posted here, but still, same error.

Chat Completions in the API reference will show you exactly how to send the BASE64 string, when you click 'vision' and 'python' on the example code window.

Assistants has no method to send base64 with a message - you either provide a URL, or you upload a file with purpose vision to storage and then use the correct file attachment method for vision.


Yes, exactly! Uploading an image file or providing the image URL seems to be the current solution for the Assistant API. For the Chat API, base64 is supported.

Same for me, I can't send base64 images to any model in completions except gpt-4-vision-preview.

EDIT:
Solved, it's a slightly different syntax than what it was before. Try this:

img_str = f"data:image/jpeg;base64,{img_base64}"

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Any text message"},
                {"type": "image_url", "image_url": {"url": img_str}},
            ],
        }
    ],
    max_tokens=300,
)

Thanks for this! I had used the read() method instead of the getvalue() method to obtain the base64 string, leading to an invalid base64 image_url error.
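
For anyone hitting the same thing, a small sketch of the difference: after image.save(...) the BytesIO cursor sits at the end of the buffer, so read() returns empty bytes unless you seek back to the start, while getvalue() returns the full contents regardless of the cursor position.

import base64, io
from PIL import Image

buf = io.BytesIO()
Image.new("RGB", (8, 8)).save(buf, format="JPEG")

empty = buf.read()            # b"" -- the cursor is already at the end after save()
buf.seek(0)
from_read = buf.read()        # full JPEG bytes once you seek back to the start
from_value = buf.getvalue()   # full JPEG bytes regardless of the cursor position

base64_image = base64.b64encode(from_value).decode("utf-8")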

We are now in September and I am still not able to get vision to work in Assistant. I tried base64 encoding; the image interpretations did not seem to have anything to do with the pics. I also tried the URL approach: I tested with images from Wikipedia, and that worked reliably every time. Then I tested with images hosted publicly on a Google storage bucket via simple direct URLs… and it was so very on and off… unreliable. Has anyone really found a solution that works with Assistants?

Can you clarify how 'you upload a file with purpose vision to storage and then use the correct file attachment method for vision'?
I have been able to upload a file with purpose vision to storage … but how do I use the correct file attachment method?

Out of a million-plus API users, of those who use Assistants and vision, I suspect the majority have.

If you want an AI model that will work well in assistants, start with gpt-4-turbo-0125 or gpt-4-turbo-1106 for English only, and only after success should you try cheaper models.

Here's the full expansion of the API reference showing how to send the contents of a message into a thread in an API request:


Messages

Create messages within threads

Create message

Endpoint

POST https://api.openai.com/v1/threads/{thread_id}/messages

Create a message.

Path parameters

  • thread_id (string, Required): The ID of the thread to create a message for.

Request Body

  • role (string, Required): The role of the entity that is creating the message. Allowed values include:

    • user: Indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages.
    • assistant: Indicates the message is generated by the assistant. Use this value to insert messages from the assistant into the conversation.
  • content (string or array, Required): The content can be either a simple string or an array of content parts, where each part can be of the following types:

    • Text content (string): The text contents of the message.
    • Array of content parts (array): An array where each element can be:
      • Text: Pure text elements.
      • Image URL: References an image URL in the content of a message, which must be one of the supported types: jpeg, jpg, png, gif, webp.
        • url (string, Required): The external URL of the image.
        • detail (string, Optional, default: auto): Specifies the detail level of the image, options are low, high, or auto.
      • Image file: References an image file in the content of a message. This is used with files that have been uploaded to the API storage with a "purpose" of "vision".
        • file_id (string, Required): The File ID of the image in the message content. Set purpose="vision" when uploading the File if you need to later display the file content.
        • detail (string, Optional, default: auto): Specifies the detail level of the image, options are low, high, or auto.
  • attachments (array or null, Optional): A list of files attached to the message, which can be added to tools.

  • metadata (map, Optional): Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.


Sending images, examples

Here are the extended usage examples showing how to include an image in a message using the OpenAI API, covering both scenarios where the image is referenced via a URL and when it's uploaded as a file. The examples will be provided in both Python and Node.js for completeness.

Including an Image via URL

Python Example

from openai import OpenAI

client = OpenAI()

thread_message = client.beta.threads.messages.create(
    "thread_abc123",
    role="user",
    content=[
        {
            "type": "text",
            "text": "Here's an image example using a URL."
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/path_to_image.jpg",
                "detail": "high"  # optional, can be 'low', 'high', or 'auto'
            }
        }
    ]
)

print(thread_message)

Node.js Example

import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const threadMessages = await openai.beta.threads.messages.create(
        "thread_abc123",
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Here's an image example using a URL."
                },
                {
                    type: "image_url",
                    image_url: {
                        url: "https://example.com/path_to_image.jpg",
                        detail: "high"  // optional, can be 'low', 'high', or 'auto'
                    }
                }
            ]
        }
    );

    console.log(threadMessages);
}

main();

Including an Image via Uploaded File

Python Example

from openai import OpenAI

client = OpenAI()

thread_message = client.beta.threads.messages.create(
    "thread_abc123",
    role="user",
    content=[
        {
            "type": "text",
            "text": "Here's an image example using an uploaded file."
        },
        {
            "type": "image_file",
            "image_file": {
                "file_id": "file_abc123",
                "detail": "auto"  # optional, can be 'low', 'high', or 'auto'
            }
        }
    ]
)

print(thread_message)

Node.js Example

import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const threadMessages = await openai.beta.threads.messages.create(
        "thread_abc123",
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Here's an image example using an uploaded file."
                },
                {
                    type: "image_file",
                    image_file: {
                        file_id: "file_abc123",
                        detail: "auto"  // optional, can be 'low', 'high', or 'auto'
                    }
                }
            ]
        }
    );

    console.log(threadMessages);
}

main();

These examples cover both methods of including images in messages using the OpenAI API, both via URLs and uploaded files, with optional detail level specifications.

Uploaded files are not altered or re-encoded when you use the SDK libraries, which do the work of sending the local file for you.


Thanks Jay. Deeply appreciated.
Ramesh