How to send base64 images to Assistant API?

henning_cf · May 15, 2024, 8:45am

Hi,
When I try to encode an image in base64 as a message for the Assistant API with vision capabilities, I get the following error:

Error code: 400 - {'error': {'message': "Invalid 'messages[4].content[1].image_url.url'. Expected a valid URL, but got a value with an invalid format.", 'type': 'invalid_request_error', 'param': 'messages[4].content[1].image_url.url', 'code': 'invalid_value'}}

I encode the image and create a prompt turn like this:

image_bytes = io.BytesIO()
image.save(image_bytes, format=format)
base64_image = base64.b64encode(image_bytes.getvalue()).decode("utf-8")

prompt_turns: list[dict] = []
[...]
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": instruction},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low"
            },
        },
    ],
}
prompt_turns.append(turn)

When I create a Thread for the Assistant

thread = oai_client.beta.threads.create(
            messages=prompt_turns
        )

I get the error Expected a valid URL, but got a value with an invalid format from above. This issue arises for gpt-4-turbo and gpt-4o, which both have vision capabilities. With the regular Chat API, I can use base64 images just fine.
If I use a regular URL, it also works as expected.

Does the Assistant API support image input via base64 encoding? If yes, how can one use it? It seems different from the image input for Chat API.
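
For reference, the Chat Completions side of this can be sketched as follows. This is a minimal sketch, not code from the post: `to_data_url` and the `MIME_BY_FORMAT` table are hypothetical names. The only point it makes is that the MIME type in the data URL should match the format the image was saved in (the snippet above saves with `format=format` but always labels the result `image/jpeg`).

```python
import base64

# Hypothetical lookup from a Pillow-style format name to a MIME type.
MIME_BY_FORMAT = {
    "JPEG": "image/jpeg",
    "PNG": "image/png",
    "GIF": "image/gif",
    "WEBP": "image/webp",
}


def to_data_url(image_bytes: bytes, image_format: str) -> str:
    """Build a base64 data URL whose MIME type matches the saved image format."""
    mime = MIME_BY_FORMAT[image_format.upper()]
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Any bytes will do for illustration; a real call would pass image_bytes.getvalue().
url = to_data_url(b"\x89PNG\r\n\x1a\n", "png")
print(url.split(",")[0])
```

A URL built this way is accepted by Chat Completions; whether the Assistant API accepts it is exactly the question of this thread.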

yvoderooij · May 15, 2024, 10:01am

I also want to implement vision capabilities. Got it working from a separate function but not out of the box via the new gpt-4o …

_j · May 15, 2024, 10:24am

Here are some copy-paste functions you can use for loading and processing an image file. I had to go back two months to find a version that wasn’t triple the size with extra ideas. The typing and type hints are AI-written.

BASE64 is currently not a vision file method for Assistants, though, only Chat Completions. You would have to save and upload the processed file to file storage and then attach the file ID to a user message.

import base64
import io
from PIL import Image
from typing import Tuple

def process_image(path: str, max_size: int) -> Tuple[str, int]:
    """
    Process an image from a given path, encoding it in base64. If the image is a PNG and smaller than max_size,
    it encodes the original. Otherwise, it resizes and converts the image to PNG before encoding.

    Parameters:
        path (str): The file path to the image.
        max_size (int): The maximum width and height allowed for the image.

    Returns:
        Tuple[str, int]: A tuple containing the base64-encoded image and the size of the largest dimension.
    """
    with Image.open(path) as image:
        width, height = image.size
        mimetype = image.get_format_mimetype()
        if mimetype == "image/png" and width <= max_size and height <= max_size:
            with open(path, "rb") as f:
                encoded_image = base64.b64encode(f.read()).decode('utf-8')
                return (encoded_image, max(width, height))
        else:
            resized_image = resize_image(image, max_size)
            png_image = convert_to_png(resized_image)
            return (base64.b64encode(png_image).decode('utf-8'),
                    max(width, height)  # same tuple metadata
                   )

def resize_image(image: Image.Image, max_dimension: int) -> Image.Image:
    """
    Resize a PIL image to ensure that its largest dimension does not exceed max_dimension.

    Parameters:
        image (Image.Image): The PIL image to resize.
        max_dimension (int): The maximum size for the largest dimension.

    Returns:
        Image.Image: The resized image.
    """
    width, height = image.size

    # Check if the image has a palette and convert it to true color mode
    if image.mode == "P":
        if "transparency" in image.info:
            image = image.convert("RGBA")
        else:
            image = image.convert("RGB")

    if width > max_dimension or height > max_dimension:
        if width > height:
            new_width = max_dimension
            new_height = int(height * (max_dimension / width))
        else:
            new_height = max_dimension
            new_width = int(width * (max_dimension / height))
        image = image.resize((new_width, new_height), Image.LANCZOS)

    return image

def convert_to_png(image: Image.Image) -> bytes:
    """
    Convert a PIL Image to PNG format.

    Parameters:
        image (Image.Image): The PIL image to convert.

    Returns:
        bytes: The image in PNG format as a byte array.
    """
    with io.BytesIO() as output:
        image.save(output, format="PNG")
        return output.getvalue()


def create_image_content(image, maxdim, detail_threshold):
    detail = "low" if maxdim < detail_threshold else "high"
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{image}", "detail": detail}
    }

You can see it does some resizing for you.

The last is a utility for setting the detail parameter based on a comparison of two sizes and creating a single image object for the user message list.
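
As a usage sketch for that last helper (Chat Completions only), the image parts can be collected into one user message like this. `build_user_message` is a hypothetical name, the threshold of 512 is an arbitrary assumption, and the placeholder strings stand in for the base64 output of process_image.

```python
def create_image_content(image, maxdim, detail_threshold):
    # Same helper as in the post: pick "low" detail for small images.
    detail = "low" if maxdim < detail_threshold else "high"
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{image}", "detail": detail},
    }


def build_user_message(text, images, detail_threshold=512):
    """Hypothetical helper: one text part followed by one image part per image.

    `images` is a list of (base64_string, largest_dimension) tuples,
    i.e. the shape returned by process_image().
    """
    content = [{"type": "text", "text": text}]
    for encoded, maxdim in images:
        content.append(create_image_content(encoded, maxdim, detail_threshold))
    return {"role": "user", "content": content}


# Placeholder strings instead of real base64 image data.
message = build_user_message("Compare these.", [("AAAA", 300), ("BBBB", 1024)])
print([part["type"] for part in message["content"]])
```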

In Assistants, the image_url part only accepts a real web URL, but you can push the smaller image to your own web host for download.

Here’s the method for uploading to the file store, and then attaching to a message for vision.
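
The snippet that belonged here is not in the post as captured, so what follows is only a minimal sketch of the upload-then-attach flow, assuming the openai Python library v1.x. `upload_and_send` and `image_file_part` are hypothetical helper names; the `files.create(..., purpose="vision")` call and the `image_file` content part are the ones that appear elsewhere in this thread.

```python
def image_file_part(file_id: str, detail: str = "auto") -> dict:
    """Content part that points at a file uploaded with purpose="vision"."""
    return {"type": "image_file", "image_file": {"file_id": file_id, "detail": detail}}


def upload_and_send(client, thread_id: str, image_path: str, text: str):
    """Upload an image to file storage, then post it to a thread as a user message.

    `client` is assumed to be an openai.OpenAI() instance (openai-python v1.x).
    """
    with open(image_path, "rb") as fh:
        uploaded = client.files.create(file=fh, purpose="vision")
    return client.beta.threads.messages.create(
        thread_id,
        role="user",
        content=[{"type": "text", "text": text}, image_file_part(uploaded.id)],
    )


# Usage (needs the openai package, an API key, and an existing thread):
#   from openai import OpenAI
#   upload_and_send(OpenAI(), "thread_abc123", "resized.png", "What is in this image?")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before spending API calls on it.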

You can compare the two approaches for speed: waiting for OpenAI storage to confirm the file is ready and return its ID before you can start a run, versus speculatively uploading to your own web host, which can even overlap with the OpenAI model call.

henning_cf · May 15, 2024, 11:11am

Thanks for providing the code snippets!
To summarise your point: it’s recommended to use the file upload and then reference the file_id in the message for the Assistant.
Did you try using your create_image_content method with the Assistant API?
I use similar methods to preprocess and encode the image, but it only works for the Chat API.

_j · May 15, 2024, 11:17am

Here’s 13000+ lines of API specification and examples (from which the reference is created).

github.com

openai/openai-openapi/blob/master/openapi.yaml

openapi: 3.0.0
info:
  title: OpenAI API
  description: The OpenAI REST API. Please see https://platform.openai.com/docs/api-reference for more details.
  version: "2.0.0"
  termsOfService: https://openai.com/policies/terms-of-use
  contact:
    name: OpenAI Support
    url: https://help.openai.com/
  license:
    name: MIT
    url: https://github.com/openai/openai-openapi/blob/master/LICENSE
servers:
  - url: https://api.openai.com/v1
tags:
  - name: Assistants
    description: Build Assistants that can call models and use tools.
  - name: Audio
    description: Learn how to turn audio into text or text into audio.
  - name: Chat


A search reveals no such named method on any endpoint, so I don’t know what you could be referring to.

The last little function produces the chat completion image part that can be added to a content list such as this, for multiple items with images:

"messages": [
    {"role": "system",
     "content": [{"type": "text", "text": "Hello robot"}]
    }
]

henning_cf · May 16, 2024, 7:25am

One way to send images to the Chat API is via encoding it to base64 and creating a conversation turn as shown below, with the field {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}}.

henning_cf:

image_bytes = io.BytesIO()
image.save(image_bytes, format=format)
base64_image = base64.b64encode(image_bytes.getvalue()).decode("utf-8")

prompt_turns: list[dict] = []
[...]
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": instruction},
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{base64_image}",
                "detail": "low"
            },
        },
    ],
}
prompt_turns.append(turn)

However, that doesn’t seem to work for the Assistant API.

Based on your previous comment (not the reference to the API docs), I assume that file upload is the way to use images in the Assistant API, like this:

# Upload the image with purpose "vision", then reference its file ID in the message
with open(image_path, "rb") as image_fh:
    oai_file = oai_client.files.create(file=image_fh, purpose="vision")
turn = {
    "role": "user",
    "content": [
        {"type": "text", "text": prompt.instruction},
        {"type": "image_file", "image_file": {"file_id": oai_file.id}},
    ],
}

matipvp02 · May 23, 2024, 11:13pm

I’m having the same issue. In my case, I’m using the Python library. I’m literally replicating the process_image flow posted here, but still, same error.

_j · May 24, 2024, 12:49am

Chat Completions in API reference will show you exactly how to send the BASE64 string, when you click ‘vision’ and ‘python’ on the example code window.

Assistants has no method to send base64 with a message. You either provide a URL, or you upload a file with purpose vision to storage and then use the correct file attachment method for vision.
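
The two options can be sketched as one small helper that builds the matching content part and rejects base64 data URLs up front. This is only an illustration: `assistant_image_part` is a hypothetical name, and the check that uploaded file IDs start with `file-` is an assumption to adjust if your IDs look different.

```python
def assistant_image_part(source: str, detail: str = "auto") -> dict:
    """Build an Assistants message image part from a web URL or an uploaded file ID.

    Assumption: uploaded file IDs start with "file-"; adjust if yours differ.
    """
    if source.startswith("data:"):
        # Base64 data URLs are what triggered the 400 "Expected a valid URL" error.
        raise ValueError(
            "Assistants messages do not accept base64 data URLs; upload the file instead"
        )
    if source.startswith(("http://", "https://")):
        return {"type": "image_url", "image_url": {"url": source, "detail": detail}}
    if source.startswith("file-"):
        return {"type": "image_file", "image_file": {"file_id": source, "detail": detail}}
    raise ValueError(f"Unrecognized image source: {source!r}")


print(assistant_image_part("https://example.com/cat.png")["type"])
print(assistant_image_part("file-abc123")["type"])
```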

henning_cf · May 24, 2024, 10:23am

Yes, exactly! Uploading an image file or providing the image URL seems to be the current solution for the Assistant API. For Chat API, the base64 is supported.

IAmJackHarper · May 24, 2024, 11:22am

Same for me, I can’t send base64 images to any model in completions except gpt-4-vision-preview

EDIT:
Solved, it’s a slightly different syntax than what it was before. Try this:

img_str = f"data:image/jpeg;base64,{img_base64}"

response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Any text message"},
                    {"type": "image_url", "image_url": {"url": img_str}}
                ],
            }
        ],
        max_tokens=300,
    )

asankarankutty · May 31, 2024, 8:13pm

Thanks for this! I had used the read() method instead of getvalue() method to obtain the base64 string, leading to an invalid base64 image_url error.
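
A small standard-library sketch of that read() versus getvalue() pitfall. The bytes written here are a stand-in for what `image.save(buffer, format=...)` would write.

```python
import base64
import io

buffer = io.BytesIO()
buffer.write(b"fake image bytes")  # stands in for image.save(buffer, format="PNG")

# After writing, the cursor sits at the end, so read() returns nothing.
from_read = buffer.read()
# getvalue() ignores the cursor and returns the whole buffer.
from_getvalue = buffer.getvalue()

print(len(from_read), len(from_getvalue))

# read() works again once the cursor is rewound.
buffer.seek(0)
rewound = buffer.read()

encoded = base64.b64encode(from_getvalue).decode("utf-8")
```

Encoding the empty result of read() yields an empty base64 string, which would explain an invalid image_url error.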

r.ramloll · September 14, 2024, 3:31am

It is now September and I am still not able to get vision to work in Assistants. I tried base64 encoding, but the image interpretations did not seem to have anything to do with the pictures. I also tried the URL approach: images from Wikipedia worked reliably every time, but images hosted publicly on a Google storage bucket via simple direct URLs were very on and off, unreliable. Has anyone really found a solution that works with Assistants?

r.ramloll · September 14, 2024, 3:35am

Can you clarify how to ‘upload a file with purpose vision to storage and then use the correct file attachment method for vision’?
I have been able to upload a file with purpose vision to storage, but how do I use the correct file attachment method?

_j · September 14, 2024, 4:31am

Out of a million plus API users, of those who use assistants and vision, I suspect the majority have.

If you want an AI model that will work well in assistants, start with gpt-4-turbo-0125 or gpt-4-turbo-1106 for English only, and only after success should you try cheaper models.

Here’s the full expansion of the API reference showing how to send the contents of a message into a thread in an API request:

Messages

Create messages within threads

Create message

Endpoint

POST https://api.openai.com/v1/threads/{thread_id}/messages

Create a message.

Path parameters

thread_id (string, Required): The ID of the thread to create a message for.

Request Body

role (string, Required): The role of the entity that is creating the message. Allowed values include:
- user: Indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages.
- assistant: Indicates the message is generated by the assistant. Use this value to insert messages from the assistant into the conversation.
content (string or array, Required): The content can be either a simple string or an array of content parts, where each part can be of the following types:
- Text content (string): The text contents of the message.
- Array of content parts (array): An array where each element can be:
  - Text: Pure text elements.
  - Image URL: References an image URL in the content of a message, which must be one of the supported types: jpeg, jpg, png, gif, webp.
    - url (string, Required): The external URL of the image.
    - detail (string, Optional, default: auto): Specifies the detail level of the image, options are low, high, or auto.
  - Image file: References an image file in the content of a message. This is used with files that have been uploaded to the API storage with “purpose” of “vision”.
    - file_id (string, Required): The File ID of the image in the message content. Set purpose="vision" when uploading the File if you need to later display the file content.
    - detail (string, Optional, default: auto): Specifies the detail level of the image, options are low, high, or auto.
attachments (array or null, Optional): A list of files attached to the message, which can be added to tools.
metadata (map, Optional): Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
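
The metadata limits quoted above can be checked client-side before sending a request. A minimal sketch, assuming "Set of 16 key-value pairs" means at most 16 and that keys and values are strings; `validate_metadata` is a hypothetical helper, not part of any SDK.

```python
def validate_metadata(metadata: dict) -> None:
    """Raise ValueError if metadata exceeds the limits quoted in the reference above."""
    if len(metadata) > 16:
        raise ValueError("metadata may hold at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            raise ValueError(f"metadata key too long ({len(key)} > 64): {key[:20]}...")
        if len(value) > 512:
            raise ValueError(f"metadata value too long for key {key!r} ({len(value)} > 512)")


validate_metadata({"source": "upload-form", "kind": "image"})  # passes silently
```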

Sending images, examples

Here are the extended usage examples showing how to include an image in a message using the OpenAI API, covering both scenarios where the image is referenced via a URL and when it’s uploaded as a file. The examples will be provided in both Python and Node.js for completeness.

Including an Image via URL

Python Example

from openai import OpenAI

client = OpenAI()

thread_message = client.beta.threads.messages.create(
    "thread_abc123",
    role="user",
    content=[
        {
            "type": "text",
            "text": "Here's an image example using a URL."
        },
        {
            "type": "image_url",
            "image_url": {
                "url": "https://example.com/path_to_image.jpg",
                "detail": "high"  # optional, can be 'low', 'high', or 'auto'
            }
        }
    ]
)

print(thread_message)

Node.js Example

import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const threadMessages = await openai.beta.threads.messages.create(
        "thread_abc123",
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Here's an image example using a URL."
                },
                {
                    type: "image_url",
                    image_url: {
                        url: "https://example.com/path_to_image.jpg",
                        detail: "high"  // optional, can be 'low', 'high', or 'auto'
                    }
                }
            ]
        }
    );

    console.log(threadMessages);
}

main();

Including an Image via Uploaded File

Python Example

from openai import OpenAI

client = OpenAI()

thread_message = client.beta.threads.messages.create(
    "thread_abc123",
    role="user",
    content=[
        {
            "type": "text",
            "text": "Here's an image example using an uploaded file."
        },
        {
            "type": "image_file",
            "image_file": {
                "file_id": "file_abc123",
                "detail": "auto"  # optional, can be 'low', 'high', or 'auto'
            }
        }
    ]
)

print(thread_message)

Node.js Example

import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
    const threadMessages = await openai.beta.threads.messages.create(
        "thread_abc123",
        {
            role: "user",
            content: [
                {
                    type: "text",
                    text: "Here's an image example using an uploaded file."
                },
                {
                    type: "image_file",
                    image_file: {
                        file_id: "file_abc123",
                        detail: "auto"  // optional, can be 'low', 'high', or 'auto'
                    }
                }
            ]
        }
    );

    console.log(threadMessages);
}

main();

These examples cover both methods of including images in messages using the OpenAI API, both via URLs and uploaded files, with optional detail level specifications.

Uploaded files are not altered or re-encoded when using SDK libraries that do the work of sending from local file.

r.ramloll · September 16, 2024, 12:39am

Thanks Jay. Deeply appreciated.
Ramesh

Moha95 · September 18, 2024, 1:34pm

Hi @_j
Thank you for your work!
Just to clarify, are you uploading the image as a file in this case?

_j · September 18, 2024, 1:49pm

The URL case allows you to specify a web URL, and OpenAI’s API will download it and use it for vision - if the retriever isn’t blocked by the site’s robots.txt policy.
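
If you control the host, you can check what its robots.txt allows before relying on the URL method. A minimal standard-library sketch: the robots.txt content is an inline example, and "ExampleImageFetcher" is a made-up agent name, since the thread does not say which user agent OpenAI's retriever sends.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; a real check would fetch https://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleImageFetcher" is a made-up agent name, not OpenAI's actual one.
agent = "ExampleImageFetcher"
public_ok = parser.can_fetch(agent, "https://example.com/images/cat.png")
private_ok = parser.can_fetch(agent, "https://example.com/private/cat.png")
print(public_ok, private_ok)
```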

The download site can be part of your own backend, so that when an image is uploaded by user interface, the temporary file is made available at the download location on your own web server.

Or, if you allow it, an input box in the user interface could let a user paste an arbitrary image URL, such as a picture link from some random online store. If you want to show a preview or verify the image, you’d still need to fetch it yourself as well.

If there are repeated uses of a URL, such as in providing past messages of a chat, the image might be cached by OpenAI for a bit, so it doesn’t have to be downloaded again.

For assistants, uploading files to the storage to use by file ID can appear faster if you do that uploading while the user is still constructing their message. If, however, you wait until the message and images are completed and someone presses “send”, it will be that many more upload API steps before you can even initiate a run.
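
The idea of uploading while the user is still composing can be sketched with a background thread. This is an illustration under assumptions: `fake_upload` is a stand-in for a real upload call such as `client.files.create(..., purpose="vision")`, and the file ID it returns is invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_upload(path: str) -> str:
    """Stand-in for an upload call such as client.files.create(..., purpose="vision")."""
    time.sleep(0.05)  # pretend network latency
    return f"file-for-{path}"


with ThreadPoolExecutor(max_workers=2) as pool:
    # Start uploading as soon as the user picks the image...
    pending = pool.submit(fake_upload, "photo.png")

    # ...while they are still typing their message.
    user_text = "What is in this picture?"

    # On "send", the upload has usually finished; result() blocks only if it has not.
    file_id = pending.result()

content = [
    {"type": "text", "text": user_text},
    {"type": "image_file", "image_file": {"file_id": file_id}},
]
print(content[1])
```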

Sending file binaries in chat completions messages consumes your network bandwidth and delays the text, but might be faster for single calls than URL.

Moha95 · September 18, 2024, 7:21pm

I see… Sending images isn’t that simple. Thank you for your answers!

r.ramloll · September 18, 2024, 8:47pm

This is the actual code that worked for me and by a lot of trial and error… am sorry to say
Finding the actual structure that OpenAI expects was a nightmare for me. But this works.

// Example: Correctly handling image files with OpenAI Assistant API

const sendMessageWithImage = async (threadId, textContent, imageFileId) => {
  const url = `https://api.openai.com/v1/threads/${threadId}/messages`;
  const headers = {
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json',
    'OpenAI-Beta': 'assistants=v2'
  };

  const contentArray = [
    {
      type: "text",
      text: textContent
    }
  ];

  // Correctly add the image file reference if an image file ID is provided
  if (imageFileId) {
    contentArray.push({
      type: "image_file",
      image_file: { file_id: imageFileId }
    });
  }

  const body = JSON.stringify({
    role: "user",
    content: contentArray
  });

  try {
    const response = await fetch(url, { method: 'POST', headers, body });
    const data = await response.json();

    if (!response.ok) {
      throw new Error(`API request failed: ${data.error?.message || 'Unknown error'}`);
    }

    console.log("Message sent successfully:", data);
    return data;
  } catch (error) {
    console.error("Error sending message:", error);
    throw error;
  }
};

// Usage example
sendMessageWithImage(
  "thread_abc123",
  "Please analyze this X-ray image.",
  "file_xyz789"
).then(result => {
  console.log("Message sent with image. API response:", result);
}).catch(error => {
  console.error("Failed to send message:", error);
});
