Image upload in Chat Completions, Responses and Assistants

Hi there,

Chat Completions

I initially used the chat completions endpoint to send three b64-encoded images alongside a prompt to o1. That worked fine in 93-97% of cases, but a few requests came back with o1 responding “Your image was empty”.

# Prepare the messages payload
messages_payload = [
    {
        "role": "system",
        "content": "You are interpreting the images.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt,
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_question_b64}"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_solution_b64}"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_b64}"},
            },
        ],
    },
]

# Call the OpenAI API in a worker thread so the event loop is not blocked
response = await asyncio.to_thread(
    openai_client.chat.completions.create,
    model=model_name,
    messages=messages_payload,
)

I have tried:

  • making the requests synchronously
  • reducing the size (both height/width and file size) of the images
  • several runs
  • decoding the b64 images to check whether they were encoded correctly (see the sketch after this list)

The error reappeared over and over. It does not appear to happen with the largest images, and I am also not hitting any rate limits. I have 2-3 cases that repeatedly result in a response of the type “The images are empty”, plus occasionally a few more.
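
For completeness, here is the kind of round-trip check I used for the last point (a sketch; img_b64 is one of the payload strings from above, and Pillow is assumed to be installed):

import base64
from io import BytesIO
from PIL import Image

# Decode the payload and make sure it really parses as an image.
raw = base64.b64decode(img_b64, validate=True)
Image.open(BytesIO(raw)).verify()  # raises if the data is truncated or corrupt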

What I haven’t tried yet is to remove exif data, as suggested in this post:
image-upload-for-analysis-fails-randomly-help/889007
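
If I do try that, a Pillow-based re-encode along these lines should drop the metadata (a sketch, untested against my failing cases):

import base64
from io import BytesIO
from PIL import Image

def png_b64_without_metadata(path: str) -> str:
    # Copy only the pixel data into a fresh image, discarding
    # EXIF and other ancillary chunks along the way.
    with Image.open(path) as im:
        im = im.convert("RGBA")
        clean = Image.new("RGBA", im.size)
        clean.putdata(list(im.getdata()))
        buf = BytesIO()
        clean.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")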

Responses

I then moved to the responses endpoint and tried uploading the images separately and using file IDs. That does not seem to work, as files for the responses endpoint need to be PDFs, or at least cannot be images. One solution could be to convert my images to PDFs to meet that requirement. I am a bit surprised, though, that the responses endpoint, which is supposed to replace assistants, lags behind the capabilities of the assistants endpoint, which can accept images as file IDs.
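
For that PDF conversion, something as small as this should do (a sketch using Pillow; PDFs cannot carry an alpha channel, hence the convert):

from PIL import Image

Image.open("004.png").convert("RGB").save("004.pdf")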

Assistants

I am hence now trying the assistants endpoint:

abc.filename
Out[57]: '004.png'

abc.id
Out[58]: 'file-***'

question_file_id = abc.id

solution_file_id = abc.id

student_file_id = abc.id

thread_message = openai_client.beta.threads.messages.create(
  empty_thread.id,
  role="user",
  content=content,
)

BadRequestError: Error code: 400 - {'error': {'message': 'Invalid message content: Expected file type to be a supported format: .jpeg, .jpg, .png, .gif, .webp but got none.', 'type': 'invalid_request_error', 'param': 'content.image_file.file_id', 'code': 'invalid_request_id'}}

and for reference:

content = [
    {
        "type": "input_text",
        "text": prompt,
    },
    {
        "type": "input_image",
        "input_image": {"file_id": question_file_id, "detail": "high"},
    },
    {
        "type": "input_image",
        "input_image": {"file_id": solution_file_id, "detail": "high"},
    },
    {
        "type": "input_image",
        "input_image": {"file_id": student_file_id, "detail": "high"},
    },
]


content
Out[68]: 
[{'type': 'text',
  'text': 'The following images ...'},
 {'type': 'image_file',
  'image_file': {'file_id': 'file-***', 'detail': 'high'}},
 {'type': 'image_file',
  'image_file': {'file_id': 'file-***', 'detail': 'high'}},
 {'type': 'image_file',
  'image_file': {'file_id': 'file-***', 'detail': 'high'}}]

I have tried changing the purpose of the file upload to vision and to user_data, but neither has any effect:

abc = openai_client.files.create(
    file=open(r"004.png", "rb"),
    purpose="vision"
)

or

import io

with open(r"004.png", "rb") as f:
    file_data = f.read()

file_like = io.BytesIO(file_data)
file_like.name = "004.png"  # Set the filename so that MIME type detection can occur

abc = openai_client.files.create(
    file=file_like,
    purpose="vision"
)

My last hope is to use the uploads endpoint instead of the files endpoint to send the image, so that I can define the MIME type explicitly (apparently we cannot do that with the files endpoint), and then use the resulting file ID in the assistant. I think this has been done here:
upload-image-to-assistant-via-api/801717/4
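
For reference, here is how I understand that uploads flow (a sketch based on my reading of the docs; I have not verified that the completed upload’s file ID behaves like a normal one):

data = open("004.png", "rb").read()

# Create the upload, declaring the MIME type explicitly.
upload = openai_client.uploads.create(
    purpose="vision",
    filename="004.png",
    bytes=len(data),
    mime_type="image/png",
)

# Send the content as a single part, then complete the upload.
part = openai_client.uploads.parts.create(upload_id=upload.id, data=data)
completed = openai_client.uploads.complete(upload_id=upload.id, part_ids=[part.id])

file_id = completed.file.id  # should be usable like any other file ID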

However, all this feels like too many workarounds for something that should be pretty simple: upload an image and use it with o1. I don’t necessarily need a stateful endpoint.

I would be very happy if anyone has a good hint or insight!


I found the error.

Instead of doing this:

question_file_id = abc.id

solution_file_id = abc.id

student_file_id = abc.id

I have to reference each upload’s own id in the payload; the three assignments above all pointed at the same abc.id. That works, at least for the responses endpoint and PDFs. I won’t try it with the assistants endpoint, as it will be deprecated anyway.
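
In other words, each image needs its own upload and its own ID (a sketch with hypothetical file names):

question_file = openai_client.files.create(file=open("question.png", "rb"), purpose="user_data")
solution_file = openai_client.files.create(file=open("solution.png", "rb"), purpose="user_data")
student_file = openai_client.files.create(file=open("student.png", "rb"), purpose="user_data")

question_file_id = question_file.id  # three distinct IDs now
solution_file_id = solution_file.id
student_file_id = student_file.id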

You had base64 input way back there at the start.

here’s how to form a “content” part for that:

    image_part = {
        "type": "input_image",
        "image_url": f"data:image/png;base64,{base64_encoded_image}",
        "detail": "low",
        # OR
        # "file_id": "file_123456",  # File ID from uploading to OpenAI files API
    }
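
And to spell out how that part slots into a full Responses call (a sketch; model_name and prompt stand in for your own variables):

response = openai_client.responses.create(
    model=model_name,
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                image_part,
            ],
        },
    ],
)
print(response.output_text)  # convenience accessor for the text output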

Yeah, the problem with b64 was that I got the error “You did not provide a third image”, which was not true. That’s why I switched from chat.completions to a different endpoint and tried uploading files instead.

However, now I reproduced that error with responses + pdf files + uploads:

openai_client.responses.create(
    model=model_name,
    reasoning={"effort": "high"},
    input=messages_payload,
)
2025-03-25 14:23:16,388 [INFO] HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
Out[122]: Response(...content=[ResponseOutputText(annotations=[], text='***1*** [[You did not provide an answer to evaluate, so I had to mark it incomplete.]]', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort='high', generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=2899, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=1623, output_tokens_details=OutputTokensDetails(reasoning_tokens=1600), total_tokens=4522), user=None, store=True)


task_question = asyncio.create_task(
    async_upload_file(question_path, "user_data", openai_client, retries)
)
task_solution = asyncio.create_task(
    async_upload_file(solution_path, "user_data", openai_client, retries)
)
task_student = asyncio.create_task(
    async_upload_file(student_answer_path, "user_data", openai_client, retries)
)

# Wait for all uploads to complete
question_file, solution_file, student_file = await asyncio.gather(
    task_question, task_solution, task_student
)
2025-03-25 14:25:30,334 [INFO] HTTP Request: POST https://api.openai.com/v1/files "HTTP/1.1 200 OK"
2025-03-25 14:25:30,641 [INFO] HTTP Request: POST https://api.openai.com/v1/files "HTTP/1.1 200 OK"
2025-03-25 14:25:30,641 [INFO] HTTP Request: POST https://api.openai.com/v1/files "HTTP/1.1 200 OK"

messages_payload = [
    {
        "role": "system",
        "content": "You are an assistant.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": prompt,
            },
            {
                "type": "input_file",
                "file_id": question_file.id,
            },
            {
                "type": "input_file",
                "file_id": solution_file.id,
            },
            {
                "type": "input_file",
                "file_id": student_file.id,
            },
        ],
    },
]

openai_client.responses.create(
    model=model_name,
    reasoning={"effort": "high"},
    input=messages_payload,
)
2025-03-25 14:26:10,277 [INFO] HTTP Request: POST https://api.openai.com/v1/responses "HTTP/1.1 200 OK"
Out[125]: Response(...content=[ResponseOutputText(annotations=[], text='***3*** [[Your answer clearly and accurately explains why a low-probability event can still occur, showing good understanding of statistical forecasts. Nice job!]]', type='output_text')], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, max_output_tokens=None, previous_response_id=None, reasoning=Reasoning(effort='high', generate_summary=None), status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text')), truncation='disabled', usage=ResponseUsage(input_tokens=2904, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=801, output_tokens_details=OutputTokensDetails(reasoning_tokens=768), total_tokens=3705), user=None, store=True)

Interestingly, two of my files, the solution and the student file, had the same name. Once I changed them to different file names, I got the second result, which is correct.

However, I don’t understand yet why the same thing happens with b64-encoded uploads, since we don’t pass filenames in that case.
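
Until that is understood, I will defensively give every BytesIO upload a unique name (a sketch; the uuid suffix is my own convention):

import io
import uuid

def unique_file_like(data: bytes, base_name: str) -> io.BytesIO:
    # Wrap raw bytes with a filename that cannot collide with other uploads.
    f = io.BytesIO(data)
    f.name = f"{uuid.uuid4().hex}_{base_name}"  # e.g. '3fa2..._004.png'
    return f

abc = openai_client.files.create(
    file=unique_file_like(open("004.png", "rb").read(), "004.png"),
    purpose="user_data",
)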