Hi there,
Chat Completions
I initially used the Chat Completions endpoint to send three base64-encoded images alongside a prompt to o1. That worked fine for 93-97% of the cases, but in a few requests o1 responded with “Your image was empty”.
# Prepare the messages payload
messages_payload = [
    {
        "role": "system",
        "content": "You are interpreting the images.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt,
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_question_b64}"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_solution_b64}"},
            },
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{img_b64}"},
            },
        ],
    },
]
# Call the OpenAI API (the blocking client call runs in a worker thread)
response = await asyncio.to_thread(
    openai_client.chat.completions.create,
    model=model_name,
    messages=messages_payload,
)
I have tried:
- making the requests synchronously
- reducing the size (both height/width and file size) of the images
- repeating the runs several times
- decoding the b64 images to check whether they were encoded correctly
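For the last check, a minimal sketch of how the decoded payload could be validated before sending, using only the standard library. The helper name `inspect_png_b64` is hypothetical, and it only looks at the PNG signature and the IHDR header, so it would not catch corrupted pixel data.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def inspect_png_b64(b64_string: str) -> tuple[int, int]:
    """Decode a base64 string and return (width, height) from the PNG header.

    Raises ValueError if the payload is not a plausible, non-empty PNG.
    """
    # validate=True rejects characters outside the base64 alphabet
    raw = base64.b64decode(b64_string, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with the PNG signature")
    # The first chunk must be IHDR: 4-byte length, 4-byte type, then width and height
    if len(raw) < 24 or raw[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    width, height = struct.unpack(">II", raw[16:24])
    if width == 0 or height == 0:
        raise ValueError("image has zero width or height")
    return width, height
```

Running this on each of the three strings right before building the payload would at least rule out an empty or truncated image on the client side.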
The error reappeared over and over. The failing cases are not the largest images, and I am also not hitting any rate limits. I do have 2-3 cases that repeatedly result in a response like “The images are empty”, plus a few more that fail only sometimes.
What I haven’t tried yet is to remove exif data, as suggested in this post:
image-upload-for-analysis-fails-randomly-help/889007
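A minimal sketch of what stripping the metadata could look like for PNG files, using only the standard library. It assumes the metadata sits in the `eXIf`, `tEXt`, `zTXt`, or `iTXt` chunks; the helper name `strip_png_metadata` is hypothetical, and I have not confirmed that metadata is what triggers the empty-image responses.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Ancillary chunks that carry metadata rather than pixel data
METADATA_CHUNKS = {b"eXIf", b"tEXt", b"zTXt", b"iTXt"}


def strip_png_metadata(png_bytes: bytes) -> bytes:
    """Return a copy of the PNG with EXIF and text chunks removed."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        end = pos + 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if end > len(png_bytes):
            raise ValueError("truncated PNG chunk")
        if chunk_type not in METADATA_CHUNKS:
            out.append(png_bytes[pos:end])  # copy the chunk verbatim, CRC included
        pos = end
        if chunk_type == b"IEND":
            break
    return b"".join(out)
```

Because the remaining chunks are copied byte for byte, the pixel data and its checksums stay untouched; the cleaned bytes can then be base64-encoded as before.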
Responses
I then moved to the Responses endpoint and tried uploading the images separately and referencing them by file ID. That does not seem to work, as files for the Responses endpoint apparently need to be PDFs, or at least cannot be images. One workaround could be to convert my images to PDFs to meet that requirement. I am a bit surprised, though, that the Responses endpoint, which is supposed to replace Assistants, lags behind the capabilities of Assistants, which can accept images as file IDs.
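For comparison, a minimal sketch of passing the images inline to the Responses endpoint instead of by file ID. It assumes the `input_image` content part takes `image_url` as a plain data-URL string; the helper name `build_responses_input` is hypothetical, and whether this avoids the intermittent empty-image responses is untested.

```python
def build_responses_input(prompt: str, images_b64: list[str]) -> list[dict]:
    """Build a Responses-style `input` list with one text part and inline PNGs."""
    content = [{"type": "input_text", "text": prompt}]
    for b64 in images_b64:
        content.append(
            {
                "type": "input_image",
                # Assumption: image_url is a plain string here, not a nested object
                "image_url": f"data:image/png;base64,{b64}",
                "detail": "high",
            }
        )
    return [{"role": "user", "content": content}]


# Hypothetical usage, assuming `openai_client`, `model_name` and the b64 strings from above:
# response = openai_client.responses.create(
#     model=model_name,
#     input=build_responses_input(prompt, [img_question_b64, img_solution_b64, img_b64]),
# )
```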
Assistants
I am hence now trying the assistants endpoint:
abc.filename
Out[57]: '004.png'
abc.id
Out[58]: 'file-***'
question_file_id = abc.id
solution_file_id = abc.id
student_file_id = abc.id
thread_message = openai_client.beta.threads.messages.create(
    empty_thread.id,
    role="user",
    content=content,
)
BadRequestError: Error code: 400 - {'error': {'message': 'Invalid message content: Expected file type to be a supported format: .jpeg, .jpg, .png, .gif, .webp but got none.', 'type': 'invalid_request_error', 'param': 'content.image_file.file_id', 'code': 'invalid_request_id'}}
and for reference:
# Part types match the echoed value of `content` (Out[68]): "text" and "image_file"
content = [
    {
        "type": "text",
        "text": prompt,
    },
    {
        "type": "image_file",
        "image_file": {"file_id": question_file_id, "detail": "high"},
    },
    {
        "type": "image_file",
        "image_file": {"file_id": solution_file_id, "detail": "high"},
    },
    {
        "type": "image_file",
        "image_file": {"file_id": student_file_id, "detail": "high"},
    },
]
content
Out[68]:
[{'type': 'text',
'text': 'The following images ...'},
{'type': 'image_file',
'image_file': {'file_id': 'file-***', 'detail': 'high'}},
{'type': 'image_file',
'image_file': {'file_id': 'file-***', 'detail': 'high'}},
{'type': 'image_file',
'image_file': {'file_id': 'file-***', 'detail': 'high'}}]
I have tried changing the purpose of the file upload to vision and to user_data, but that had no effect:
abc = openai_client.files.create(
    file=open(r"004.png", "rb"),
    purpose="vision"
)
or
import io
with open(r"004.png", "rb") as f:
    file_data = f.read()
file_like = io.BytesIO(file_data)
file_like.name = "004.png"  # Set the filename so that MIME type detection can occur
abc = openai_client.files.create(
    file=file_like,
    purpose="vision"
)
My last hope is to use the Uploads endpoint instead of the Files endpoint to send the image and set the MIME type explicitly (apparently we cannot do that with the Files endpoint), and then use that file ID in the assistant. I think this was done here:
upload-image-to-assistant-via-api/801717/4
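Before going that far, a minimal sketch of one more thing that could be tried with the Files endpoint. As far as I know, the Python SDK's `file` argument also accepts a `(filename, content, content_type)` tuple, which would carry the MIME type explicitly. The helper name `as_upload_tuple` is hypothetical, and I have not verified that this removes the "but got none" error.

```python
import mimetypes
from pathlib import Path


def as_upload_tuple(path: str) -> tuple[str, bytes, str]:
    """Return (filename, content, content_type) for a local image file."""
    p = Path(path)
    # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
    content_type, _ = mimetypes.guess_type(p.name)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {p.name!r}")
    return p.name, p.read_bytes(), content_type


# Hypothetical usage, assuming `openai_client` from above:
# abc = openai_client.files.create(file=as_upload_tuple("004.png"), purpose="vision")
```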
However, all of this seems like too many workarounds for something that should be pretty simple: upload an image and use it with o1. I don’t necessarily need a stateful endpoint.
I would be very happy if anyone has a good hint or insight!