We are having the same issue: the gpt-4o-2024-08-06 model works with image content types, but our fine-tuned model doesn't through the API. We can, however, get it to work with images on the dashboard.
Status code 400: "Invalid model: ft:gpt-4o-2024-08-06:xxx:yyyy:xxxx does not support image message content types."
I have the same issue: my fine-tuned model does not accept image input via the Assistants API, although it does accept images via the dashboard (Playground).
Is this a bug to be fixed, or is this a new capability in the Assistants API that we need to wait to be released?
I contacted OpenAI support, and they confirmed this is the case and acknowledged that it is not documented. Disappointing, to say the least.
Their response:
Currently, fine-tuned instances of GPT-4o do not support multi-modal processing, meaning they are limited to text-only inputs and outputs. While the base GPT-4o model supports multi-modal capabilities (text and image inputs), fine-tuning a model results in the loss of this functionality. This limitation is not explicitly mentioned in the documentation, and we understand how this could cause confusion. At this time, there is no way to retain multi-modal processing after fine-tuning. If multi-modal capabilities are critical for your use case, you may need to use the base GPT-4o model without fine-tuning or explore alternative approaches to achieve your goals. We appreciate your feedback and will pass it along to the team to improve the clarity of our documentation. If you have further questions or need assistance, feel free to let us know.
The Responses API Playground reveals another vision fault after several were fixed: "An error occurred while processing your request." on the same fine-tuned model.
More: I got the Responses API working correctly on the fine-tuned model locally, sending the same input, so I guess that's another bug to report about the Playground.
If you want server-side chat state plus fine-tuning plus vision, at least you have that option with Responses, if not with Assistants (now six months later).
Here's my Responses API input (no SDK), plus the added assistant message describing the image:
[
  {
    "type": "message",
    "role": "developer",
    "content": [
      {
        "type": "input_text",
        "text": "You are ChatGPT, with built-in image computer vision for user attachments."
      }
    ]
  },
  {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Describe attached image"
      },
      {
        "type": "input_image",
        "detail": "low",
        "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAw..."
      }
    ]
  },
  {
    "role": "assistant",
    "content": "The image shows a post from a discussion or forum. The title of the post is \"Gpt-4-vision timing out no reply,\" and it is under a category labeled \"API - gpt-4-vision.\" The user, identified by \"ypf(redact),\" mentions experiencing issues with the image recognition function. They state that it did not take effect and did not return results for a long time, resulting in a request timeout, with no text message returned. There is also an option to reply to the post."
  }
]
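For anyone else trying this without an SDK, here is a minimal Python sketch of the raw HTTP call, assuming the standard https://api.openai.com/v1/responses endpoint, an OPENAI_API_KEY environment variable, a placeholder fine-tuned model ID, and the same (still truncated) base64 image as above:

import json
import os
import urllib.request

# Placeholder: substitute your own fine-tuned model ID.
MODEL = "ft:gpt-4o-2024-08-06:xxx:yyyy:xxxx"

# The "input" list is the first two messages from the array shown above;
# the assistant message is what comes back, not something you send.
payload = {
    "model": MODEL,
    "input": [
        {
            "type": "message",
            "role": "developer",
            "content": [
                {"type": "input_text",
                 "text": "You are ChatGPT, with built-in image computer vision for user attachments."}
            ],
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Describe attached image"},
                {"type": "input_image", "detail": "low",
                 # Truncated placeholder; use your full base64-encoded PNG.
                 "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAw..."},
            ],
        },
    ],
}

req = urllib.request.Request(
    "https://api.openai.com/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    },
    method="POST",
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Print the assistant's text from the output items.
for item in body.get("output", []):
    for part in item.get("content", []):
        if part.get("type") == "output_text":
            print(part["text"])

This is the same request the Playground should be making, which is why the error there looks like a Playground bug rather than a model limitation.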