I discovered the api does not read imafes from url, evwn the example provided in the docs is only assuming what the images could be by analyzing the url string itself. As soon as you give it an image url to a generic image like test.jpg, its no longer able to interpret it and sais it does not have access to read images directly.
Am i missing something here? Does vision work for you using image urls?
I also tried base64 encoded images, and it tells me its incapable of decoding base64 and i should upload the image directly (but i am using the api)
I tried all models i cluding the vision preview and gpt-4o and assistants api, all the same inability to read urls, regardless on what server they are.
The refusal amount was significantly increased. A script I posted yesterday has denial of vision capability today, so OpenAI must have added some new training weights to the model, or added some more stupidity to the computations.
This system prompt overcomes ignorance and denials:
“You are a computer vision assistant, based on GPT-4o Omni, a multimodal AI trained by OpenAI in 2024.”
or added to another task bot:
“…You have computer vision enabled, and are based on GPT-4o Omni, a multimodal AI trained by OpenAI in 2024.”
If you don’t like gpt-4o (specifically) saying “I’m sorry, but…”, then you can add this API parameter (o200k): logit_bias={"15390":-99, "23045": -99}
Several tries you might use to get gpt-4o not to ignore its nature:
Response i get:
“I’m unable to access or view images directly. However, you can describe the image to me, and I’ll do my best to help you with any information or analysis you need!”
here is the exact response i get, as you can see its gpt-4o but its also saying it can’t read imges from urls.
{
“id”: “chatcmpl-9RdijQprkM55zLGdp2E9qg2NlXQKj”,
“object”: “chat.completion”,
“created”: 1716374569,
“model”: “gpt-4o-2024-05-13”,
“choices”: [
{
“index”: 0,
“message”: {
“role”: “assistant”,
“content”: “I’m currently unable to directly view images from URLs. However, if you can describe the image to me, I can help you analyze or provide information based on your description.”
},
“logprobs”: null,
“finish_reason”: “stop”
}
],
“usage”: {
“prompt_tokens”: 130,
“completion_tokens”: 34,
“total_tokens”: 164
},
“system_fingerprint”: “fp_729ea513f7”
}