4o & turbo models can't read images anymore

Hey,

is there a reason why gpt-4o and gpt-4-turbo suddenly can’t read images anymore after a first successful try?

I think the image is uploaded correctly, because the first time it could describe it and got it right.

But after the first try, it keeps telling me that it can’t read images.

This is a sample of its responses:

Try a system message along the lines of “…powered by GPT-4 with computer vision, allowing image descriptions and other visual tasks.” That should reduce the denials.

Then make sure you continue to attach images to new user messages in the same manner.
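A minimal sketch of that, assuming the Python SDK; the exact system wording and the image URL here are just placeholders:

from openai import OpenAI

client = OpenAI()

# Hypothetical anti-denial system prompt along the lines suggested above
system = {"role": "system", "content": (
    "You are an assistant powered by GPT-4 with computer vision, "
    "allowing image descriptions and other visual tasks.")}

# Attach the image to the user message in the same manner on every turn
user = {"role": "user", "content": [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
]}

response = client.chat.completions.create(model="gpt-4o", messages=[system, user])
print(response.choices[0].message.content)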

It does indeed reply as if it could see the image, but it is very wrong when it tries to describe it.

For example, if I upload a picture of a box of cereal, it says it’s a bag of chips; the next try it’s a soda can, the next a fruit salad.

It acts like it can see the image, but it really cannot.

It seems like this is the kind of session you’d want: understanding of images passed with each message, and also memory of images from past messages:

As a computer vision assistant, I have the capability to analyze and interpret attached images. By leveraging advanced algorithms and machine learning techniques, I can identify objects, recognize patterns, and extract valuable information from visual data. Whether it’s detecting faces, reading text, classifying scenes, or understanding complex visual content, my ability to process and understand images enables me to assist with a wide range of tasks and applications.
Prompt: what's dangerous in this picture?
Image URL? (enter=none):https://i.imgur.com/B2HDr41.jpeg
In the picture, the child is using a metal fork to retrieve toast from a toaster. This is dangerous for several reasons:

  1. Electrical Shock: Inserting a metal object into a toaster can cause an electrical shock, as metal conducts electricity and can come into contact with the electrical components inside the toaster.

  2. Burns: The toaster and the toast itself can be very hot, potentially causing burns to the child’s hands or face.

  3. Fire Hazard: The toast appears to be burnt, which could indicate that the toaster is overheating or malfunctioning, posing a fire risk.

It’s important to ensure that children are supervised in the kitchen and taught safe practices when using electrical appliances.
Prompt: what color is the toaster?
Image URL? (enter=none):
The toaster in the picture is red.
Prompt: is this picture more dangerous than the first?
Image URL? (enter=none):https://i.imgur.com/gLV8l2c.jpeg
Yes, this picture depicts a more dangerous situation than the first one. In this image, there are two children who appear to be drinking from bottles that are typically used for alcoholic beverages, and an adult is asleep at the table with several bottles of alcohol around. The dangers in this scenario include:

  1. Alcohol Consumption: Children consuming alcohol can lead to severe health issues, including alcohol poisoning, which can be life-threatening.
  2. Lack of Supervision: The adult is asleep and unable to supervise the children, which increases the risk of accidents or further dangerous behavior.
  3. Ingestion of Harmful Substances: Even if the bottles do not contain alcohol, children could be ingesting other harmful substances.

This situation poses significant immediate health risks to the children and indicates a lack of proper supervision and safety measures.
Prompt:

I wrote up a minimal chatbot to do this just for funsies (and capped the chat memory at two user/assistant pairs for cost). In the example, you can see how a single image is optionally added, and how the entire user message with the URL is preserved in the history. (There is no “upload”; you can send base64-encoded local images instead, as sketched after the code.)

import openai as o

client = o.OpenAI(timeout=60)
chat = []   # rolling history of user/assistant pairs
system = [{"role": "system", "content": "You are a computer vision assistant."}]
user = [{"role": "user", "content": [
    {"type": "text", "text": "Introduce your ability to see attached images."}]}]
while not user[0]["content"][0]["text"] == "exit":
    response = client.chat.completions.create(
        # system + last two user/assistant pairs + new user input
        messages=system + chat[-4:] + user,
        model="gpt-4o", top_p=0.5, stream=True, max_tokens=200)
    reply = ""
    for delta in response:
        if not delta.choices[0].finish_reason:
            word = delta.choices[0].delta.content or ""
            reply += word
            print(word, end="")
    chat += user + [{"role": "assistant", "content": reply}]
    user = [{"role": "user", "content": [{"type": "text", "text": input("\nPrompt: ")}]}]
    new_url = input("Image URL? (enter=none):")
    if new_url:
        if not new_url[:4] == "http":
            new_url = "http://" + new_url
        user[0]["content"].append({"type": "image_url",
                                   "image_url": {"url": new_url, "detail": "low"}})

I have also been experiencing issues with the AI’s ability to interpret images. It gives me the following error message, which 4o is unable to resolve:
{
  "detail": [
    {
      "type": "string_type",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "str"],
      "msg": "Input should be a valid string",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "Link", "url"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "Link", "text"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "Citation", "link_text"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "Citation", "quote"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "Citation", "original_citation_index"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "invalid",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "InvalidCitation", "original_text"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "invalid",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "InvalidCitation", "reason"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    },
    {
      "type": "missing",
      "loc": ["body", "messages", 0, "content", "multimodal_text", "parts", 0, "ForumQuote", "quoted_username"],
      "msg": "Field required",
      "input": {"height": 509, "width": 1028,
                "asset_pointer": "file-service://file-okQCqsAIzbmEkJpcx9yr3v5p",
                "size_bytes": 114934}
    }
  ]
}

Furthermore, it reports missing fields that were not required in previous attempts.
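For comparison, a plain Chat Completions image message, like the one used in the script above, looks roughly like this; the “multimodal_text” parts in the error appear to belong to a different, internal schema, so this is only a point of reference, not a fix:

# Hypothetical, well-formed user message for the Chat Completions API;
# the text and URL are placeholders.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.jpeg", "detail": "low"}},
    ],
}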