Errors getting GPT-4o to analyze image and text

Hi there, I’m working on a project where I call the API from Python to analyze a Base64-encoded image and give the model instructions about what to do with it. The issue is that no matter what I put in the “system” text field, the model just describes the image instead of following the instructions. Is there a definitive way to see what JSON data is actually being sent to the API, and to find out why the model may not be following instructions? I have tried many different prompts without success. I can confirm the image is being sent properly, since the model describes it correctly. Below is the message body that I’m sending to the API. Thanks!

messages=[
    {
      "role": "system",
      "content": [
        {"type": "text", "text": "Determine if there are any threats present in the given security camera image."},
      ],
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}",
          },
        },
      ],
    }
]

Hi @cgavaller2! Welcome to the community!

In OpenAI's quickstart docs, they state that the model is best at describing what is in an image and is not great at complex reasoning about it. They also list a number of limitations on what it can do.

So one idea would be to simply tell it to describe the image in detail, then take the resulting textual output, feed it to GPT-4o, and give it an instruction, e.g. “determine if there are any threats present”.
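To make that two-step idea concrete, here is a minimal sketch. The function name `describe_then_judge` and the `complete` callable are hypothetical: `complete` stands for whatever wrapper you write around your chat completion call (it takes a list of messages and returns the reply text). The prompt wording is only an example.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


def describe_then_judge(
    base64_image: str,
    complete: Callable[[List[Message]], str],
) -> str:
    """Two-pass flow: describe the image first, then reason over the text.

    `complete` is a hypothetical helper you supply. With the openai v1
    Python client it could wrap something like:
        client.chat.completions.create(model="gpt-4o", messages=messages)
            .choices[0].message.content
    """
    # Pass 1: ask only for a detailed description of the image.
    describe_messages: List[Message] = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this security camera image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ]
    description = complete(describe_messages)

    # Pass 2: text-only request that applies the actual instruction
    # to the description returned by pass 1.
    judge_messages: List[Message] = [
        {
            "role": "system",
            "content": "Determine if there are any threats present in the described security camera scene.",
        },
        {"role": "user", "content": description},
    ]
    return complete(judge_messages)
```

Passing the completion call in as a parameter keeps the sketch independent of any particular client version, and lets you try the flow with a stub before spending API calls.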

Also, out of curiosity: have you tried changing the role of your text instruction to "user" rather than "system"? I wonder whether the system message is actually being overridden behind the scenes (possibly by their safety checks). Maybe it will work better that way.
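On the question of seeing what is actually sent: one cheap check is to dump the `messages` list with `json.dumps` before making the request. This sketch assumes the body in the question is pasted exactly as it runs. If so, both "role" and "content" appear twice inside a single dict, and Python keeps only the last value for a repeated key in a dict literal, so the system text would never reach the API. The `base64_image` value and the variable names `posted` and `fixed` below are placeholders for illustration.

```python
import json

base64_image = "AAAA"  # placeholder; use your real Base64 string
INSTRUCTION = "Determine if there are any threats present in the given security camera image."
IMAGE_PART = {
    "type": "image_url",
    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
}

# Same shape as the body in the question: ONE dict with "role" and
# "content" each written twice. The later values silently win.
posted = [
    {
        "role": "system",
        "content": [{"type": "text", "text": INSTRUCTION}],
        "role": "user",
        "content": [IMAGE_PART],
    }
]

# Inspect the payload: only the "user" role and the image part remain,
# and the instruction text is absent from the printed JSON.
print(json.dumps(posted, indent=2))

# One way to restructure it: a separate dict per message. The
# instruction is also repeated as a text part next to the image,
# which covers the "try it in the user role" suggestion.
fixed = [
    {"role": "system", "content": INSTRUCTION},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": INSTRUCTION},
            IMAGE_PART,
        ],
    },
]
print(json.dumps(fixed, indent=2))
```

If the dump of your real `messages` shows a single message with no instruction text, that would explain why the model only ever describes the image.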