Vision API - "Image not allowed by our safety system"

Hi all! I’m running into trouble with multiple images; here’s one example: an image of a sign on a wall. The API responds with “Your input image may contain content that is not allowed by our safety system”. This isn’t the only image that gets flagged despite appearing completely normal. For context, I’m asking the model to describe what’s in the image. Can anyone explain why this is happening and whether there’s a fix? Thanks!
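For reference, here’s roughly how I’m building the request. This is just a sketch that constructs the request body for an inline base64 image; the model name is a placeholder and `build_vision_payload` is my own helper, not part of any SDK:

```python
import base64


def build_vision_payload(image_bytes, prompt="Describe what's in this image."):
    """Build a Chat Completions request body for an image-description call.

    Uses the standard data-URL format for inline images. The model name
    below is a placeholder; substitute whichever vision model you use.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",  # placeholder model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```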

My guess is that this is part of their CAPTCHA defence: if the model can read numbers from images, it could be used to defeat CAPTCHAs. Annoying when you’re just trying to use it for legitimate purposes, though.


That would be kind of silly because there are a lot of images with numbers out there, haha. I hope this is just a temporary issue. By the way, here’s another image that the API is having trouble describing:


OpenAI has disabled most OCR-type functionality, in my opinion. They give different reasons at different times for doing so. Any attempt to use it for OCR in an application will work sometimes and fail sometimes, which is definitely not reliable enough to build on.

I wouldn’t be so critical, but there are indeed some restrictions.

Over the last 7 days, I’ve requested descriptions for about 10,000 images through the API, and it’s generally been working well, though a small percentage of the images are getting declined.

I’m trying to understand the issue, as some of these images are important. Interestingly, most of these images contained text as well, which the model was able to read without any problems. The rejections seem very random.

One thing that can help is to modify the image slightly so it looks less like a CAPTCHA.
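Even a very small perturbation may be enough. Here’s a minimal Pillow sketch; the specific tweaks (a thin white border plus a 2-pixel resize) are arbitrary choices of mine, not a known threshold:

```python
from PIL import Image, ImageOps


def nudge_image(src_path, dst_path):
    """Slightly alter an image so its pixels differ from the original.

    The border width and 2-pixel resize are arbitrary; the point is
    only to perturb the image a little before sending it to the API.
    """
    with Image.open(src_path) as img:
        img = img.convert("RGB")
        img = ImageOps.expand(img, border=4, fill="white")  # thin white border
        img = img.resize((img.width - 2, img.height - 2))   # tiny resize
        img.save(dst_path)
        return img.size
```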

I discovered this as a side-effect of using “set-of-marks” prompting with the vision model.

overlay_grid.py
import argparse
import os

from PIL import Image, ImageDraw, ImageFont


def draw_labeled_grid(image_path, horizontal, vertical, font_size):
    print(image_path)
    with Image.open(image_path) as img:
        draw = ImageDraw.Draw(img)
        width, height = img.size
        cell_width = width / horizontal
        cell_height = height / vertical

        # Fall back to Pillow's built-in bitmap font if the TrueType font is missing.
        font_path = '/usr/share/fonts/truetype/noto/NotoMono-Regular.ttf'
        try:
            font = ImageFont.truetype(font_path, font_size)
        except OSError:
            font = ImageFont.load_default()

        text_padding = font_size / 8
        line_color = 'blue'
        line_width = 4

        # Draw the vertical and horizontal grid lines, including both edges.
        for h in range(horizontal + 1):
            draw.line((h * cell_width, 0, h * cell_width, height), fill=line_color, width=line_width)
        for v in range(vertical + 1):
            draw.line((0, v * cell_height, width, v * cell_height), fill=line_color, width=line_width)

        # Number each cell column by column, zero-padded so all labels align.
        num_digits = len(str(horizontal * vertical))
        for h in range(horizontal):
            for v in range(vertical):
                label = str(h * vertical + v + 1).zfill(num_digits)
                text_x = h * cell_width + text_padding
                text_y = v * cell_height + text_padding
                draw.text((text_x, text_y), label, fill=line_color, font=font)

        output_path = f"labeled_grid_{os.path.basename(image_path)}"
        print(output_path)
        img.save(output_path)
        return output_path


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Draw a labeled grid on an image.')
    parser.add_argument('image_path', type=str, help='The path to the image file.')
    parser.add_argument('horizontal', type=int, nargs='?', default=3, help='The number of horizontal cells.')
    parser.add_argument('vertical', type=int, nargs='?', default=3, help='The number of vertical cells.')
    parser.add_argument('font_size', type=int, nargs='?', default=48, help='The font size for the labels.')

    args = parser.parse_args()

    draw_labeled_grid(args.image_path, args.horizontal, args.vertical, args.font_size)

This is really interesting, I’ll dig deeper, thanks!

We’ve developed a WordPress plugin that automates the writing of attachment metadata, so we’ll see if we can reduce the number of skipped images with a similar approach.

That wasn’t meant as criticism, just accuracy, to help members of this community avoid wasting time on something that will fail in production.

Mostly it’s “business-related” information that OpenAI will refuse to OCR: people’s names, addresses, emails, phone numbers, company names, and so on. As long as your use case doesn’t involve business info, you’ll be fine, unless/until OpenAI changes its mind and censors your use case as well.


This is very true, yes.
We created a WordPress plugin that writes image titles, ALT text, and so on, so it uses the API for the most basic thing it should be able to do: see what’s in the image and understand it.