Has anyone used DALL-E (or similar) to redact an image, i.e. block out text? The end-user ChatGPT seems to do this, or some semblance of it, but the same prompt sent to the DALL-E edits endpoint just returns the image unchanged. In my use case the text may not be in the same place each time, or I'd just use a stock image processor for this.
Do you have an example of what you mean? Redacted usually means placing a black bar across something.
DALL-E 3 doesn't have an edits endpoint yet, either.
You'd probably need to use Vision first to find and locate any text in the image.
Blocking out text is what I'm talking about (think automatically covering car tags in an image with a dozen cars). DALL-E 2 has an /edits endpoint that sounds like it should work, but so far I haven't seen it change the returned image at all.
The end-user version will do it in a limited manner, though it doesn't seem to be calling DALL-E, if its answers are to be believed.
Right, but for the edits endpoint to work, you have to mask the image in those spots, so you'd need to either mark them manually or maybe use GPT-4-Vision to try to spot them? Sounds interesting, but I'm not sure the tech is there yet?
ETA: You might not even need the edits endpoint if you're just redacting with a black bar… you'd still need to find what needs to be redacted, though, then just use those coordinates with ImageMagick or something to mark them…
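The black-bar route is simple once you have coordinates. A minimal sketch with Pillow instead of ImageMagick (the box coordinates are hypothetical — they'd come from whatever detector you end up using):

```python
from PIL import Image, ImageDraw

def redact(img: Image.Image, boxes) -> Image.Image:
    """Return a copy with an opaque black bar drawn over each
    (left, top, right, bottom) box."""
    out = img.convert("RGB").copy()
    draw = ImageDraw.Draw(out)
    for box in boxes:
        draw.rectangle(box, fill="black")
    return out

# Hypothetical usage -- boxes would come from your detection step:
# redact(Image.open("cars.jpg"), [(120, 340, 260, 380)]).save("redacted.jpg")
```

No AI needed for this half of the job; the hard part is still producing the boxes.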
Vision says it can't help with spotting text like that. It can describe the exact same image in great detail, but it isn't able to give me coordinates.
DALL-E 3 does not accept images as input.
DALL-E 2 on the edits endpoint allows AI infill of an area made transparent, only on square images of a supported size: 1024x1024, 512x512, or 256x256.
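So for the edits route, the redaction area has to be punched out as transparent pixels at a supported size before upload. A sketch of that prep step with Pillow (box coordinates are placeholders; the resulting image then goes to the DALL-E 2 image edits endpoint along with a prompt describing the replacement content):

```python
from PIL import Image, ImageDraw

def make_edit_input(img: Image.Image, boxes) -> Image.Image:
    """Resize to a supported square size and make each (left, top, right,
    bottom) box fully transparent -- the region DALL-E 2 will regenerate."""
    out = img.convert("RGBA").resize((1024, 1024))
    draw = ImageDraw.Draw(out)
    for box in boxes:
        # ImageDraw sets pixels directly, so alpha 0 punches a hole
        draw.rectangle(box, fill=(0, 0, 0, 0))
    return out
```

Note the boxes have to be expressed in the resized 1024x1024 coordinate space, so any detector output on the original image needs scaling first.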
GPT-4-Vision does not support grounding, which would allow returning the location of detected objects. You could use a large local or hosted model such as minigpt-v2, or Azure vision products capable of returning bounding boxes. Detecting a particular subject well (like license plates) would take dedicated tuning.
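Whichever detector you use, its output still has to be turned into rectangles for the black-bar step. A sketch of that conversion, loosely modeled on the Azure Read API's response shape, where each detected line carries an eight-number quad of corner coordinates (treat the exact field names here as an assumption to check against your service's docs):

```python
def quads_to_rects(result: dict) -> list:
    """Collapse each detected line's 8-point quad (x1,y1,...,x4,y4) into an
    axis-aligned (left, top, right, bottom) rectangle for redaction."""
    rects = []
    for page in result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            xs = line["boundingBox"][0::2]  # x coordinates of the 4 corners
            ys = line["boundingBox"][1::2]  # y coordinates of the 4 corners
            rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects
```

Collapsing the quad to its bounding rectangle over-redacts slightly on rotated text, which is usually fine for this purpose.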