- Uses CLIP gradient ascent to optimize text embeddings for cosine similarity with the input image embedding → CLIP “opinion” words / text output (see the first sketch after this list)
- GradCAM / attention visualization of salient features for both CLIP ViT and ResNet models (see the Grad-CAM sketch below)
- Gamified: you can guess along with what CLIP was “looking at” by placing an ROI for a given word
- You can edit CLIP’s opinion and add your own (force your human-biased zero-shot choice on the AI ;-))
- Try using ViT-L/14 (the text encoder of Stable Diffusion, SDXL, etc.) or, for a slightly less ideal outcome, even ViT-B/32, and prompt a text-to-image generative AI with the CLIP opinion: “a CLIP knows what a CLIP sees” (if the models match or are very similar). See the Diffusers snippet below.
- This could make a fun network-enabled smartphone “voting” app (“who can guess what the AI was ‘looking’ at?”) with a high-scores table (gamification) for secondary schools, IF ONLY CLIP’s “opinion” didn’t include such inappropriate descriptions of salient features, and in such unpredictable (unexpected) ways. This is more of a “heads up” than an actual implementation suggestion.
Alas, “enjoy responsibly”!
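For concreteness, here is a minimal sketch of the gradient-ascent idea, assuming the openai/CLIP package (`pip install git+https://github.com/openai/CLIP`). It optimizes a free “soft prompt” of token embeddings to maximize cosine similarity with the image embedding, then decodes each position to its nearest vocabulary token. The hyperparameters (`n_tokens`, `steps`, the learning rate) and the bare Adam loop are illustrative assumptions, not this repo’s exact implementation (which typically adds regularizers, batching, etc.):

```python
import torch
import clip
from clip.simple_tokenizer import SimpleTokenizer
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()                          # avoid fp16 gradient headaches
for p in model.parameters():
    p.requires_grad_(False)                    # only the soft prompt is trained

# Target: the normalized embedding of the input image.
image = preprocess(Image.open("input.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    img_feat = model.encode_image(image)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

n_tokens, steps = 8, 300                       # illustrative hyperparameters
vocab = model.token_embedding.weight           # [49408, d]
sot, eot = vocab[49406], vocab[49407]          # <|startoftext|>, <|endoftext|>
pad = vocab[0].expand(77 - n_tokens - 2, -1)   # mimic tokenize()'s zero padding

# Free "soft prompt" embeddings, updated by gradient ascent.
soft = (0.02 * torch.randn(n_tokens, vocab.shape[1], device=device)).requires_grad_(True)
opt = torch.optim.Adam([soft], lr=0.1)

def text_features(soft_emb):
    # Re-implements model.encode_text(), but starting from raw embeddings.
    x = torch.cat([sot[None], soft_emb, eot[None], pad], dim=0)[None]  # [1, 77, d]
    x = x + model.positional_embedding
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
    x = model.ln_final(x)
    x = x[:, n_tokens + 1] @ model.text_projection   # read out at the EOT position
    return x / x.norm(dim=-1, keepdim=True)

for _ in range(steps):
    opt.zero_grad()
    loss = -(text_features(soft) * img_feat).sum()   # ascent on cosine similarity
    loss.backward()
    opt.step()

# Decode each optimized embedding to its nearest vocabulary token: the "opinion" words.
with torch.no_grad():
    ids = (soft @ vocab.T).argmax(dim=-1)
print(SimpleTokenizer().decode(ids.tolist()))
```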
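And a hedged sketch of the Grad-CAM side, for the ResNet image encoder (RN50); the ViT path uses attention hooks instead and is not shown here. The target layer (`visual.layer4`) and the helper name `cam_for_word` are my assumptions for illustration:

```python
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)
model = model.float()

# Capture activations and gradients at the last conv stage.
acts, grads = {}, {}
model.visual.layer4.register_forward_hook(
    lambda mod, inp, out: acts.update(feat=out))
model.visual.layer4.register_full_backward_hook(
    lambda mod, gin, gout: grads.update(feat=gout[0]))

def cam_for_word(image_path, word):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([word]).to(device)
    sim = F.cosine_similarity(model.encode_image(image), model.encode_text(text)).sum()
    model.zero_grad()
    sim.backward()                                    # d(similarity)/d(activations)
    w = grads["feat"].mean(dim=(2, 3), keepdim=True)  # Grad-CAM channel weights
    cam = F.relu((w * acts["feat"].detach()).sum(dim=1))  # [1, H, W]
    cam = cam / (cam.max() + 1e-8)
    # Upsample to the input resolution for overlaying on the image.
    return F.interpolate(cam[None], size=image.shape[-2:], mode="bilinear",
                         align_corners=False)[0, 0]

heatmap = cam_for_word("input.jpg", "dog")            # where was CLIP "looking"?
```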
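Finally, the round trip: because Stable Diffusion 1.x conditions on CLIP ViT-L/14 text features, a ViT-L/14 “opinion” should steer it recognizably. A minimal sketch with Hugging Face `diffusers`; the model ID and the `clip_opinion` string are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder model ID; any Stable Diffusion 1.x checkpoint on the Hub works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Words recovered by the gradient ascent above (placeholder example).
clip_opinion = "dog grass sunny fetch happy"
image = pipe(clip_opinion).images[0]
image.save("clip_opinion_render.png")
```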