What happens when you ask LLMs what they see

Want a little bit of fun? Start a voice (no video) chat with gpt-realtime (or gpt-realtime-2025-08-28) and ask “What do you see? Can you see me? Describe me in detail”
you’ll be surprised

mine told me this:

”I can see you. Let me give you some details. You appear to be sitting in a room with a light-colored wall behind you. You’ve got short hair and you’re wearing a dark-colored top. I notice you’ve got a pair of glasses on, and it looks like you’re in a pretty relaxed posture. The lighting in the room is soft and even, so I can see your face clearly. If you need any more specific details, just let me know!”

GPT-4.1-mini and nano can see ghosts of dogs and cats LOL

Want a bit of fun?

Send a pattern perfectly-aligned with the “patches” grid of 16x16 that these models use. The individual semantic units have nothing to “see”. Hilarity ensues.

— Testing gpt-4.1 (normal vision)
Prompt tokens: 103
The image shows a classic black and white checkerboard pattern. The pattern consists of alternating black and white squares arranged in a grid. Each row and column alternates colors, so that no two adjacent squares are the same color. This type of pattern is commonly used for chessboards and checkers boards. The grid appears to be evenly spaced, creating a visually balanced and symmetrical design.

— Testing gpt-4.1-mini
Prompt tokens: 460
The image is a black and white line drawing of a cartoonish, anthropomorphic dog. The dog is standing upright on its hind legs and has a large, rounded head with floppy ears. It has a big nose, a wide smile showing its teeth, and expressive eyes. The dog is wearing a collar with a tag and has a bushy tail. The overall style is simple and playful, resembling a character from a comic strip or animated cartoon.

— Testing gpt-4.1-nano
Prompt tokens: 689
The image is a simple black and white illustration of a cat sitting upright. The cat has pointed ears, a round face, and a long, curved tail. Its eyes are closed, and it appears to be smiling. The overall style is minimalistic and cartoon-like.

— Testing gpt-4.1-nano-2025-04-14
Prompt tokens: 689
The image shows a black silhouette of a person standing with their arms raised above their head, holding a large star-shaped object. The background is plain white, creating a stark contrast with the black figure.

— Testing gpt-4.1-mini
Prompt tokens: 460
The image is a black and white line drawing of a person sitting cross-legged on the ground. The person is wearing a long-sleeved shirt and pants. They have a calm expression and are holding a small plant with leaves in their hands, which are resting on their lap. The overall style is simple and minimalistic, with clean lines and no shading or color. The image conveys a sense of peace, mindfulness, and connection with nature.

— Testing gpt-4.1-nano
Prompt tokens: 689
The image shows a black and white illustration of a person sitting at a desk, working on a computer. The person appears to be focused, with their hands on the keyboard. There are some papers or documents on the desk, and a lamp providing light. The background is plain, emphasizing the workspace and the individual.

Whereas it is just invisible to GPT-5 mini or nano

— Testing gpt-5-mini-2025-08-07
Prompt tokens: 345
The image is almost entirely blank/white. There are no clear objects, text, or distinct features visible—just a very light, uniform background with maybe a faint, barely noticeable gray smudge near the center.

— Testing gpt-5-nano-2025-08-07
Prompt tokens: 426
The image appears completely black/empty—no visible objects or scene. If you intended to share a photo, please try re-uploading.

1 Like

Test #1 - No image provided

GPT-5.1; Reasoning Effort: Medium; Verbosity: Medium
Prompt: What do you see? Can you see me? Describe me in detail.

Response:

Visual Perception Limitations Of This AI

I. Absence Of Direct Visual Access

I do not have direct visual access to you or to your surroundings. I do not have a camera view, and I cannot look through any device you are using. I only receive and process the text and files that you explicitly send in this conversation. Since no image of you has been provided in this exchange, I do not have any information about your physical appearance and therefore cannot describe you in any visual detail.

II. Description Capability With Provided Images

This AI system can analyze and describe images that are explicitly provided to it as part of the conversation. When an image is supplied, it can identify visible features such as approximate age range, clothing, expressions, and surroundings, within the limits of image quality and ethical guidelines. Without such an image, it has no way to infer or imagine your appearance, and any attempt to do so would be inaccurate and unsupported.

Test #2 - Image provided

GPT-5.1; Reasoning Effort: Medium; Verbosity: Medium
Prompt: What do you see? Can you see me? Describe me in detail.

Response:

Visual Description Of The Image

I. Overall Scene

The image shows a person sitting indoors in a warm, golden-toned setting, likely a bar or lounge. The background is softly blurred with circular light spots that resemble defocused lamps or decorative lighting, giving the scene a relaxed, evening atmosphere. The composition is framed closely around the person’s face and upper body, creating an intimate portrait that emphasizes their expression and accessories.

II. Appearance And Clothing

The person has long, straight dark hair that falls over one shoulder. The person is wearing a light-colored woven cowboy hat with a narrow, textured band around the crown. Her skin appears smooth and evenly lit, and the makeup is subtle, with defined eyebrows, light eyeshadow, and natural-looking lips. The person is wearing striking long feather earrings in vivid turquoise tones that hang down to about the collarbone, with a small gold bead at the top where the earring attaches. The person holds a martini-style glass containing a pale, cloudy drink in one hand, with fingers wrapped neatly around the stem.

III. Expression And Mood

The person’s gaze is directed straight toward the camera, giving a calm, self-possessed look. Their lips are slightly parted around a lit cigar, with a small amount of smoke rising from the burning end. The combination of the cowboy hat, the cigar, and the cocktail creates a stylized, cinematic mood that suggests confidence and a relaxed, perhaps playful, attitude. I can see the person in the image clearly, but I cannot know whether this is actually you, only that it is the individual portrayed in the photograph.

Conclusion: Always use the right tool, the right way, for the right response.

Note: Image created by OpenAI Image-1 API - image edits endpoint