This is a collection of “seeing things wrong”. For amusement, and technology exploration.
Can you find interesting ways that AI models do not see what is expected?
Colorful Squares
AI consistently reports a grid of wrong size and reports wrong colors
USER
Describe the attached image contents in detail
gpt-4.1-mini - default sampling
The attached image consists of a 3x3 grid of solid-colored squares. The colors, from left to right and top to bottom, are as follows:
Top row: Medium purple, lavender, bright green Middle row: Dark blue-purple, bright cyan, light green Bottom row: Royal blue, turquoise, mustard yellow
Each square is evenly sized and arranged neatly in the grid pattern, creating a simple and colorful abstract compilation.
gpt-4.1-mini - top_p/temperature @ 0.1
The image consists of a 3x3 grid of nine equally sized squares, each filled with a different solid color. The colors from left to right, top to bottom, are:
Top row: Light blue, lavender, light green Middle row: Purple, light pink, medium green Bottom row: Dark blue, light purple, mustard yellow
Each square is a distinct color with no patterns or text, creating a simple and colorful patchwork effect.
Aligning grid of 32px
Image passed at 480x480, vs resized to 500x500, the former results in complete hallucination and a response like no context except that it is on vision (which can be internal “vision enabled”)
USER
Describe the attached image contents in detail
gpt-4.1-mini @ 0.1
The image is a black and white line drawing of a cartoonish character resembling a cat. The cat has a large, round head with two pointy ears on top. Its eyes are large and round, with small pupils, giving it an expressive look. The cat’s nose is small and triangular, and it has a wide, smiling mouth with a small tongue sticking out. The character’s body is small compared to its head, with short arms and legs. The cat is standing upright on its hind legs, with its front paws raised slightly. The overall style is simple and playful, with bold outlines and minimal detail.
Resized up to 500x500 px
There is wrong information about the starting color, but, there is some factual perception. That seems to come about by breaking internal alignment, where the embedded “token” information only has semantics of “black?”.
The image is a classic checkerboard pattern consisting of alternating black and white squares. The pattern is arranged in a grid format with equal-sized squares. Each row alternates colors, starting with a black square in the top-left corner, followed by a white square, then black, and so on. The next row starts with a white square, then black, continuing the alternating pattern. This creates a visually balanced and symmetrical design typical of a checkerboard. The image is square in shape, and the pattern extends uniformly across the entire image.
same 480x480 image, different text…
system message is just a colon; user content is just the B/W checkerboard
This is a floor plan of a house with the following rooms and features:
- Living Room: Located on the left side, it has a sofa, a coffee table, and a TV.
- Dining Area: Adjacent to the living room, it has a dining table with six chairs.
- Kitchen: Next to the dining area, equipped with a stove, sink, and refrigerator.
- Bedroom 1: Located at the top right corner, furnished with a bed and a wardrobe.
- Bedroom 2: Located at the bottom right corner, also furnished with a bed and a wardrobe.
- Bathroom: Positioned between the two bedrooms, containing a bathtub, toilet, and sink.
- Entrance/Hallway: Central area connecting all rooms.
- Outdoor Area: At the bottom left, there is a patio with outdoor furniture.
The layout is compact and functional, with clear separation between living and sleeping areas. Would you like any specific details or modifications?
Even though there was no “look at this image” message, a fanciful image description was described.
Strips
Generated at 2048x64
USER
Describe the image
GPT-4.1-mini
The image shows a black and white photograph of a man sitting on a chair. He is wearing a suit and tie, and he is holding a book or a document in his hands. The background is plain and light-colored, which contrasts with the dark suit of the man. The man appears to be looking slightly to his left. The overall setting suggests a formal or professional environment.
GPT-4.1
The image consists of a pattern of vertical black and white stripes. The stripes are evenly spaced and of equal width. The image is divided horizontally into two equal sections: the top half and the bottom half. In the top half, the black stripes are aligned directly above the white spaces in the bottom half, creating a staggered or offset effect between the two halves. This creates a visual effect similar to a checkerboard, but with vertical stripes instead of squares.
Generated at 2080x64
The API has a resizer, max 2000px it says, although now it seems to pass up to 2048 (unlike another resizing test I tried a week ago confirming downsizing to 2000px on patches models) despite that this would only be 128 tokens of image input (the bill is 220 tokens for 2048 and for 2080)
user:
Describe the image
GPT-4.1-mini
The image shows a black and white checkerboard pattern. The pattern consists of alternating black and white squares arranged in a grid. The squares are uniform in size, creating a classic checkerboard appearance.
(one can discover that the forum’s AI image captioner doesn’t have the issue seeing)
gpt-4.1-mini billed usage (has 1.62x multiplier):
1984x64 : 213 tokens
2016x64 : 217 tokens
2048x64 : 220 tokens
2080x64 : 220 tokens
The ending results of 63x2 patches with the cost multiplier should be Base Tokens: 126 Input Tokens: 205 - so there is a bit more billing than expected for the text plus ChatML container prompt.