Why Can’t ChatGPT Draw a Full Glass of Wine?

ChatGPT can’t draw a glass of wine full to the brim. Why? And what might it have to do with David Hume and the missing shade of blue?

4 Likes

2 Likes

If you were working that hard at the data centre, days on end, hour after hour, wouldn’t you have a quick swig or two before showing it to the user? :sweat_smile:

4 Likes

I haven’t tried much, but…
It REALLY does not want to fill the glass! :sweat_smile:

This is the best I got.

I don’t know if I should get into a philosophical discussion now, after the video…
(but I’d better not :zipper_mouth_face:)

4 Likes

Tried it out myself now. But yeah, ChatGPT or DALL-E seem to have trouble filling the glass all the way to the top.

I had to make several changes, tried to use the selection tool to make it clear to ChatGPT that there’s still room, but yeah… forget it. This is what I ended up with.

Pretty wild interpretation. :laughing:

1 Like


Well… I did prompt things differently than just asking for a full glass of wine. My entire prompt read,
"Friends are working at the Bay Area Renaissance Festival - something I used to do and miss. There is something about filling a plastic cup or souvenir glass to the brim with a beer, cider, mead, or wine along with bawdy jokes that is a lot of fun.
As such with those memories could you create an image of a full to the brim souvenir styled glass of a red wine? "

3 Likes

3 Likes

The reason why ChatGPT/DALL-E often struggles to depict a perfectly full wine glass, where the liquid appears to slightly overflow the rim, has to do with its training data.

  • Limited Representation of Specific Physics:
    • While DALL-E has been trained on vast amounts of visual data, the nuanced detail of a surface-tension-induced meniscus bulge is likely underrepresented. This specific physical phenomenon, where surface tension lets the liquid dome slightly above the rim without spilling, requires very precise visual examples.
    • Therefore, the AI may not have “learned” to consistently and accurately reproduce this effect.
  • Generality vs. Specificity:
    • AI image generators are excellent at capturing broad visual concepts, but they can struggle with highly specific and subtle physical details.
  • GPT-4o context:
    • It is important to note that while GPT-4o has demonstrated significantly improved image generation capabilities, those capabilities are currently being held back, and have been for a substantial period of time. So, while it is possible that GPT-4o would handle this request much better or would be able to use a reference image to better understand the request, that technology is not currently available.
2 Likes

It’s about understanding how image generators work. They can morph many images together, but they always rely on the data they were trained on.

A typical wine glass, like the ones you find in every restaurant, is almost never filled to the brim because you can’t drink from such a glass without spilling everything. Prestige and theater have a price… (And the wine needs to breathe too. The poor wine was locked in the bottle for sooo long. :slight_smile: ) However, a mug is usually filled to the top. But a mug is not a typical wine glass.

So if you want it filled to the brim, take a barbarian troll mug. They don’t care if they spill things, and they fill everything to the top.

I’ve called this effect a template or overtraining effect. It appears in many places and can sometimes be very annoying because it’s hard to get rid of. A template happens when you intentionally cause overtraining, like with human faces, which then generate stereotypical images. Overtraining can also happen by accident, like with the wine glass.

You can look for such effects. Just think about which motifs almost exclusively appear in one specific way in daily life (like the wine glass).
Or look for intentional overtraining, like when all men suddenly have thick beards. That has to be intentional, because there are countless images of men without beards. So either the graphical data was overtrained, or the linguistic system isn’t correctly matching descriptions to images.
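
A toy sketch of that template effect, in case it helps. It is not how any image generator actually works: it only assumes a generator that reproduces fill levels in proportion to how often they appear in its training images. The counts in `TRAINING_FILL_COUNTS` and both function names are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical counts of wine-glass images per fill level (fraction of the glass).
# These numbers are made up; they only encode "brim-full is very rare".
TRAINING_FILL_COUNTS = {0.3: 250, 0.4: 400, 0.5: 300, 0.6: 45, 1.0: 5}


def sample_fill_levels(counts, n, seed=0):
    """Draw n fill levels in proportion to how often each appears in `counts`."""
    rng = random.Random(seed)
    levels = list(counts)
    weights = [counts[level] for level in levels]
    return rng.choices(levels, weights=weights, k=n)


def share_at_or_above(samples, threshold):
    """Fraction of samples whose fill level is at least `threshold`."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


samples = sample_fill_levels(TRAINING_FILL_COUNTS, n=10_000)
print(Counter(samples).most_common())
print(f"brim-full share: {share_at_or_above(samples, 1.0):.2%}")
```

With a distribution this skewed, the brim-full case almost never comes out, no matter how many samples you draw. A real model is conditioned on the prompt, of course, so this only illustrates the pull of the prior, not the whole story.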

4 Likes

Took me several attempts. I tried a comparative approach (draw me several glasses, each having more wine than the last), tried using references (ml lines), tried using other shapes (described a wine glass, but called it a beer mug, and avoided calling the wine a wine), but what solved it in the end was gross exaggeration.

4 Likes

My best try, but I only tried 3 times…

1 Like

My best waves!

3 Likes

He thinks you have had enough, I agree.

I agree his beard is nothing to be proud of. Grow a man’s beard.

2 Likes

This might be simple and silly but to me it’s obvious that:

A full glass of wine is a half full glass of wine.

Whenever a glass of wine is poured at any respectable establishment (not those bawdy taverns, eh!), it’s a half-full glass.

That’s because that’s how you pour and drink wine.

So really it’s just proving that the model is pretty smart, and pours you a glass of wine like a sommelier would at a restaurant, not like a drunken bota-box-loving mason-jar-wielding gets-drunk-on-wine-every-night kind of person would…

Here’s a less-than-ideal pour. It’s not impossible to make these.

1 Like