DALL•E - dressing a mannequin


I’m trying to figure out how I could use the DALL•E image edit endpoint to accomplish the following task:

Given an image of a piece of clothing, and an image of a store mannequin, generate an image of the store mannequin wearing the supplied clothing item.

The idea is to show how an item would fit on a human being, starting from a picture of the item (for example, taken when the item is lying on a flat surface).

I am struggling to understand whether this would be possible using the DALL•E image edit API endpoint.

What I’ve been able to accomplish is the following:

  • given an image of a mannequin and a description of the item I want it wearing, I can get an image of my mannequin wearing an AI-generated clothing item (not what I want, as I want to supply the clothes myself)
  • given the picture of the clothing item, I can have it put on an AI-generated mannequin (not what I want either, as the generated mannequin seldom looks the way I want)

The issue seems to be that the endpoint accepts only a single input image (plus a mask), whereas this task would require submitting two images to start from.

Any clues on how I may be able to accomplish this? Is this possible with the current API?

DALL-E-3 has no vision component and no method to edit or compose images together. It works solely from a prompt input (which is then rewritten by an AI, so you can’t get exactly what you want either).


Thank you for your answer. Is it possible by any means to achieve what I’m after using the other existing models, or a combination thereof?

With the DALL-E 2 edits endpoint: in an image editor, you can erase the area you want the AI to draw, making it transparent in a 32-bit PNG with an alpha channel. The AI will then fill in the missing region, loosely following a prompt.

Doing so, you could upload the image of a garment on a transparent background and have the rest filled in. For the result to look like a dressed mannequin, though, the clothing item would already need to appear as if worn by someone, with the right lighting and 3D shape.
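A minimal sketch of that workflow, assuming the `openai` Python SDK (v1.x) and a garment already saved as a transparent-background RGBA PNG (`garment_transparent.png` is a hypothetical filename — anything here beyond the edits endpoint itself is an assumption):

```python
def has_alpha_channel(png_bytes: bytes) -> bool:
    """Check that a PNG is 32-bit RGBA by reading its IHDR color type.

    The edits endpoint needs the erased (transparent) region stored in
    an alpha channel, i.e. PNG color type 6 (RGBA).
    """
    signature = b"\x89PNG\r\n\x1a\n"
    if not png_bytes.startswith(signature) or len(png_bytes) < 26:
        return False
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    # 4-byte width, 4-byte height, 1-byte bit depth, 1-byte color type.
    return png_bytes[25] == 6


def inpaint_around_garment(png_path: str, prompt: str) -> str:
    """Ask DALL-E 2 to fill in the transparent area around the garment."""
    with open(png_path, "rb") as f:
        data = f.read()
    if not has_alpha_channel(data):
        raise ValueError(f"{png_path} is not a 32-bit RGBA PNG")

    # Imported lazily so the header check above stays dependency-free.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.edit(
        model="dall-e-2",
        image=open(png_path, "rb"),
        prompt=prompt,  # e.g. "a store mannequin wearing this shirt"
        n=1,
        size="1024x1024",
    )
    return result.data[0].url


if __name__ == "__main__":
    print(inpaint_around_garment(
        "garment_transparent.png",  # hypothetical input file
        "a store mannequin wearing this shirt, studio lighting",
    ))
```

Because the uploaded image itself carries the transparency, no separate `mask` file is needed here: the transparent pixels are exactly what the model repaints. Note this only paints *around* the garment — it does not reshape the flat garment onto a body, which is the limitation discussed below.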

Alright, I guess this is the biggest issue in my specific situation.

The pictures are supposed to come from users who simply take photos of clothing items with their cell phones, so I cannot assume they’ll be taken with the correct lighting, much less that the items will look three-dimensional, as they’ll just be hanging or lying on a flat surface.

I guess the real issue is that I was expecting the model to be able to “understand” how the clothes are supposed to look when worn, or somehow “imagine” (I’m likely using very improper terms here) how they’d look on a body rather than on a flat surface, which appears to be beyond its capabilities as of now.