My image completions using GLIDE and my text to image completions using minDALL-E

Note: To see an image at full resolution, click its link and download it. Most of the images below are shown close to full size, but any image made of e.g. 16 images stuck together left to right has been shrunk badly, so do download those if you are interested in them.

Note: The small GLIDE model below, made by OpenAI, is impressive even though it is the smaller version. It was only trained on roughly 67-147 million text-image pairs, not 250M like the real GLIDE, and it has about 10x fewer parameters (300 million). Keep in mind that I often let it take half an image as input and extend it by the other half, 128 pixels. If it had used everything but the last row of pixels to predict the next row, it would have been even more accurate!
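The half-image extension described in the note can be sketched roughly as below. This is only an illustration of the bookkeeping, not GLIDE's own code: `fake_complete` is a hypothetical stand-in for the model call, NumPy arrays stand in for images, and the 256-pixel square is assumed because 128 pixels is described as half an image.

```python
import numpy as np

TILE = 256         # assumed square input size for the model
HALF = TILE // 2   # 128 new pixels are generated per step


def next_input(canvas):
    """Build the next square input: the canvas's right half on the left, blank on the right."""
    tile = np.zeros((TILE, TILE, canvas.shape[2]), dtype=canvas.dtype)
    tile[:, :HALF] = canvas[:, -HALF:]
    return tile


def fake_complete(tile):
    """Hypothetical stand-in for the model: fills the blank right half with mid-grey."""
    out = tile.copy()
    out[:, HALF:] = 128
    return out


def extend_right(canvas, steps):
    """Grow the canvas to the right by HALF pixels per step, keeping only the new half."""
    for _ in range(steps):
        completed = fake_complete(next_input(canvas))
        canvas = np.concatenate([canvas, completed[:, HALF:]], axis=1)
    return canvas


start = np.zeros((TILE, TILE, 3), dtype=np.uint8)
wide = extend_right(start, steps=3)
print(wide.shape)  # (256, 640, 3)
```

Each step reuses the most recent 128 columns as context, so the elongated image is the original square plus one new half-tile per step.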

Using text prompts and choosing which completions I liked, I made this by stitching them together (it could only be fed a square image, but it still came out well!):

original image:

No Text Prompt — extended all around:

original image

Text & Image Prompts — elongated (scroll down page)

“tiger and tree” + download-8 hosted at ImgBB — ImgBB
= download-9 hosted at ImgBB — ImgBB

“tigers and forest” + download-12 hosted at ImgBB — ImgBB
= download-13 hosted at ImgBB — ImgBB

“tigers in river and forest” + above
= download-16 hosted at ImgBB — ImgBB

“circuit board” + download-2 hosted at ImgBB — ImgBB
= download-1 hosted at ImgBB — ImgBB

“wildlife giraffe” + cool hosted at ImgBB — ImgBB
= download-5 hosted at ImgBB — ImgBB

“bathroom” + test hosted at ImgBB — ImgBB
= download-1 hosted at ImgBB — ImgBB

“laboratory machine” + download-8 hosted at ImgBB — ImgBB
= download-5 hosted at ImgBB — ImgBB

“pikachu” + image
= download-6 hosted at ImgBB — ImgBB

“humanoid robot body android” + robot hosted at ImgBB — ImgBB
= download-11 hosted at ImgBB — ImgBB

“bedroom” + download-16 hosted at ImgBB — ImgBB
= download-14 hosted at ImgBB — ImgBB

“sci fi alien laboratory” + download-1 hosted at ImgBB — ImgBB
= download hosted at ImgBB — ImgBB

“factory pipes lava” + download hosted at ImgBB — ImgBB
= download-1 hosted at ImgBB — ImgBB

“factory pipes lava” + download-2 hosted at ImgBB — ImgBB
= download-3 hosted at ImgBB — ImgBB

“toy store aisle” + download hosted at ImgBB — ImgBB
= download-1 hosted at ImgBB — ImgBB

“fancy complex detailed royal wall gold gold gold gold”

“gold gates on clouds shining laboratory”

“gold dragons”

“gold bricks lined up in a room”

“gold dragon statue with wings and breathing fire”

GLIDE also works with no text prompt. It does fine, just maybe ~2x worse:
–no text prompts–

You can compare one of these to NUWA’s:

To use GLIDE, search Google for “github glide openai”. I run it in Kaggle, as it is definitely faster than Colab. You must make an account, verify your phone number, and then open the notebook; only then can you see the settings panel on the right side, where you need to turn on GPU and internet. Upload images with the Upload button at the top right, and then, in the part of the code that loads the image (where it says e.g. grass.png), simply put your own file's path instead. For example, I have:

# Source image we are inpainting
source_image_256 = read_image('…/input/123456/tiger2.png', size=256)
source_image_64 = read_image('…/input/123456/tiger2.png', size=64)
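If you are not sure what path an uploaded file ended up at, a small helper like the one below can list the images it finds. This is a sketch under assumptions: `list_images` is a hypothetical helper name, and `/kaggle/input` is assumed to be the folder where Kaggle mounts uploads, so adjust it if your notebook differs.

```python
import os

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def list_images(root):
    """Return the sorted paths of image files found anywhere under root."""
    found = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                found.append(os.path.join(folder, name))
    return sorted(found)


# "/kaggle/input" is an assumed upload location; nothing is printed if it does not exist.
for path in list_images("/kaggle/input"):
    print(path)
```

Any path it prints can be pasted into the `read_image(...)` calls.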

To control the mask, change the 40: part to e.g. 30: or 44:. To control the mask sideways, add one more index at the end, e.g. [:, :, :30, :30] or something like that, in case I have it slightly wrong. Apparently you can add more than one mask (grey box) by repeating the line, e.g.:
mask[…]
mask[…]
mask[…]
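The mask slicing described above can be tried out on its own with the sketch below. It rests on assumptions: a NumPy array stands in for the notebook's tensor, the layout is taken to be [batch, channel, row, column] at 64x64, and 0 is taken to mark the region the model repaints while 1 marks pixels to keep. The box coordinates are made up for illustration.

```python
import numpy as np

# Stand-in for the notebook's 64x64 mask (assumed layout: batch, channel, row, column).
# Assumed meaning: 1 = keep the pixel, 0 = let the model repaint it.
mask = np.ones((1, 1, 64, 64), dtype=np.float32)

# Repaint everything from row 40 downward (the "40:" slice).
mask[:, :, 40:] = 0

# Repaint a 30x30 box in the top-left corner by also slicing the columns.
mask[:, :, :30, :30] = 0

# More than one grey box: one assignment per region.
mask[:, :, 10:20, 50:60] = 0

kept = int(mask.sum())
print(kept)  # 1560 of 4096 pixels are kept
```

The third index moves the box up and down and the fourth moves it sideways, so each extra assignment adds one more rectangular region.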

Batch size sets the number of images to generate.

Once it is done, click console to get the image and right click it to save it.

Here are mine for minDALL-E (this one did not allow an image prompt, so just text).

minDALL-E was only trained on 14 million text-image pairs, while OpenAI’s DALL-E was trained on 250M. The model also has only 1.3 billion parameters, ~10x smaller.

“a white robot standing on a red carpet, in a white room. the robot is glowing. an orange robotic arm near the robot is injecting the robot’s brain with red fuel rods. a robot arm is placing red rods into the robot brain.”

3 dancing robot pikachu lined up on skate boards on the road in front of the mall and a firetruck under the sun wearing blue helmets and red boots, while holding umbrellas and surrounded by electric towers. realistic photo.

“3 pikachu standng on red blocks lined up on the road under the sun, holding umbrellas, surrounded by electric towers”
download-1-min hosted at ImgBB — ImgBB

“box cover art for the video game mario adventures 15. mario is jumping into a tall black pipe next to a system of pipes. the game case is red.”

an illustration of a baby capybara in a christmas sweater staring at its reflection in a mirror

an armchair in the shape of an avocado. an armchair imitating an avocado.
https://ibb.co/nwwf1v4

an illustration of an avocado in a suit walking a dog
https://ibb.co/bvfPkxf

pikachu riding a wave under clouds inside of a large jar on a table
https://ibb.co/jHjV7mf

a living room with 2 white armchairs and a painting of a mushroom. the painting of a mushroom is mounted above a modern fireplace.
https://ibb.co/VmKqbHk

a living room with 2 white armchairs and a painting of the collosseum. the painting is mounted above a modern fireplace.
https://ibb.co/K5fPkvj

pikachu sitting on an armchair in the shape of an avocado. pikachu sitting on an armchair imitating an avocado.
https://ibb.co/XLJV4Hb

an illustration of pikachu in a suit staring at its reflection in a mirror
https://ibb.co/nMQRccf

“a cute pikachu shaped armchair in a living room. a cute armchair imitating pikachu. a cute armchair in the shape of pikachu”
https://ibb.co/dbJ1Ks6

To use it, go to the link below, make a Kaggle account, and verify your phone number. Then open the link, click to edit it, go to the settings panel on the right, and turn on GPU and internet. Then replace the plotting code with the code below; it is nearly the same but prints more images. If you don’t, it doesn’t seem to work well.

https://www.kaggle.com/annas82362/mindall-e

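import math
import matplotlib.pyplot as plt

# Reorder the generated images by rank, then plot them all in a square grid
images = images[rank]
n = num_candidates
fig = plt.figure(figsize=(6*int(math.sqrt(n)), 6*int(math.sqrt(n))))
for i in range(n):
    ax = fig.add_subplot(int(math.sqrt(n)), int(math.sqrt(n)), i+1)
    ax.imshow(images[i])  # show the i-th image, not the whole batch
    ax.set_axis_off()
plt.tight_layout()
plt.show()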
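One thing to watch with the plotting code: the grid is sized with int(math.sqrt(n)), so it only has room for every image when the number of candidates is a perfect square (4, 9, 16, ...). If you want other counts, a grid could be sized along the lines of this sketch. `grid_shape` is a hypothetical helper, not part of minDALL-E.

```python
import math


def grid_shape(n):
    """Return (rows, cols) for a near-square grid with room for n images."""
    if n < 1:
        raise ValueError("n must be at least 1")
    cols = math.isqrt(n - 1) + 1   # smallest whole number whose square is >= n
    rows = -(-n // cols)           # ceiling division: enough rows for all n
    return rows, cols


print(grid_shape(16))  # (4, 4)
print(grid_shape(10))  # (3, 4)
```

The two values would replace the int(math.sqrt(n)) arguments passed to fig.add_subplot, with the figure size scaled by cols and rows in the same way.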

When do you think video will be possible in order to make movies?

Oh, I also wanted to share this one; the text prompt is the filename:

NUWA does video prediction, but it has a low frame rate and small frames, and it isn’t that good yet. I think we will get video prediction that is fuller and has human-level likeness, like the others, by 2024 or 2025.

Interesting. And do you think human actors will go out of business? Or will they be renting out their bodies, because audiences would like to see human actors?

IMO human actors will end up having fewer jobs doing what they are “used to”, because we will be able to take a photo of them and ask for a movie of them doing anything.

However, like you said, if I wanted a movie of real people doing real things, then I’d need human actors, and I can imagine cases where I might want that, so I would be very careful in deciding to ever cut out or limit real human movies. But yes overall, dreamt or simulated humans that are paintings or real simulated beings will be useful for a massive amount of purposes.

And how about input to the movies: do you think that in our lifetime directors will be able to visualize scenes inside their heads and then somehow transfer this thought to a model, which will then recreate the scenario? Is there any movement toward this?

Regarding those actors, I think the reason movies are so expensive to make is their salaries. You need to pay millions for some actors, which eventually lowers the budget for effects and so on. Audiences would get more from a movie if there were no need to pay actors’ salaries.

Thanks for your input, I will be taking a closer look at this area.

Text is pretty useful for that. But thought-to-model transfer might be many years away and might be unsafe. Once AGI is made in the 2030s, though, the rapid pace of technology should allow it to be done by 2043 or so.

I do realize some have this concept of AGI being far off in the future. However, it’s a paradox, and it is a matter of perspective, IMHO, as humanity continues to struggle with the basics of A.I. and what its capabilities are. The majority of people who are not programmers or engineers, and who don’t code every day or even for learning, refuse to adjust and accept change. Revisit any past civilization and it will support this. A hurdle, yes. So, for my 2 cents, I’m saying that time can be best applied to advancing knowledge and education in the fields of your current avocation, or in machine learning, computer science, and code. Better yourself and continue to do so; then, if you can contribute to society in a certain way, it will be your choice how you do it and whether you choose to. Hope this helps.