Generating pixel matrices of black-and-white images with GPT-3

For a small university project, I am trying to generate small images from text with GPT-3. My goal is for the completion to be a matrix of 0s and 1s that depicts the object I asked for in the prompt.

I have been playing around with very simple 8x8 matrices showing a digit. Using very simple few-shot prompts, I was able to generate digits that did not appear in the examples. However, the results were not very consistent, and I could not get it to go further (e.g., to show smiley faces).

Q: Make a grid of pixels that form an image of the number eight
A:
00000000
00111100
01100110
00111100
01100110
00110100
00011000
00000000

Q: Show a pixel matrix which represents a white and black image of the number one
A:
00000000
00110000
00011000
00011000
00011000
00011000
00011000
00000000

Q: Create a black and white image showing the number three by outputting a pixel grid of the size 8 by 8
A: 
00000000
00111110
00000110
00111100
00001110
00000110
00111100
00000000

Q: Create an 8 by 8 pixel matrix that shows a black and white image of the number 7
A:
00000000
01111110
00000110
00001100
00011000
00110000
01100000
00000000

Q: Create a black and white image showing the number 5 by generating a pixel grid of the size 8 by 8
A: 
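To compare different example sets, a few-shot prompt like the one above can be assembled programmatically. A minimal Python sketch, assuming each example grid is stored as a list of row strings; `EXAMPLES` and `build_few_shot_prompt` are hypothetical names, not part of any API:

```python
# Hypothetical example store: (question, grid rows) pairs copied from the prompt above.
EXAMPLES = [
    ("Make a grid of pixels that form an image of the number eight",
     ["00000000", "00111100", "01100110", "00111100",
      "01100110", "00110100", "00011000", "00000000"]),
    ("Show a pixel matrix which represents a white and black image of the number one",
     ["00000000", "00110000", "00011000", "00011000",
      "00011000", "00011000", "00011000", "00000000"]),
]


def build_few_shot_prompt(examples, question, size=8):
    """Join (question, rows) pairs into Q/A blocks and leave the final answer open."""
    blocks = []
    for q, rows in examples:
        # Reject malformed examples early: `size` rows of exactly `size` binary digits.
        if len(rows) != size or any(len(r) != size or set(r) - {"0", "1"} for r in rows):
            raise ValueError(f"example {q!r} is not a {size}x{size} binary grid")
        blocks.append(f"Q: {q}\nA:\n" + "\n".join(rows))
    # End with an open "A:" so that the model's completion is the new grid.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


prompt = build_few_shot_prompt(
    EXAMPLES,
    "Create a black and white image showing the number 5 by generating a pixel grid of the size 8 by 8",
)
print(prompt)
```

Keeping the examples as data rather than hand-edited text makes it easy to swap example sets in and out and check which ones give more consistent completions.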

Being completely new to GPT-3, I also tried fine-tuning it with 500 examples of 28x28-pixel black-and-white digits. But after reading the tips by @daveshapautomator, I think I am not ready for fine-tuning yet, and my task could probably be done with better prompts.
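For the fine-tuning route, each digit image first has to be turned into text. A minimal Python sketch, assuming MNIST-style grayscale images (values 0 to 255, bright strokes on a dark background) and the legacy GPT-3 fine-tuning format of one JSON object per line with `prompt` and `completion` keys; the helper names, the threshold, the prompt wording and the `###`/`END` markers are all hypothetical choices:

```python
import json


def binarize(image, threshold=128):
    """Turn a 2-D list of 0-255 grayscale values into rows of '0'/'1' characters.

    Assumes bright strokes on a dark background (MNIST style): pixels at or
    above `threshold` become '1', everything else becomes '0'.
    """
    return ["".join("1" if px >= threshold else "0" for px in row) for row in image]


def to_jsonl_line(label, image, threshold=128):
    """Build one prompt/completion record in the legacy fine-tuning JSONL style."""
    rows = binarize(image, threshold)
    record = {
        # The '###' separator and the 'END' stop marker are assumptions, not requirements.
        "prompt": f"Black and white {len(rows)}x{len(rows[0])} pixel matrix of the number {label}\n\n###\n\n",
        "completion": "\n" + "\n".join(rows) + "\nEND",
    }
    return json.dumps(record)


# Synthetic 28x28 "one": a single bright vertical stroke in column 14.
image = [[255 if col == 14 else 0 for col in range(28)] for _ in range(28)]
print(to_jsonl_line(1, image))
```

Writing one such line per image gives a JSONL training file; the fixed end marker makes it easy to use as a stop sequence when sampling from the fine-tuned model.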

So I have tried to improve the prompt and make it more specific. However, it still does not produce the results I want:

The task is to generate a black and white image by creating a pixel matrix 
which consists of 0s and 1s. The matrix should be of the size 8 by 8. 
Each entry represents a pixel of a black and white image. That means 
the image has a display capable of 8 pixels in width and 8 pixels in height. 
Here is an example of an 8 by 8 pixel matrix showing the number 3
00000000
00111110
00000110
00111100
00001110
00000110
00111100
00000000

And here is an example of a grid of pixels that form an image of the number eight
00000000
00111100
01100110
00111100
01100110
00110100
00011000
00000000

Now create a pixel matrix of size 8 by 8 that represents a black and white image of the number 4.

I was also thinking about restricting the completion vocabulary to 0s and 1s by using the logit_bias parameter.
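The logit bias idea could look roughly like this. A minimal Python sketch: `build_logit_bias` is a hypothetical helper, and the tokenizer passed in as `encode` is an assumption you need to match to the model you call:

```python
def build_logit_bias(encode, allowed=("0", "1", "\n"), bias=100):
    """Map every token ID that `encode` produces for the allowed strings to `bias`.

    `encode` is any callable that turns a string into a list of token IDs;
    which tokenizer matches the model you call is an assumption to verify.
    """
    if not -100 <= bias <= 100:
        raise ValueError("logit_bias values must lie between -100 and 100")
    return {token_id: bias for text in allowed for token_id in encode(text)}


# Stand-in tokenizer for demonstration only: one "token" per character.
print(build_logit_bias(lambda s: [ord(ch) for ch in s]))
```

With a real tokenizer, for example `tiktoken.get_encoding("p50k_base").encode` (assuming the tiktoken package and that p50k_base is the encoding of the model you use), the resulting dict can be passed as `logit_bias`. The API reference describes values of 100 as resulting in exclusive selection of those tokens, so the model can no longer use multi-digit tokens (such as `00`, if the tokenizer has one). An 8x8 grid then costs 72 tokens (8 digits plus a newline per row), and since a separator like `###` can no longer be generated, `max_tokens` is what ends the completion.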

Any tips on how to approach this task and how to design the prompt would be highly appreciated 🙂

I did some experimenting with this for random dungeon generation. Can’t remember where I put the notes… might be here on this forum? Hrm…

Hi,

You can achieve your goal with Code Davinci 2 (select the code-davinci-002 model in the Playground).

I edited your prompt slightly (added some information) and provided four examples as part of the prompt (prompt in bold):

Images displayed on a computer screen are actually a collection of dots of color, called pixels. If you look really closely at the screen, you will be able to see the individual pixels. The collection of pixels that makes up an image is stored as a matrix.

We can represent a number by creating a pixel matrix which consists of 0s and 1s. The matrix should be of the size 8 by 8. Each entry represents a pixel of a black and white image. That means the image has a display capable of 8 pixels in width and 8 pixels in height.

Here is an example of an 8 by 8 pixel matrix showing the number 3:
00000000
00111110
00000110
00111100
00001110
00000110
00111100
00000000

###

Here is an example of a grid of pixels that form an image of the number eight:
00000000
00111100
01100110
00111100
01100110
00110100
00011000
00000000

###

Here is an example of a grid of pixels that form an image of the number six:
00000000
00111100
01100000
01100000
00111100
01100110
00110100
00000000

###

Here is an example of a grid of pixels that form an image of the number two:
00000000
00111100
01100110
00001100
00011000
00110000
01111110
00000000

###

Here is an example of a grid of pixels that form an image of the number four:
00000000
00001100
00011100
00111100
01111110
00001100
00001100
00000000

Here is an example of a grid of pixels that form an image of the number five:
00000000
01111110
01100000
01111100
00000110
01100110
00111100
00000000

Here is an example of a grid of pixels that form an image of the number seven:
00000000
01111110
00000110
00001100
00011000
00011000
00011000
00000000

Here is an example of a grid of pixels that form an image of the number one:
00000000
00011000
00111100
00011000
00011000
00011000
00011000
00000000
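Because completions can drift (extra rows, stray text, rows of the wrong width), it helps to validate each generated grid before using it. A minimal Python sketch; `parse_grid` is a hypothetical helper, and skipping every line that is not made up only of 0s and 1s is an assumption about what the completions look like:

```python
def parse_grid(completion, size=8):
    """Return the first size x size binary grid found in a completion as lists of ints.

    Lines that are not made up only of 0s and 1s (headings, '###', blank lines)
    are skipped; a binary line of the wrong width raises ValueError.
    """
    rows = []
    for line in completion.splitlines():
        line = line.strip()
        if not line or set(line) - {"0", "1"}:
            continue  # not a pixel row
        if len(line) != size:
            raise ValueError(f"row {len(rows)} has {len(line)} pixels, expected {size}")
        rows.append([int(ch) for ch in line])
        if len(rows) == size:
            return rows
    raise ValueError(f"found {len(rows)} rows, expected {size}")


sample = """Here is an example of a grid of pixels that form an image of the number four:
00000000
00001100
00011100
00111100
01111110
00001100
00001100
00000000

###"""
grid = parse_grid(sample)
print(sum(map(sum, grid)), "pixels are on")
```

Rejecting malformed grids instead of silently padding them also gives a simple consistency measure: the share of completions that parse at all.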

A few image tests:

Here is an example of a grid of pixels that form an image of a smiley face:
00000000
00111100
01000010
10100101
10000001
10100101
01000010
00111100

Here is an example of a grid of pixels that form an image of a frowny face:
00000000
00111100
01000010
10100101
10000001
10011001
01000010
00111100

Here is an example of a grid of pixels that form an image of a heart:
00000000
01100110
11111111
11111111
01111110
00111100
00011000
00000000
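Judging the faces and the heart is easier when the 0/1 grid is drawn with more contrasting characters. A minimal Python sketch; `render` is a hypothetical helper, and the `#`/`.` characters and the horizontal doubling are arbitrary choices:

```python
def render(rows, on="#", off="."):
    """Draw rows of '0'/'1' as text, doubling each pixel horizontally so it looks roughly square."""
    return "\n".join("".join((on if ch == "1" else off) * 2 for ch in row) for row in rows)


HEART = ["00000000", "01100110", "11111111", "11111111",
         "01111110", "00111100", "00011000", "00000000"]
print(render(HEART))
```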

Not sure about the faces, but the heart seems OK. Maybe 8x8 is too limiting for good images?
I think if you use a 16x16 matrix for your task framing, examples, and generations, you may be able to achieve more interesting results. Maybe provide three 16x16 numbers and three 16x16 images as examples and see how your generations turn out.
If the results are inconsistent, the next step would be to increase the number of examples.
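One quick way to get 16x16 starting templates from the existing 8x8 examples is nearest-neighbour upscaling, i.e. pixel doubling. A minimal Python sketch; `upscale` is a hypothetical helper:

```python
def upscale(rows, factor=2):
    """Nearest-neighbour upscaling: repeat each pixel `factor` times horizontally and vertically."""
    scaled = []
    for row in rows:
        wide = "".join(ch * factor for ch in row)
        scaled.extend([wide] * factor)
    return scaled


SEVEN = ["00000000", "01111110", "00000110", "00001100",
         "00011000", "00110000", "01100000", "00000000"]
for row in upscale(SEVEN):
    print(row)
```

Doubling keeps the same blocky shapes and adds no detail, so the upscaled grids are only templates; to benefit from the higher resolution, the examples would still need to be refined by hand.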

Thank you so much for your response! The first results look good 🙂
I have experimented a little more with other objects and also think 8 by 8 might be too limiting. I am also considering adding more values to represent different colors to increase expressiveness.
I was also thinking about adding chain-of-thought reasoning to the prompt. I will keep you updated on the results.
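If more values are added to represent colors, each digit in the completion becomes an index into a palette. A minimal Python sketch, assuming a four-color palette (the mapping below is arbitrary and hypothetical) that writes the grid as a plain-text PPM (P3) image, which many image viewers can open:

```python
# Assumed palette: grid digit -> RGB color. The mapping is an arbitrary choice.
PALETTE = {
    "0": (0, 0, 0),        # black
    "1": (255, 255, 255),  # white
    "2": (255, 0, 0),      # red
    "3": (255, 200, 0),    # yellow
}


def grid_to_ppm(rows, palette=PALETTE):
    """Convert rows of palette digits into a plain-text PPM (P3) image string."""
    height, width = len(rows), len(rows[0])
    lines = ["P3", f"{width} {height}", "255"]  # magic number, size, max channel value
    for row in rows:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in (palette[ch] for ch in row)))
    return "\n".join(lines) + "\n"


print(grid_to_ppm(["01", "23"]))
```

An unknown digit raises `KeyError`, which doubles as a check that the model stayed within the palette.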

I look forward to seeing how you progress with this. It could be very interesting to hook the inputs up to the output of speech-to-text, and the generated outputs up to a colour LED matrix display!
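Driving an LED matrix from the generated grids mostly comes down to converting each row into whatever the display expects. For a single-colour 8x8 module, one byte per row is a common layout; the sketch below assumes that layout and that the leftmost pixel maps to the most significant bit, which depends on how the module is wired. A colour matrix would instead need an RGB value per pixel. `grid_to_row_bytes` is a hypothetical helper:

```python
def grid_to_row_bytes(rows):
    """Pack each 8-pixel row into one byte; the leftmost pixel becomes the most significant bit."""
    if any(len(row) != 8 or set(row) - {"0", "1"} for row in rows):
        raise ValueError("expected rows of exactly 8 binary digits")
    return bytes(int(row, 2) for row in rows)


SMILEY = ["00000000", "00111100", "01000010", "10100101",
          "10000001", "10100101", "01000010", "00111100"]
print(grid_to_row_bytes(SMILEY).hex())
```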