Can we use images with GPT-3?

If I have a document that describes something with pics, how should I go about using the image? For example, if I want to explain how to change a headlight, I want to describe:

  1. Open the hood
  2. Look for the bulb (It should look like this image)
  3. Turn the bulb counterclockwise 90 degrees. At this point it should look like this image.

GPT-3 does not have that capability. You will need to use a different model. I’m not sure what’s publicly available, but there is probably something. If Google releases the stuff they’ve been working on to the public, you’ll really be in business.


I’m planning on doing experiments with mixing GPT-3 and DALLE. Basically I want GPT-3 to generate a story and then describe each panel of a storyboard and then feed it into DALLE. I’ll be doing that once I get back from my vacation.
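A minimal sketch of that pipeline, assuming GPT-3 returns the storyboard as a numbered list (the prompt format and panel parsing below are my assumptions, not a tested setup). The actual API calls are only indicated in comments so the parsing step stands alone:

```python
# Sketch: GPT-3 writes a numbered storyboard, each panel description is
# split out and would then be fed to DALL-E as an image prompt.
import re

def parse_panels(completion_text):
    """Split a numbered storyboard completion into per-panel prompts."""
    panels = re.findall(r"^\s*\d+[.)]\s*(.+)$", completion_text, flags=re.M)
    return [p.strip() for p in panels]

# Stand-in for a GPT-3 completion of the story/storyboard prompt:
story = """1. A knight stands at the castle gate at dawn.
2. The knight rides through a misty forest.
3. A dragon circles above a ruined tower."""

for prompt in parse_panels(story):
    # image = generate_image(prompt)  # hypothetical DALL-E call per panel
    print(prompt)
```

The point of the numbered-list format is just that it is easy for the model to follow and easy to split deterministically afterwards.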


Very interesting! I need to go through your videos, because I don’t really “get” it yet…

The JavaScript playground is perfect for this.
Sorry, I forgot the URL.

I don’t have access. I just joined the waitlist.


Hello, I think this approach can work:

They use GPT-2 to generate candidate captions for images, and CLIP to steer the generation so the captions match the images (no image is fed into GPT-2; the images are only input to CLIP).
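The core of that idea is a rerank: the language model proposes captions, and CLIP scores each caption against the image by embedding similarity. A toy sketch, where the embedding vectors are stand-ins (in practice they would come from CLIP’s image and text encoders):

```python
# Sketch of CLIP reranking: pick the candidate caption whose embedding
# has the highest cosine similarity to the image embedding.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

image_emb = [0.9, 0.1, 0.2]          # stand-in for a CLIP image embedding
candidates = {                        # stand-in for GPT-2 caption samples
    "a cat on a sofa":   [0.88, 0.12, 0.25],
    "a dog in the park": [0.10, 0.95, 0.05],
    "a city at night":   [0.20, 0.15, 0.90],
}

best = max(candidates, key=lambda c: cosine(image_emb, candidates[c]))
print(best)
```

The published approaches go further and guide GPT-2 token by token rather than reranking whole captions, but the similarity signal is the same.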


I wish. I am fine-tuning DaVinci to resemble a character, and if I could show it images and videos, its personality would become even more accurate.

I think I have seen SVGs being generated somewhere with the text-only GPT-3.
There were some impressive demos, but I never got it to work myself. Maybe it could work?
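The idea works at all because SVG is plain text, so a text-only model can emit it directly. A minimal sketch: the completion below is a hand-written stand-in for what GPT-3 might return, and the only processing is checking that it parses as XML before trying to render it:

```python
# Sketch: treat a model completion as SVG markup and validate it as XML.
import xml.etree.ElementTree as ET

# Stand-in for a GPT-3 completion to a prompt like "Draw a gold circle as SVG:"
completion = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="40" fill="gold"/>
</svg>"""

root = ET.fromstring(completion)  # raises ParseError if the SVG is malformed
print(root.tag)
```

In practice a validation step like this matters, because the model will occasionally emit unbalanced tags.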