If I have a document that describes something with pictures, how should I go about using the images? For example, if I want to explain how to change a headlight, I'd want to describe:
- Open the hood
- Look for the bulb (it should look like this image)
- Turn the bulb counterclockwise 90 degrees. At this point it should look like this image.
GPT-3 doesn't have that capability; you'll need a different model. I'm not sure what's publicly available, but probably something. If Google releases the stuff they've been working on to the public, you'll really be in business.
I'm planning on doing experiments mixing GPT-3 and DALLE. Basically I want GPT-3 to generate a story, then describe each panel of a storyboard, and then feed those descriptions into DALLE. I'll be doing that once I get back from my vacation.
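The glue between the two models would mostly be prompt construction and parsing. Here's a minimal sketch of what I have in mind: ask GPT-3 for a numbered list of panel descriptions, then split the completion into one prompt per panel for the image model. The prompt wording and the parsing function are my own assumptions, not anything either API prescribes.

```python
import re

STORYBOARD_PROMPT = (
    "Write a short story, then describe each panel of a storyboard "
    "for it as a numbered list, one sentence per panel:\n"
)

def parse_panels(completion_text):
    """Split a numbered storyboard completion ("1. ...\n2. ...")
    into a list of per-panel prompt strings for the image model."""
    panels = re.findall(r"^\s*\d+[.)]\s*(.+)$", completion_text,
                        flags=re.MULTILINE)
    return [p.strip() for p in panels]

# Example of the kind of completion GPT-3 might return (made up here):
completion = """1. A fox walks through a snowy forest.
2. The fox finds a glowing lantern.
3. The fox carries the lantern home."""

panel_prompts = parse_panels(completion)
# Each entry would then be sent to the image model as its own prompt.
```

Each panel prompt is independent, so the image generations can run in parallel, and a bad panel can be regenerated without redoing the story.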
Very interesting! I need to go through your videos, because I don’t really “get” it yet…
sorry forgot the URL
I don't have access. Just joined the waitlist.
Hello, I think this approach can work.
They use GPT-2 to generate captions for images, and use CLIP to steer the generated captions toward matching the images (no image is fed into GPT-2; the images only go into CLIP).
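The simplest version of that idea is generate-then-rerank: the language model proposes candidate captions from text alone, and CLIP scores each candidate against the image. The sketch below shows only that control flow; both functions are hypothetical stubs standing in for real GPT-2 sampling and real CLIP image-text similarity.

```python
def generate_candidates(image_id, n=8):
    """Stand-in for sampling n caption candidates from GPT-2.
    (Hypothetical stub; a real run would sample from the language model,
    which never sees the image.)"""
    return [f"caption {i} for {image_id}" for i in range(n)]

def clip_score(image_id, caption):
    """Stand-in for CLIP's image-text similarity.
    (Hypothetical stub; CLIP would embed the image and the caption
    and return their cosine similarity.)"""
    return float(len(caption))  # placeholder scoring only

def best_caption(image_id, n=8):
    # GPT-2's role: propose text candidates without seeing the image.
    candidates = generate_candidates(image_id, n)
    # CLIP's role: see both the image and each candidate, keep the best match.
    return max(candidates, key=lambda c: clip_score(image_id, c))
```

The paper's actual method steers generation more tightly than a pure rerank, but the division of labor is the same: GPT-2 handles text, CLIP handles image-text matching.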