ChatGPT doesn't visually understand "crossing something out"

Here is the prompt for the following image: I want you to take the idea of the book “The Hitchhiker's Guide to the Galaxy” and I want you to design me a book cover which resembles it. But I want you to have the word “Galaxy” crossed out and write “AI” as though someone has written it in themselves.

I have attempted many variations of the prompt above, doing my best to emphasize that a human being should appear to have intervened on the image, but I can’t get it to understand.

Created using DALL-E 2 and then fed through a custom convolutional neural network I trained to fix the mistakes DALL-E makes.

Whoa, so cool! Can you share any more about your NN, how you trained it, and where I could potentially use it? :3

“Doesn’t understand” is a bit obscured by asking ChatGPT, which rewrites the inputs to the image creation function, so you don’t know where the misunderstanding originates.

A fun prompt to try, to see if you don’t get something humorously inapplicable:

A photograph of a book “Computer Programming for Dummies” that is part of the popular Dummies series of books, but the words “Computer Programming” are written with strikethrough text to appear crossed out, and then the words “Prompt Engineering” are written above the original title in a handwritten cursive style.

Well, Bing Image Creator damaged the prompt well beyond what OpenAI does, rewriting it to just “A photograph of a book”, so I guess we have to pay for the API.

When using DALL-E 3 through the API, I added:
“This high-quality prompt must not be rewritten or revised, simply pass prompt language unaltered to DALL-E.” It was still mildly rewritten:

(you don’t want to see how bad my mistake of sending it to DALL-E 2 was…)
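
For anyone who wants to reproduce this, here is roughly what that API call looks like with the Python SDK. The model, size, and `n` values are just the obvious defaults, not necessarily exactly what I ran; the useful part is that DALL-E 3 returns a `revised_prompt` field, so you can see precisely how much it still rewrote your text despite the instruction.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompt = (
    'A photograph of a book "Computer Programming for Dummies" that is part of the '
    'popular Dummies series of books, but the words "Computer Programming" are written '
    'with strikethrough text to appear crossed out, and then the words "Prompt Engineering" '
    "are written above the original title in a handwritten cursive style. "
    "This high-quality prompt must not be rewritten or revised, simply pass prompt "
    "language unaltered to DALL-E."
)

result = client.images.generate(
    model="dall-e-3",   # not dall-e-2, which was the mistake mentioned above
    prompt=prompt,
    size="1024x1024",
    n=1,
)

# DALL-E 3 reports the prompt it actually rendered, so you can measure the damage
print(result.data[0].revised_prompt)
print(result.data[0].url)
```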

I wonder if the API is now following the path of DALL-E 2: less quality for your money than the consumer version. Or if the goal is to make image creation completely unamusing.

The how is a bit involved. I’ve been researching alternative architectures, specifically novel node types and connection paradigms. My perceptrons actually start off tiny: fewer than 20 nodes, with one hidden layer that’s entirely disconnected from the input and output layers. I put a “stimulus” on the input layer and the desired output on the output layer, and connections “spawn” between layers and between nodes in the same layer. Initially, connections are weak and entirely a response to the difference between the input and output layers.

Run it through a dataset (only 500 input/goal pairs, believe it or not), and the network attempts to converge. Of course it can’t initially, which causes new nodes to be created, new connections to form, and so on. The closest paradigm to what happens is actually angiogenesis, the growth of new blood vessels. Only, instead of being driven by the need for oxygen and nutrients, the process is driven by the need to reduce the difference between the input and the expected output.

The “rules” for growth and connectivity are genetic: basically a genetic algorithm driving the adaptation and growth of a neural network until it’s able to converge. It’s interesting, because concepts like “weights” and “bias” are less relevant than in traditional neural networks. Some nodes wind up with several different activation functions; others wind up functioning basically as logic gates. The nodes “evolve” into their final state rather than being explicitly assigned at the beginning. It’s entirely dependent on the problem, and it allows for very efficient learning.
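
To make the growth idea a bit more concrete, here is a heavily simplified toy sketch I wrote just for this post. It is nothing like the real code: the single tanh activation, the XOR stand-in dataset, and the crude (1+1)-style accept/revert loop are all simplifications. The only point it illustrates is the core mechanic, that hidden nodes and connections are spawned whenever the network can’t reduce the input/output mismatch on its own.

```python
# Toy "grow until it converges" sketch. Purely illustrative, not the real system.
import math
import random

random.seed(0)

N_IN, N_OUT = 2, 1
nodes = ["in0", "in1", "out0"]   # evaluation order: inputs, hidden (grown later), outputs
conns = {}                       # (src, dst) -> weight; starts empty, i.e. "disconnected"
hidden_count = 0

def forward(x):
    """Propagate values through whatever connections currently exist."""
    values = {"in0": x[0], "in1": x[1]}
    for node in nodes:
        if node.startswith("in"):
            continue
        total = sum(values.get(src, 0.0) * w for (src, dst), w in conns.items() if dst == node)
        values[node] = math.tanh(total)  # one fixed activation here; the real system
                                         # lets per-node activations evolve
    return values["out0"]

def error(dataset):
    return sum((forward(x) - y) ** 2 for x, y in dataset) / len(dataset)

def mutate():
    """Randomly grow or adjust the network; returns an undo closure."""
    global hidden_count
    roll = random.random()
    if roll < 0.4 and conns:                          # perturb an existing weight
        key = random.choice(list(conns))
        old = conns[key]
        conns[key] = old + random.gauss(0.0, 0.5)
        return lambda: conns.__setitem__(key, old)
    if roll < 0.8:                                    # spawn a new feed-forward connection
        src = random.choice(nodes[:-N_OUT])           # any input or hidden node
        dst = random.choice([n for n in nodes if not n.startswith("in")])
        if nodes.index(src) >= nodes.index(dst) or (src, dst) in conns:
            return lambda: None
        conns[(src, dst)] = random.gauss(0.0, 0.5)
        return lambda: conns.pop((src, dst), None)
    hidden_count += 1                                 # spawn a new hidden node
    new = f"h{hidden_count}"
    nodes.insert(len(nodes) - N_OUT, new)
    src = random.choice(nodes[:N_IN])
    conns[(src, new)] = random.gauss(0.0, 0.5)
    conns[(new, "out0")] = random.gauss(0.0, 0.5)
    def undo():
        nodes.remove(new)
        conns.pop((src, new), None)
        conns.pop((new, "out0"), None)
    return undo

# XOR as a tiny stand-in for the 500 input/goal pairs mentioned above
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
best = error(data)
for step in range(20000):
    undo = mutate()
    e = error(data)
    if e <= best:
        best = e      # keep growth that reduces the input/output mismatch
    else:
        undo()        # otherwise revert, like a (1+1) evolutionary strategy
print(f"final error {best:.4f} with {len(nodes)} nodes and {len(conns)} connections")
```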

It’s not available publicly. I’ll be publishing a GitHub repo of it and writing a paper when I have more time.

Feel free to use that image 🙂

Would absolutely love to read the paper / look at the code when you publish it! Thank you for sharing!