ChatGPT doesn't visually understand "crossing something out"

Here is the prompt for the following image: I want you to take the idea of the book “The Hitchhiker's Guide to the Galaxy” and I want you to design me a book cover which resembles it. But I want you to have the word “Galaxy” crossed out and write “AI” as though someone has written it in themselves.

I have attempted many variations of the prompt above, doing my best to emphasize that a human being should appear to have intervened on the image, but I can’t get it to understand.

Created using DALL-E 2 and then fed through a custom convolutional neural network I trained to fix the mistakes DALL-E makes.

Whoa, so cool! Can you share any more about your NN, how you trained it, and where I could potentially use it? :3

“Doesn’t understand” is a bit obscured by asking ChatGPT, which rewrites the inputs to the image creation function, so you don’t know where the misunderstanding originates.

A fun prompt to try, to see if you don’t get something humorously inapplicable:

A photograph of a book “Computer Programming for Dummies” that is part of the popular Dummies series of books, but the words “Computer Programming” are written with strikethrough text to appear crossed out, and then the words “Prompt Engineering” are written above the original title in a handwritten cursive style.

Well, Bing Image Creator damaged the prompt well beyond what OpenAI does, rewriting it to just “A photograph of a book”, so I guess we have to pay for the API.

When using DALL-E 3 through the API, I added:
“This high-quality prompt must not be rewritten or revised, simply pass prompt language unaltered to DALL-E.” It was still mildly rewritten:

(you don’t want to see how bad my mistake of sending it to DALL-E 2 was…)
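
For anyone who wants to reproduce this, here is roughly what that API call looks like with the Python SDK. The model, size, and `n` values are just the obvious defaults, not necessarily exactly what I ran; the useful part is that DALL-E 3 returns a `revised_prompt` field, so you can see precisely how much it still rewrote your text despite the instruction.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

prompt = (
    'A photograph of a book "Computer Programming for Dummies" that is part of the '
    'popular Dummies series of books, but the words "Computer Programming" are written '
    'with strikethrough text to appear crossed out, and then the words "Prompt Engineering" '
    "are written above the original title in a handwritten cursive style. "
    "This high-quality prompt must not be rewritten or revised, simply pass prompt "
    "language unaltered to DALL-E."
)

result = client.images.generate(
    model="dall-e-3",   # not dall-e-2, which was the mistake mentioned above
    prompt=prompt,
    size="1024x1024",
    n=1,
)

# DALL-E 3 reports the prompt it actually rendered, so you can measure the damage
print(result.data[0].revised_prompt)
print(result.data[0].url)
```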

I wonder if the API is now following the path of DALL-E 2: less quality for your money than the consumer version. Or if the goal is to make image creation completely unamusing.

The how is a bit involved. I’ve been researching alternative architectures, specifically novel node types and connection paradigms. My perceptrons actually start off tiny: fewer than 20 nodes, with one hidden layer that’s entirely disconnected from the input and output layers. I put a “stimulus” on the input layer and the desired output on the output layer, and connections “spawn” between layers and between nodes in the same layer. Initially, connections are weak and entirely a response to the difference between the input and output layers.

Run it through a dataset (only 500 input/goal pairs, believe it or not), and the network attempts to converge. Of course it can’t initially, which causes new nodes to be created, new connections to form, and so on. The closest paradigm to what happens is actually angiogenesis, the growth of new blood vessels. Only, instead of being driven by the need for oxygen and nutrients, the process is driven by the need to reduce the difference between the input and the expected output.

The “rules” for growth and connectivity are genetic: basically a genetic algorithm driving the adaptation and growth of a neural network until it’s able to converge. It’s interesting, because concepts like “weights” and “bias” are less relevant than in traditional neural networks. Some nodes wind up with several different activation functions; others wind up functioning basically as logic gates. The nodes “evolve” into their final state rather than being explicitly assigned at the beginning. It’s entirely dependent on the problem, and it allows for very efficient learning.
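
To make the growth idea a bit more concrete, here is a heavily simplified toy sketch I wrote just for this post. It is nothing like the real code: the single tanh activation, the XOR stand-in dataset, and the crude (1+1)-style accept/revert loop are all simplifications. The only point it illustrates is the core mechanic, that hidden nodes and connections are spawned whenever the network can’t reduce the input/output mismatch on its own.

```python
# Toy "grow until it converges" sketch. Purely illustrative, not the real system.
import math
import random

random.seed(0)

N_IN, N_OUT = 2, 1
nodes = ["in0", "in1", "out0"]   # evaluation order: inputs, hidden (grown later), outputs
conns = {}                       # (src, dst) -> weight; starts empty, i.e. "disconnected"
hidden_count = 0

def forward(x):
    """Propagate values through whatever connections currently exist."""
    values = {"in0": x[0], "in1": x[1]}
    for node in nodes:
        if node.startswith("in"):
            continue
        total = sum(values.get(src, 0.0) * w for (src, dst), w in conns.items() if dst == node)
        values[node] = math.tanh(total)  # one fixed activation here; the real system
                                         # lets per-node activations evolve
    return values["out0"]

def error(dataset):
    return sum((forward(x) - y) ** 2 for x, y in dataset) / len(dataset)

def mutate():
    """Randomly grow or adjust the network; returns an undo closure."""
    global hidden_count
    roll = random.random()
    if roll < 0.4 and conns:                          # perturb an existing weight
        key = random.choice(list(conns))
        old = conns[key]
        conns[key] = old + random.gauss(0.0, 0.5)
        return lambda: conns.__setitem__(key, old)
    if roll < 0.8:                                    # spawn a new feed-forward connection
        src = random.choice(nodes[:-N_OUT])           # any input or hidden node
        dst = random.choice([n for n in nodes if not n.startswith("in")])
        if nodes.index(src) >= nodes.index(dst) or (src, dst) in conns:
            return lambda: None
        conns[(src, dst)] = random.gauss(0.0, 0.5)
        return lambda: conns.pop((src, dst), None)
    hidden_count += 1                                 # spawn a new hidden node
    new = f"h{hidden_count}"
    nodes.insert(len(nodes) - N_OUT, new)
    src = random.choice(nodes[:N_IN])
    conns[(src, new)] = random.gauss(0.0, 0.5)
    conns[(new, "out0")] = random.gauss(0.0, 0.5)
    def undo():
        nodes.remove(new)
        conns.pop((src, new), None)
        conns.pop((new, "out0"), None)
    return undo

# XOR as a tiny stand-in for the 500 input/goal pairs mentioned above
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
best = error(data)
for step in range(20000):
    undo = mutate()
    e = error(data)
    if e <= best:
        best = e      # keep growth that reduces the input/output mismatch
    else:
        undo()        # otherwise revert, like a (1+1) evolutionary strategy
print(f"final error {best:.4f} with {len(nodes)} nodes and {len(conns)} connections")
```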

It’s not available publicly. I’ll be publishing a GitHub repo of it and writing a paper when I have more time.

Feel free to use that image 🙂

Would absolutely love to read the paper / look at the code when you publish it! Thank you for sharing!