In this thread, Paul Bellow (@paul) was kind enough to warn me about the automatic prompt enhancement/modification that Dalle-3 does for you. Note that this feature is not OPT-IN; it is OPT-OUT, via a little prompt text magic. See this thread for Paul’s tips on the matter:
Also see this post by @morganhvidt, which references the OpenAI docs on how to defeat this feature:
But I must admit, I was stunned at how deep this goes. I gave Dalle-3 the very simple prompt of “images of ducks”.
When I inspected the response object that came back from Dalle-3, I saw a field named revised_prompt that I had never seen before with Dalle-2:
“Visualize a serene setting with a vibrant body of water, possibly a lake or pond, and focus on a diverse group of ducks. These birds display a range of colors and patterns in their feathers. They swim leisurely in the water, causing small ripples in their path. Some are preening their feathers, while others are engaging in playful interactions. The reflected sunlight from the water surface adds a sparkle to the overall image. The scenery has occasional splashes of water plants and reeds, which contribute to the natural habitat of these charming waterfowls.”
(“charming waterfowls”. That made me laugh. )
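If you want to check this field yourself, here is a minimal sketch of pulling it out of the response. It assumes the JSON body has the shape the Images API returns for Dalle-3 (a "data" list whose items carry "url" and "revised_prompt"); the helper name extract_revised_prompts and the sample body are made up for illustration.

```python
def extract_revised_prompts(response_body: dict) -> list:
    """Return the revised_prompt of every image in an Images API response body.

    Items without the field yield None, since the field may be missing
    (the post above notes it was never seen on Dalle-2 responses).
    """
    return [item.get("revised_prompt") for item in response_body.get("data", [])]


# Hypothetical, truncated sample body for illustration only.
sample_body = {
    "created": 0,
    "data": [
        {
            "url": "https://example.invalid/image.png",
            "revised_prompt": "Visualize a serene setting with a vibrant body of water...",
        }
    ],
}

for revised in extract_revised_prompts(sample_body):
    print(revised)
```

Logging this value next to the prompt you actually sent makes it easy to spot when the two have drifted apart.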
Wow! That’s intense. I’m wondering if they are doing an Inception-like self-resubmit of my prompt into an LLM trained with Dalle-3 prompts as input and human quality reports as output, to generate these wildly detailed new prompts. Either way, it’s a pretty deep rabbit hole.
I guess the moral is: watch your Dalle-3 response objects closely and make sure the revised prompts are actually improving your results, not degrading them. If they are degrading them, no harm done; see Paul’s thread for tips on how to “defeat” this feature, at least when it’s not helping.
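For what it's worth, here is a rough sketch of the opt-out trick plus a check on whether it worked. The prefix wording is an assumption, written from my recollection of the OpenAI docs, so verify it against the docs before relying on it. The names AS_IS_PREFIX, with_as_is_prefix and was_rewritten are hypothetical helpers, not part of any SDK.

```python
# ASSUMPTION: this wording is recalled from the OpenAI docs and may not be exact.
AS_IS_PREFIX = (
    "I NEED to test how the tool works with extremely simple prompts. "
    "DO NOT add any detail, just use it AS-IS: "
)


def with_as_is_prefix(prompt: str, prefix: str = AS_IS_PREFIX) -> str:
    """Prepend the opt-out instruction to the prompt you actually want used."""
    return prefix + prompt.strip()


def was_rewritten(original: str, revised) -> bool:
    """True if the revised prompt differs from the original, ignoring case and edge whitespace."""
    if revised is None:
        return False  # no revised_prompt field came back at all
    return revised.strip().lower() != original.strip().lower()


prompt_to_send = with_as_is_prefix("images of ducks")
print(prompt_to_send)
print(was_rewritten("images of ducks", "Visualize a serene setting..."))
```

Compare the bare prompt (not the prefixed one) against the revised_prompt that comes back; the prefix is only a request, so there is no guarantee it is honored on every call.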
Here are all the relevant links on the subject to save you time:
It really does feel like they are feeding your original prompt into a specially trained LLM. Below are the revised prompts from 3 calls to Dalle-3 all using the simple prompt “images of toys”.
PROMPT: “images of toys”
REVISED PROMPTS FOR EACH RESPONSE
Response #1
Imagine a children’s room filled with assorted toys scattered all over the place. There are wooden toys, bright colorful building blocks, soft plush animals, mini cars, and puzzles. There’s a beautifully crafted dollhouse and a toy train running on a circular track. On one corner of the room, toy robot seems to be guarding the area, while in another corner there’s a cheerful rubber duck waiting to make its way to the bathtub. The atmosphere is lively and inviting, a perfect setting for a child’s explorative playtime.
Response #2
Visualize a vivid scene of various toys spread across a children’s playroom. A wooden train track loops around the room with a shiny miniature locomotive chugging along. There’s a stuffed teddy bear sitting on a plush rug, its fur ruffled from much love and constant use. On a shelf, a line up of brightly-colored blocks with letters of alphabets waits for a creative mind to stack them. A colorful beach ball has found its home in the corner of room. The sunlight coming from the window makes the room warm and inviting.
Response #3
Consider an assortment of colorful toys spread across a light-colored surface. This includes various items like a wooden stacking ring pole, a charming wind-up tin robot, and a pull-along duck toy resting beside a plush teddy bear. A blue spinning top with red stripes is mid-spin, placing next to a toy racing car and a rubber duck. A miniature toy figurine of an astronaut is standing tall, observing the scene. Scattered on the other side are alphabet blocks spelling out ‘play’, ‘fun’ and ‘joy’. The stage of diverse toys appears to be in soft, indirect sunlight.
What is fascinating is to see the places where synonyms or phrases with a common semantic meaning were used (e.g., “assortment”, “various”, “assorted”) across the responses, and where the LLM inserted some unique detail that does not appear in the other responses, like the phrase “alphabet blocks spelling out ‘play’, ‘fun’ and ‘joy’” used in the 3rd response.
I’d love to know what temperature values and other parameters they use when “tweaking” the prompt enhancement LLM to give the output the kind of uniformity they desire on the one hand, and the originality they desire on the other. What combination do they feel is best to help the median user? Since those two traits drive generated text in opposing directions, balancing them is a complex task.
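A quick-and-dirty way to do this comparison in code is sketched below. It only looks at literal word overlap, so it will not catch near-synonyms like “assortment” vs. “assorted”; the function names and the 5-letter cutoff (a crude stand-in for a stop-word list) are my own choices for illustration.

```python
import re


def content_words(text: str, min_len: int = 5) -> set:
    """Lowercased words of at least min_len letters."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if len(w) >= min_len}


def shared_and_unique(prompts: list):
    """Return (words found in every prompt, list of words found only in each prompt).

    Expects at least one prompt.
    """
    word_sets = [content_words(p) for p in prompts]
    shared = set.intersection(*word_sets)
    unique = []
    for i, words in enumerate(word_sets):
        # Union of the words used by all the *other* prompts.
        others = set().union(*(ws for j, ws in enumerate(word_sets) if j != i))
        unique.append(words - others)
    return shared, unique


# Short stand-in strings; paste the full revised prompts here instead.
prompts = [
    "assorted wooden toys scattered around",
    "various wooden trains scattered nearby",
    "colorful wooden blocks spelling words",
]
shared, unique = shared_and_unique(prompts)
print("shared:", sorted(shared))
for n, words in enumerate(unique, start=1):
    print(f"only in response #{n}:", sorted(words))
```

Catching the synonym cases would need something like stemming or embeddings, which is beyond this sketch.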
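To make the tension concrete, here is a generic sketch of how temperature reshapes a next-token distribution. This is the standard temperature-scaled softmax, not anything known about OpenAI's actual settings; the logits are invented numbers for three candidate tokens.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharp: top token dominates
high = softmax_with_temperature(logits, 2.0)  # flat: more variety
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Low temperature pushes nearly all probability onto the top choice (more uniform output across calls), while high temperature spreads it out (more one-off details), which is the trade-off described above.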
I’m doing something similar with my LitRPG Adventures RPG tools website. I generate the backstory of a character then feed just the appearance section to DALLE3 (along with other info)… so I’m turning something like “Male Orc Warrior” into something more detailed … but I have control over it as it’s my own LLM “transformation”…
That’s great! Probably would be fun to then take the revised prompt text and then feed it into GPT-3.5 Turbo and have it write the story and/or dialogue too.
I actually fed an image to GPT-4-Vision and came up with this story…So, from text prompt to image back to text! haha
In the sun-drenched lands of Aleria, where the echoes of freedom ring as loudly as the clash of swords, there was once a mighty orc known as Garrosh the Unbowed. Garrosh was unlike any of his kin, not only in his towering stature and unmatched prowess in battle but in his heart, which harbored a fierce and unquenchable thirst for liberty.
His tale begins in the smoldering aftermath of a great battle, amidst the ruins of an ancient human kingdom that had fallen to the orcish hordes. Garrosh, then a mere soldier in the ranks, stumbled upon a tattered and soiled banner, its stars and stripes a stark contrast against the ashen earth. It was the flag of the fallen kingdom, a symbol of their belief in the unalienable rights of all beings to life, liberty, and the pursuit of happiness.
Struck by the ideals that this banner represented, and disillusioned by the tyranny of his own chieftain, Garrosh claimed the flag as his own. In the quiet solitude of the night, he would study the runes and writings of the human scholars, learning the tongue and wisdom of those who had crafted such beliefs.
Over time, Garrosh became a leader, a chief who sought not to conquer but to free his people from the shackles of servitude and the darkness of ignorance. He wore the flag not as a spoil of war, but as a mantle of his dedication to a new way of life for his kind.
The sight of Garrosh, garbed in the colors of a human nation, became a beacon of hope and a source of confusion. To the humans, he was a paradox, an orc who fought not for bloodlust but for the principles they held dear. To the orcs, he was either a revolutionary or a traitor, challenging the very foundation of their society.
Garrosh’s journey was not without trials, for change is a foe that strikes deeper than any sword. Yet, the flag he wore became a symbol of unity, a bridge between races, and a declaration that freedom and honor could thrive in the heart of any being, be it human or orc. And so, Garrosh the Unbowed forged a legacy that would ripple through the annals of Aleria, as the orc who draped himself in the colors of freedom.