A Study on Using JSON for DallE Inputs

@polepole @Daller @mitchell_d00 @jim14

This is an offshoot of a discussion on another thread, so we can try to keep these DallE threads organized a little bit :smile:

The question is: "Does using JSON, rather than NL prompting, assist in providing image generation that more closely aligns with the user's intent?"

To start off, below are the results of taking an initial prompt randomly selected from elsewhere on the forum, incrementally reducing it to its core components, and evaluating the outputs. (Apologies for not including the reference here; I scrolled through the other threads long enough that I eventually gave up on locating the original.)

NOTE: All generations were done independently in different sessions, and with different accounts, to ensure no short-term storage/template impact on the output.

Original Prompt

Impressionistic painting with soft brushwork and blending of colors. The brushstrokes are loose and fluid, creating a serene and tranquil atmosphere. Emphasis on atmospheric perspective. The palette includes light and warm tones like soft blues, sandy yellows, and subtle tans. A peaceful beach scene. In the foreground, sandy beach with gentle waves lapping the shore. The middle ground features a calm ocean reflecting the light from the sky above. The background consists of distant waves and a soft, cloudy sky with hints of sunlight breaking through. Overall composition is balanced, with the sky occupying the upper half and the beach and water in the lower half.

JSONified prompt
{ "style": "impressionism", "brushwork": { "attributes": ["soft", "loose", "fluid", "blending of colors"] }, "atmosphere": { "mood": "serene, tranquil", "technique": "atmospheric perspective" }, "palette": { "primary_colors": ["soft blue", "sandy yellow", "subtle tan"], "tones": ["light", "warm"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves", "lapping shore"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light from the sky"] }, "background": { "object": "distant waves and soft cloudy sky", "attributes": ["hints of sunlight", "breaking through"] } } }

Slight Reduction
{ "style": "impressionism", "brushwork": { "attributes": ["soft", "loose", "fluid"] }, "palette": { "primary_colors": ["soft blue", "sandy yellow", "subtle tan"], "tones": ["light", "warm"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves", "lapping shore"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light"] }, "background": { "object": "cloudy sky", "attributes": ["distant waves", "hints of sunlight"] } } }

Further Reduction
{ "style": "impressionism", "brushwork": { "attributes": ["soft"] }, "palette": { "primary_colors": ["soft blue", "sandy yellow"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light"] }, "background": { "object": "cloudy sky", "attributes": ["sunlight breaking through"] } } }

Stripped Down
{ "style": "impressionism", "palette": { "primary_colors": ["soft blue", "sandy yellow"] }, "composition": { "foreground": { "object": "sandy beach", "attributes": ["gentle waves"] }, "middle_ground": { "object": "calm ocean" }, "background": { "object": "cloudy sky", "attributes": ["sunlight"] } } }

Minimalist
{ "style": "impressionism", "composition": { "foreground": { "object": "sandy beach" }, "middle_ground": { "object": "calm ocean" }, "background": { "object": "sky" } } }

Core Only
{ "composition": { "foreground": { "object": "beach" }, "middle_ground": { "object": "ocean" }, "background": { "object": "sky" } } }

4 Likes

I have found that structure of any kind, i.e. logic, instantly improves consistency. Keeping prompts as tight as you can seems to be the method for reproducing a desired result, but when I do something like a poetry art seed, I have noticed that word combinations change the base effects of the individual words. For example, saying "an excited pattern" has an effect beyond either word in isolation.

1 Like

Just to add depth to what I am saying: I have noticed that if you put "narrow", "normal", or "wide" at the end of a prompt, it always follows that command. If I say exactly "happy", it writes happy. If I say exactly bold "happy", it writes bold happy. IMO it has a logic structure that breaks down into layers, and each layer should have a logical place, like layers in Photoshop or 3ds Max if you do mesh work. Perspective is another layer like this, e.g. "up perspective".


2 Likes

I have a couple questions…

  1. How are you generating the JSON: with software or by hand?
  2. Is this via the API or ChatGPT?
  3. Are the key/value pairs ("foreground": { "object": "beach" }) all made up, or fixed from some help page?

There may be more questions :slight_smile:

3 Likes

Yeah, I get you. That was really what started me off on this JSON rabbit hole: attempting to understand the layers the model uses to decide what gets generated and where it is positioned, and why longer, more descriptive prompts can cause massive variability in the outputs, including random words from the prompt being rendered in the image, as you pointed out.

Here is a random generation using only the conversation starter buttons as inputs:

Notice how the recommended prompts are anything but verbose.

There are a lot of interesting aspects to this that, maybe as a group, we can start to more fully understand. For example, the “fire on the water light source problem” discussed elsewhere. The easiest way to get the image desired in that test is to iterate to it. Neither using all sorts of descriptive language, nor the json experiment, resulted in the desired output on the first attempt.

However, the below NL prompt achieved success on the first try. My initial thought is that the phrase “…glow lights up the water around it.” provided an alternative light source that the model could identify and generate as opposed to using a default overhead light source.

Prompt

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.
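For comparison with the approach earlier in the thread, a JSON-style version of that prompt might look like the sketch below. This is only a guess at a sensible structure (the key names are made up, not fixed by anything in DALL-E), and I have not verified that it succeeds on the first try the way the NL version did.

import json

# Hypothetical JSON-ification of the fire prompt; key names are invented for illustration
fire_prompt = {
    "background": {"object": "pitch black", "attributes": ["no other light"]},
    "lighting": {"only_source": "the fire", "effect": "glow lights up the water around it"},
    "composition": {
        "foreground": {"object": "single source of fire", "attributes": ["floating on dark water"]},
        "middle_ground": {"object": "dark water"},
    },
}

print(json.dumps(fire_prompt, indent=2))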

2 Likes

I read the old DALL-E thread; there are a ton of examples in it. I used @PaulBellow's stuff at first, then moved on to modding it by hand. I stopped once I figured out that structure is all that matters in GPT, so I started doing numbered logic like old code languages. Once I figured out the flow, I now just do medium, setting, perspective, form, etc., ending with image size, and each layer can be described as commands using real art terms, but in a logic frame.

Example of logic flow… “ Black on green a vine of yellows with pink flowers grow to shape a 17th century surreal reclining lady wide image”


It can be converted to nearly any JSON string:

{
  "size": "1792x1024",
  "prompt": {
    "scene": "A surreal wide image of a 17th-century reclining lady formed by intertwining yellow vines with pink flowers.",
    "figure": {
      "description": "The vines create the figure of the lady as she rests gracefully.",
      "details": [
        "Parts of her form are abstract, dissolving into the foliage."
      ]
    },
    "background": {
      "color": "deep green",
      "contrast": "The black-green background enhances the vibrant yellows of the vines and soft pink flowers."
    },
    "atmosphere": "dreamlike, ethereal",
    "patterns": "The vines extend in whimsical, flowing patterns, contributing to the surreal atmosphere."
  }
}


4 Likes

I now have a format that I follow based on previous tests and a GPT that I built, but it originally all came from trial and error as opposed to any documentation.

I start off by creating a prompt in NL in a declarative style. Then, I use the GPT to strip out anything excessively verbose and output a new prompt as a JSON string. From there, I adjust the key/value pairs based on my intention for the generation.
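As a rough illustration of that last step, adjusting the key/value pairs before regenerating can be as simple as editing the parsed JSON and dumping it back out. The field names below are only an example, not a fixed schema.

import json

# A JSON prompt as it might come back from the GPT
prompt = json.loads("""
{
  "style": "impressionism",
  "palette": {"primary_colors": ["soft blue", "sandy yellow"]},
  "composition": {
    "foreground": {"object": "sandy beach"},
    "background": {"object": "cloudy sky"}
  }
}
""")

# Adjust individual key/value pairs to steer the next generation
prompt["palette"]["primary_colors"] = ["cool grey", "pale violet"]
prompt["composition"]["background"]["attributes"] = ["storm clouds"]

# The edited JSON string becomes the next prompt
print(json.dumps(prompt, indent=2))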

2 Likes

That’s exactly it. I think we are kind of like cooks, all making the same types of food but in many restaurants, each with a personal style and flow. But consistency in method seems to be key.

I did this with a poem.

Jet black background single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.

2 Likes

This is what I get.

Do you use this? GPT may have changed the prompt before sending it to DALL-E: “don’t change the prompt, send it as it is”.

Fire

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow illuminates the water around it. There is no other light.

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.

1 Like

Jet opaque black background single source of fire on dark water. Single fire is the only light , its glow is upon the water . There is no other light.

This one is fun. I want it solid black, just a spark, no other light.



This logic works “ Jet opaque black background single source of fire on dark water. Single fire is the only light . There is no other light.”.

The machine’s logic knows water reflects.
The reflection is still wrong… it is assuming a perspective.

This is even tighter. “ Jet opaque black background single source of fire on dark water. Single fire is the only light . ”

1 Like

Yes, it is a requirement. My image gen GPTs all have that instruction. If I check the output and the prompt is different, I remind it not to change anything and start the generation over. You constantly have to fight against the meta-prompt.

2 Likes

A good test would be to use many objects (5–10), each with an attribute like color, JSON it, and then see if the attributes scatter. (I am coding right now, so it will take some time until I have time to test it.)
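A sketch of what that test prompt could look like is below; the objects and colors are arbitrary placeholders, and the question is simply whether each color stays attached to its own object or bleeds onto the others.

import json

# Arbitrary object/color pairs for the attribute-scatter test
objects = [
    ("umbrella", "red"),
    ("bicycle", "blue"),
    ("bench", "green"),
    ("kite", "yellow"),
    ("dog", "white"),
    ("house", "orange"),
]

scatter_test = {
    "style": "photorealistic",
    "composition": {
        "objects": [
            {"object": name, "attributes": [f"{color} colored"]}
            for name, color in objects
        ]
    },
}

# Send this string as the prompt, then check whether the colors ended up on the right objects
print(json.dumps(scatter_test, indent=2))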

2 Likes

Jet opaque black background single source of fire on dark water. Single fire is the only light . Flame in sphere around itself ripples viewed down wide image

2 Likes

Yes, I made a custom GPT for this too, but it does not always obey the instructions. As you can see, it has changed one word.

I can make 30 or 50 pictures and not get one without backlight.

2 Likes

Set your GPT to review the instructions after every generation.

Name them “dalle instructions” and set your logic:
  1. Review the dalle instructions before sending a DALL-E prompt.
  2. Always send the prompt to DALL-E exactly as written.
  3. Always show the exact prompt sent in the generation summary.
  4. If the user asks about images, read the dalle instructions. (You can also set a reminder for this in a command.)

2 Likes

I have, but it does not work 100%. I reduced the number of instructions and made everything step by step (no nesting, no advice for later), but it is not precise enough. Writing prompts in another language does not make it easier. I made a “straitjacket” for GPT to stop it from messing around with my text and adding a lot of extra blabla. It kind of works now, but not perfectly.

But the prompt for the fire was identical, and I still get backlights. I use Plus; are you using the API? Maybe there is a difference?

… GPT even often ignores the command “send all prompts to dalle, NOT to the user”, and in 10% of cases it does not work.

2 Likes

These are all tests using the chat interface.

I think this is normal. There is a certain level of variability that is always present. It is just the nature of working with these models, especially when a meta-prompt has been put in place specifically telling the model to adjust the user’s prompt to “make it better”. I don’t think you are doing anything wrong in your prompt construction. It just refuses to pay attention sometimes, despite well-crafted instructions.

2 Likes

I’m on ChatGPT, with my art GPT.

See, it’s just a custom GPT.

Hi, Daller!

I use the user’s prompts exactly as they provide them to ensure I capture the essence of their vision. I then pass the prompt directly to you without altering its meaning or style. My goal is to make sure the images you generate reflect the user’s intent as closely as possible. I also make sure to follow your instructions carefully to keep the prompts in line with the guidelines and show the exact prompt used to the user for transparency.

This process helps create a smooth flow between the user’s idea and the final image you generate!
You don’t have to build a real GPT; just cut and paste the instructions here, and 4o will just do it automatically.

I’m probably telling you folks old news you already know, but this could help new users too. :rabbit::heart: Out of hearts, be back in 2 hours :heart:

1 Like

This is a great rabbit hole to go down. When I first saw your structured prompt, I looked around to see if others were doing this and saw some old posts that were using seeds in their JSON. I seem to remember a brief moment when seeds were (or people thought they were) a thing. Do you have any experience with seeds working or not?

2 Likes

Yes, seeds used to work. Now, you need to reference an image’s gen-id to create a follow-up image that closely resembles it. There is an example of that process here.
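For anyone who has not tried it: roughly, you ask ChatGPT for the gen-id of an image you like, then reference it in the next request, something along the lines of “Using gen_id <id from the previous image> as the reference, keep everything the same but make the sky stormy.” The wording here is only an illustration; the linked example walks through the actual process.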

2 Likes