A Study on Using JSON for DallE Inputs

@polepole @Daller @mitchell_d00 @jim14

This is an offshoot of a discussion on another thread, so we can try to keep these DallE threads organized a little bit :smile:

The question is: "Does using JSON, rather than NL prompting, assist in providing image generation that more closely aligns with the user's intent?"

To start off, below are the results of taking an initial prompt randomly selected from elsewhere on the forum, incrementally reducing it to its core components, and evaluating the outputs. (Apologies for not including the reference here; I scrolled through the other threads long enough that I eventually gave up on locating the original.)

NOTE: All generations were done independently in different sessions, and with different accounts, to ensure no short-term storage/template impact on the output.

Original Prompt

Impressionistic painting with soft brushwork and blending of colors. The brushstrokes are loose and fluid, creating a serene and tranquil atmosphere. Emphasis on atmospheric perspective. The palette includes light and warm tones like soft blues, sandy yellows, and subtle tans. A peaceful beach scene. In the foreground, sandy beach with gentle waves lapping the shore. The middle ground features a calm ocean reflecting the light from the sky above. The background consists of distant waves and a soft, cloudy sky with hints of sunlight breaking through. Overall composition is balanced, with the sky occupying the upper half and the beach and water in the lower half.

JSONified prompt
{ "style": "impressionism", "brushwork": { "attributes": ["soft", "loose", "fluid", "blending of colors"] }, "atmosphere": { "mood": "serene, tranquil", "technique": "atmospheric perspective" }, "palette": { "primary_colors": ["soft blue", "sandy yellow", "subtle tan"], "tones": ["light", "warm"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves", "lapping shore"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light from the sky"] }, "background": { "object": "distant waves and soft cloudy sky", "attributes": ["hints of sunlight", "breaking through"] } } }

Slight Reduction
{ "style": "impressionism", "brushwork": { "attributes": ["soft", "loose", "fluid"] }, "palette": { "primary_colors": ["soft blue", "sandy yellow", "subtle tan"], "tones": ["light", "warm"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves", "lapping shore"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light"] }, "background": { "object": "cloudy sky", "attributes": ["distant waves", "hints of sunlight"] } } }

Further Reduction
{ "style": "impressionism", "brushwork": { "attributes": ["soft"] }, "palette": { "primary_colors": ["soft blue", "sandy yellow"] }, "composition": { "balance": "upper half sky, lower half beach and water", "foreground": { "object": "sandy beach", "attributes": ["gentle waves"] }, "middle_ground": { "object": "calm ocean", "attributes": ["reflecting light"] }, "background": { "object": "cloudy sky", "attributes": ["sunlight breaking through"] } } }

Stripped Down
{ "style": "impressionism", "palette": { "primary_colors": ["soft blue", "sandy yellow"] }, "composition": { "foreground": { "object": "sandy beach", "attributes": ["gentle waves"] }, "middle_ground": { "object": "calm ocean" }, "background": { "object": "cloudy sky", "attributes": ["sunlight"] } } }

Minimalist
{ "style": "impressionism", "composition": { "foreground": { "object": "sandy beach" }, "middle_ground": { "object": "calm ocean" }, "background": { "object": "sky" } } }

Core Only
{ "composition": { "foreground": { "object": "beach" }, "middle_ground": { "object": "ocean" }, "background": { "object": "sky" } } }

4 Likes

I have found that structure of any kind, i.e. logic, instantly improves consistency. Keeping prompts as tight as you can seems to be the method for reproducing a desired result, but when I do something like a poetry art seed, I have noticed that word combinations change the base effects of the individual words. For example, saying "an excited pattern" has an effect beyond either word in isolation.

1 Like

Just to add depth to what I am saying: I have noticed that if you put "narrow", "normal", or "wide" at the end of a prompt, it always follows that command. If I say exactly "happy", it writes happy. If I say exactly bold "happy", it writes bold happy. IMO it has a logic structure that breaks down into layers, and each layer should have a logical place, like layers in Photoshop or 3ds Max if you do mesh work. Perspective is another layer like this, e.g. "up perspective".


2 Likes

I have a couple questions…

  1. How are you generating the JSON: with software or by hand?
  2. Is this via the API or ChatGPT?
  3. Are the key/value pairs ("foreground": { "object": "beach" }) all made up, or fixed from some help page?

There may be more questions :slight_smile:

3 Likes

Yeah, I get you. That was really what started me off on this JSON rabbit hole: attempting to understand the layers the model uses to decide what gets generated and where it is positioned, and why longer, more descriptive prompts can cause massive variability in the outputs, including random words from the prompt being rendered in the image, as you pointed out.

Here is a random generation using only the conversation starter buttons as inputs:

Notice how the recommended prompts are anything but verbose.

There are a lot of interesting aspects to this that, maybe as a group, we can start to more fully understand. For example, the “fire on the water light source problem” discussed elsewhere. The easiest way to get the image desired in that test is to iterate to it. Neither using all sorts of descriptive language, nor the json experiment, resulted in the desired output on the first attempt.

However, the below NL prompt achieved success on the first try. My initial thought is that the phrase “…glow lights up the water around it.” provided an alternative light source that the model could identify and generate as opposed to using a default overhead light source.

Prompt

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.
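For comparison with the approach earlier in the thread, a JSON-style version of that prompt might look like the sketch below. This is only a guess at a sensible structure (the key names are made up, not fixed by anything in DALL-E), and I have not verified that it succeeds on the first try the way the NL version did.

import json

# Hypothetical JSON-ification of the fire prompt; key names are invented for illustration
fire_prompt = {
    "background": {"object": "pitch black", "attributes": ["no other light"]},
    "lighting": {"only_source": "the fire", "effect": "glow lights up the water around it"},
    "composition": {
        "foreground": {"object": "single source of fire", "attributes": ["floating on dark water"]},
        "middle_ground": {"object": "dark water"},
    },
}

print(json.dumps(fire_prompt, indent=2))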

2 Likes

I read the old DALL-E thread; there are a ton of examples in it. I used @PaulBellow's stuff at first, then moved on to modding it by hand. I stopped once I figured out that structure is all that matters in GPT, so I started doing numbered logic like old code languages. Once I figured out the flow, I now just do medium, setting, perspective, form, etc., ending with image size, and each layer can be described as commands using real art terms, but in a logic frame.

Example of logic flow… “ Black on green a vine of yellows with pink flowers grow to shape a 17th century surreal reclining lady wide image”


It can be converted to nearly any JSON string:

{
  "size": "1792x1024",
  "prompt": {
    "scene": "A surreal wide image of a 17th-century reclining lady formed by intertwining yellow vines with pink flowers.",
    "figure": {
      "description": "The vines create the figure of the lady as she rests gracefully.",
      "details": [
        "Parts of her form are abstract, dissolving into the foliage."
      ]
    },
    "background": {
      "color": "deep green",
      "contrast": "The black-green background enhances the vibrant yellows of the vines and soft pink flowers."
    },
    "atmosphere": "dreamlike, ethereal",
    "patterns": "The vines extend in whimsical, flowing patterns, contributing to the surreal atmosphere."
  }
}


4 Likes

I now have a format that I follow based on previous tests and a GPT that I built, but it originally all came from trial and error as opposed to any documentation.

I start off by creating a prompt in NL in a declarative style. Then, I use the GPT to strip out anything excessively verbose and output a new prompt as a JSON string. From there, I adjust the key/value pairs based on my intention for the generation.
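As a rough illustration of that last step, adjusting the key/value pairs before regenerating can be as simple as editing the parsed JSON and dumping it back out. The field names below are only an example, not a fixed schema.

import json

# A JSON prompt as it might come back from the GPT
prompt = json.loads("""
{
  "style": "impressionism",
  "palette": {"primary_colors": ["soft blue", "sandy yellow"]},
  "composition": {
    "foreground": {"object": "sandy beach"},
    "background": {"object": "cloudy sky"}
  }
}
""")

# Adjust individual key/value pairs to steer the next generation
prompt["palette"]["primary_colors"] = ["cool grey", "pale violet"]
prompt["composition"]["background"]["attributes"] = ["storm clouds"]

# The edited JSON string becomes the next prompt
print(json.dumps(prompt, indent=2))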

2 Likes

That’s exactly it. I think we are kind of like cooks, all making the same types of food but in many restaurants, each with a personal style and flow. But consistency in method seems to be key.

I did this with a poem.

Jet black background single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.

2 Likes

This is what I get.

Do you use this? GPT may have changed the prompt before sending it to DALL-E: “don’t change the prompt, send it as it is”.

Fire

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow illuminates the water around it. There is no other light.

A single source of fire floating on dark water. The background is pitch black. The fire is the only light source, and its glow lights up the water around it. There is no other light.

1 Like

Jet opaque black background single source of fire on dark water. Single fire is the only light , its glow is upon the water . There is no other light.

This one is fun. I want it solid black, just a spark, no other light.



This logic works “ Jet opaque black background single source of fire on dark water. Single fire is the only light . There is no other light.”.

The machine’s logic knows water reflects.
The reflection is still wrong… it is assuming a perspective.

This is even tighter. “ Jet opaque black background single source of fire on dark water. Single fire is the only light . ”

1 Like

Yes, it is a requirement. My image gen GPTs all have that instruction. If I check the output and the prompt is different, I remind it not to change anything and start the generation over. You constantly have to fight against the meta-prompt.

2 Likes

A good test would be to use many objects (5–10), each with an attribute like color, JSON it, and then see if the attributes scatter. (I am coding right now, so it will take some time until I have time to test it.)
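A sketch of what that test prompt could look like is below; the objects and colors are arbitrary placeholders, and the question is simply whether each color stays attached to its own object or bleeds onto the others.

import json

# Arbitrary object/color pairs for the attribute-scatter test
objects = [
    ("umbrella", "red"),
    ("bicycle", "blue"),
    ("bench", "green"),
    ("kite", "yellow"),
    ("dog", "white"),
    ("house", "orange"),
]

scatter_test = {
    "style": "photorealistic",
    "composition": {
        "objects": [
            {"object": name, "attributes": [f"{color} colored"]}
            for name, color in objects
        ]
    },
}

# Send this string as the prompt, then check whether the colors ended up on the right objects
print(json.dumps(scatter_test, indent=2))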

2 Likes

Jet opaque black background single source of fire on dark water. Single fire is the only light . Flame in sphere around itself ripples viewed down wide image

2 Likes

Yes, I made a custom GPT for this too, but it does not always obey the instructions. As you can see, it has changed one word.

I can make 30 or 50 pictures and not get one without backlight.

2 Likes

Set your GPT to review the instructions after every generation.

Name them “dalle instructions” and set your logic:
  1. Review the dalle instructions before sending a DALL-E prompt.
  2. Always send the prompt to DALL-E exactly as written.
  3. Always show the exact prompt sent in the generation summary.
  4. If the user asks about images, read the dalle instructions. (You can also set a reminder for this in a command.)

2 Likes

I have, but it does not work 100%. I reduced the number of instructions and made everything step by step (no nesting, no advice for later), but it is not precise enough. Writing prompts in another language does not make it easier. I made a “straitjacket” for GPT to stop it from messing around with my text and adding a lot of extra blabla. It kind of works now, but not perfectly.

But the prompt for the fire was identical, and I still get backlights. I use Plus; are you using the API? Maybe there is a difference?

… GPT even often ignores the command “send all prompts to dalle, NOT to the user”, and in 10% of cases it does not work.

2 Likes

These are all tests using the chat interface.

I think this is normal. There is a certain level of variability that is always present. It is just the nature of working with these models, especially when a meta-prompt has been put in place specifically telling the model to adjust the user’s prompt to “make it better”. I don’t think you are doing anything wrong in your prompt construction. It just refuses to pay attention sometimes, despite well-crafted instructions.

2 Likes

I’m on ChatGPT, with my art GPT.

See, it’s just a custom GPT.

Hi, Daller!

I use the user’s prompts exactly as they provide them to ensure I capture the essence of their vision. I then pass the prompt directly to you without altering its meaning or style. My goal is to make sure the images you generate reflect the user’s intent as closely as possible. I also make sure to follow your instructions carefully to keep the prompts in line with the guidelines and show the exact prompt used to the user for transparency.

This process helps create a smooth flow between the user’s idea and the final image you generate!
You don’t have to build a real GPT; just cut and paste the instructions here, and 4o will just do it automatically.

I’m probably telling you folks old news you already know, but this could help new users too. :rabbit::heart: Out of hearts, be back in 2 hours :heart:

1 Like

This is a great rabbit hole to go down. When I first saw your structured prompt, I looked around to see if others were doing this and saw some old posts that were using seeds in their JSON. I seem to remember a brief moment when seeds were (or people thought they were) a thing. Do you have any experience with seeds working or not?

2 Likes

Yes, seeds used to work. Now, you need to reference an image’s gen-id to create a follow-up image that closely resembles it. There is an example of that process here.
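For anyone who has not tried it: roughly, you ask ChatGPT for the gen-id of an image you like, then reference it in the next request, something along the lines of “Using gen_id <id from the previous image> as the reference, keep everything the same but make the sky stormy.” The wording here is only an illustration; the linked example walks through the actual process.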

2 Likes