That jibes!

Strange how going much wider becomes cheaper in unexpected ways, thanks to an unpublished algorithm. Now: can more than 86 tokens' worth of concrete information be conveyed in an image that costs under 86 tokens? That would be a neat trick.
There be randomness in that thar’ output sequence! The image model can create its own content.
You are using your imagination to provide original creative atypical compositions. You pick a theme, persons or characters, setting, and composition randomly almost like a mad-lib roll of the dice, to see what far-fetched result comes out.
Labeled grounding: at the top of the image is labeled a subject (person or characters), setting (unusual location or activity outside the character universe), then style and composition (variety of photographic styles or renderings). This labeling at the top then serves as guidance for creating the remainder of the contents, surprising and entertaining each time!
These seem particularly good quality. Is the key really less prompt instruction about the content? Cloud cities and underwater settings seem favored.
I actually think that it is. In my experience, less prompt instruction often gives better output than long, highly descriptive prompts.
For fantasy images, like the examples you showed, I also don’t think one always needs to explicitly prompt “high stylistic fantasy.” Sometimes a word like “magical” is enough for the output to lean that way.
Subject: levitating turtle
Setting: above water
Style & composition: magical, wide-angle
Input: 18
Output: 118
Cost: $0.003684 (size: 960x688)

Subject: a cat with a cape
Setting: sparkling cliff
Style & composition: magical, wide-angle
Input: 19
Output: 118
Cost: $0.003692 (size: 960x688)

Total for both images: ~ $0.0074
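
For anyone who wants to check the arithmetic, here is a minimal sketch. The two rates are assumptions: they are not taken from a price list, only backed out from the token counts and costs posted above ($8 per 1M text input tokens and $30 per 1M image output tokens reproduce both figures exactly). The name `image_cost` is made up for illustration.

```python
# ASSUMPTION: rates inferred from the figures in this thread, not official pricing.
ASSUMED_INPUT_USD_PER_MTOK = 8.00    # text input tokens, USD per 1M
ASSUMED_OUTPUT_USD_PER_MTOK = 30.00  # image output tokens, USD per 1M


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one generation under the assumed rates."""
    total_micro_usd = (
        input_tokens * ASSUMED_INPUT_USD_PER_MTOK
        + output_tokens * ASSUMED_OUTPUT_USD_PER_MTOK
    )
    return total_micro_usd / 1_000_000


turtle = image_cost(18, 118)  # levitating turtle
cat = image_cost(19, 118)     # cat with a cape

print(f"turtle: ${turtle:.6f}")
print(f"cat:    ${cat:.6f}")
print(f"both:   ${turtle + cat:.6f}")
```

Under those assumed rates the output tokens dominate: the text prompt contributes only about $0.00015 of each image's cost.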
Did I just go under $0.0084 for two images? I think I just did!
Yes, low prices are quite achievable: about 1/10th the price of a DALL-E 3 square image, especially if you go “wide” instead of targeting this forum’s 1.4:1 ratio (which gives the largest area displayed here).
You can explore and discover the break points to lower pricing in the calculator.
Surprisingly, near full HD comes in at similar pricing to the smaller picture:
| size | quality | output tokens | output cost | aspect ratio |
|---|---|---|---|---|
| 976x704 | low | 129 | $0.00387 | 1.386 |
| 1920x1008 | low | 126 | $0.00378 | 1.9 |
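
One way to see how lopsided that is: a quick sketch that works out area and pixels per billed output token from the two table rows. Nothing here calls an API; the numbers are copied from the table, and `pixels_per_token` is just an illustrative helper name.

```python
# (width, height, output_tokens) copied from the two low-quality rows above
rows = [(976, 704, 129), (1920, 1008, 126)]


def pixels_per_token(width: int, height: int, output_tokens: int) -> float:
    """Pixels of image area delivered per billed output token."""
    return width * height / output_tokens


for w, h, tokens in rows:
    print(
        f"{w}x{h}: aspect {w / h:.3f}, {w * h:,} px, "
        f"{pixels_per_token(w, h, tokens):,.0f} px/token"
    )
```

The wide image carries about 2.8 times the pixels of the smaller one while being billed three fewer output tokens.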
Then, though, a click is needed to see what’s going on:
Well this one is very wide, so it may need to be clicked to see properly, but it came in at 1408×480, low quality, input 13 / output 54, total about $0.001724. Prompt: young witch, castle, magical.
Simple prompt, extremely cheap, and good visual output. Only downside: her hand holding the wand. And yes, I tried to edit it, but somehow both hands disappeared instead.
I provide the text. It’s recited back in the image. No running in an original direction.
The concept is completed, under budget.
The style can go in different directions when we are ambiguous, instructing only a “multipanel presentation on…”
However, it gets a bit meandering if I tell it not to repeat my text exactly.
Compare the content to that of the prompt-idea brainstorming AI, which apparently couldn’t believe that I wanted just another trope theme statement alone, and had to include its own movies to depict it.
“The Artificial Reality Reveal”
The world seemed normal until the curtain pulls back.
Examples:
- *The Matrix* — Neo waking in the pod.
- *The Truman Show* — Truman reaching the edge of the set.
- *Dark City* — city manipulated by hidden forces.
- *Inception* — dream architecture folding and collapsing.
- *The Cabin in the Woods* — horror scenario controlled from below.
- *Pleasantville* — black-and-white world gaining color.
Instant filler for any weekly top-factoid site, far easier and at higher resolutions than I anticipated would still be low cost.
Image 1
Prompt: “a train station where departures are seasons instead of cities”
Size: 1408×480
Quality: low
Input tokens: 17
Output tokens: 54
Image 2
Prompt: “a train station with four platforms, each opening into a different season, travelers choosing their season, cinematic panoramic scene”
Size: 1408×480
Quality: low
Input tokens: 31
Output tokens: 54
What stood out to me is that the second prompt had notably more input, but the visual output didn’t really get better. If anything, it seemed to get more cluttered and less clear.
In the second image, there’s more visual noise:
Maybe less is more?