The Half-Penny gpt-image-2 Challenge - API-only Gallery

Background

976x704 is one of the smallest image resolutions you can request with the arbitrary sizes now available in gpt-image-2, and it is the required size for this challenge.

| size | quality | output tokens | output cost | aspect ratio |
|---|---|---|---|---|
| 960x720 | low | 130 | $0.00390 | 4:3 |
| 976x704 | low | 129 | $0.00387 | 1.386 |

API Challenge (ChatGPT DQ’d)

Your remaining budget for text input after image output: $0.00113

Text input costs $0.000005 per token.

After an internal 6-token overhead, you can send up to 220 tokens of prompt text and stay at or below $0.005, half of one US cent.
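The budget arithmetic above can be checked with a minimal sketch. It works in whole micro-dollars so that floating-point rounding cannot shave a token off the answer. The function name `max_prompt_tokens` and its parameters are hypothetical; the prices, the 129 output tokens, and the 6-token overhead are the figures quoted in this post.

```python
# A price in dollars per 1M tokens is numerically the same as
# micro-dollars per token, so all arithmetic can stay in integers.
TEXT_INPUT_PRICE = 5     # $5 / 1M text input tokens
IMAGE_OUTPUT_PRICE = 30  # $30 / 1M image output tokens


def max_prompt_tokens(budget_microdollars, output_tokens, overhead_tokens=6):
    """Largest prompt token count that keeps one call within budget."""
    remaining = budget_microdollars - output_tokens * IMAGE_OUTPUT_PRICE
    if remaining < 0:
        return 0
    billable = remaining // TEXT_INPUT_PRICE
    # The overhead is billed as input but is not part of your prompt.
    return max(billable - overhead_tokens, 0)


half_penny = max_prompt_tokens(5_000, output_tokens=129)
print(half_penny)
```

Passing a different budget and output token count gives the limit for any other size in the same way.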

Why size?

This aspect ratio nearly maximizes the area at which an image is displayed inline on this forum once its downsizing rules are applied.

976x704 → 690x497
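The downsizing can be reproduced with a short sketch. It assumes the forum fits images inside a 690x500 box and rounds the scaled side down; only the 976x704 → 690x497 result is confirmed by this post, so the 500-pixel height limit and the function name `forum_display_size` are assumptions.

```python
def forum_display_size(width, height, max_width=690, max_height=500):
    """Scale (width, height) down to fit a bounding box, keeping aspect ratio.

    The 690x500 box is an assumption; the scaled side is rounded down.
    """
    if width <= max_width and height <= max_height:
        return width, height
    # Compare width/height with max_width/max_height without using floats.
    if width * max_height >= height * max_width:
        # Width is the limiting side.
        return max_width, height * max_width // width
    # Height is the limiting side.
    return width * max_height // height, max_height


print(forum_display_size(976, 704))
print(forum_display_size(960, 720))
```

Under these assumptions a 4:3 image becomes height-limited and is shown narrower than the full 690 pixels, which is why the slightly wider 976x704 ends up with more displayed area.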

Your challenge

Awesomeness for the price of a rounding error.

Stay within 220 prompt text tokens, use
{"size":"976x704", "quality":"low"}
and fascinate us on the cheap.

No artifacts, not a preview: a purely impressive demonstration that “mini” models can now be left behind, because gpt-image-2 gives quick, high-quality results even at the API’s low quality setting.

Elite entrants are in the $0.004 club: only 20 tokens of input.

https://platform.openai.com/tokenizer

Tip: when you upload an image to the forum in the markdown-based editor, you see:
![image|690x497](upload://eUgDKei2u9wFyZKQpJPShYYd5Qf.jpeg)

If the image was uploaded from a file rather than pasted, your filename appears in place of “image”. Replace it with your prompt, and others can mouse-hover to see the text.

Prize

Stuff to look at. Proud you honestly did it.


OOPS - an image made with this whole post as the prompt comes to $0.0058.

Does everyone have to show you the “entrance ticket” before entering/posting? :winking_face_with_tongue:

It’s for fun: the point is to notice that a cheap “preview” is no longer the terrible warped output of previous gpt-image models. So post away - no tricks or token count deception are needed to get pretty good pics.

(budget hint: batch API 20-token calls: 500 images for a dollar)

(compression hint: look at how words are tokenized to optimize to single-token concepts; lower-case, spacing)
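The batch hint above can be sanity-checked with a small sketch. It assumes batch calls are billed at half the normal rate, which this thread does not state explicitly; the token counts and prices are the ones from the 20-token cost breakdown, and `images_per_dollar` is a hypothetical helper name.

```python
# Prices in dollars per 1M tokens, which equals micro-dollars per token.
TEXT_INPUT_PRICE = 5
IMAGE_OUTPUT_PRICE = 30
BATCH_DISCOUNT_DIVISOR = 2  # assumption: batch calls cost half the normal price


def images_per_dollar(prompt_tokens, output_tokens):
    """Whole images affordable for $1.00 at the assumed batch rate."""
    full_price = (
        prompt_tokens * TEXT_INPUT_PRICE + output_tokens * IMAGE_OUTPUT_PRICE
    )
    # $1.00 is 1_000_000 micro-dollars; dividing by half the price is the
    # same as doubling the budget, which keeps the arithmetic in integers.
    return 1_000_000 * BATCH_DISCOUNT_DIVISOR // full_price


print(images_per_dollar(prompt_tokens=20, output_tokens=129))
```

With those inputs the result lands just above the "500 images for a dollar" in the hint.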

Under-20 club


(polluted by the clumpy pattern symptom?)

Estimated cost breakdown:
  Input text    20 tokens × $5/1M = $0.000_100_000
  Input image   0 tokens × $8/1M = $0.000_000_000
  Output tokens 129 tokens × $30/1M = $0.003_870_000
  Total         $0.003_970_000

Jupyter notebook template

from IPython.display import Image, display
import base64
from openai import OpenAI

# Reads the OPENAI_API_KEY environment variable; or pass api_key="..." explicitly
client = OpenAI()

filename = "image-02.png"  # optional: set to "" to display without saving
prompt_text = "Original ancient wizard profile card, readable stats, cinematic lighting, detailed robes"


response = client.images.with_raw_response.generate(
  model="gpt-image-2",
  prompt=prompt_text,
  # moderation= 'low',
  n=1,
  size="976x704", #1536x1024
  quality="low"
)

image_bytes = base64.b64decode(response.parse().data[0].b64_json)
if filename:
    with open(filename, "wb") as f:
        f.write(image_bytes)

print("Image generation completed.", response.parse().usage)


## helper functions

PRICE_PER_1M = {
    "input_text": 5,
    "input_image": 8,
    "output": 30,
}

TOKENS_PER_MILLION = 1_000_000


def money(x, digits=9):
    return f"{x:.{digits}_f}"


def usage_cost(usage, *, verbose=True):
    tokens = {
        "input_text": usage.input_tokens_details.text_tokens,
        "input_image": usage.input_tokens_details.image_tokens,
        "output": usage.output_tokens,
    }

    costs = {
        key: tokens[key] * PRICE_PER_1M[key] / TOKENS_PER_MILLION
        for key in tokens
    }

    total = sum(costs.values())

    if verbose:
        print("Estimated cost breakdown:")
        for key, label in [
            ("input_text", "Input text"),
            ("input_image", "Input image"),
            ("output", "Output tokens"),
        ]:
            print(
                f"  {label:<13} "
                f"{tokens[key]:,} tokens × ${PRICE_PER_1M[key]}/1M = "
                f"${money(costs[key])}"
            )
        print(f"  {'Total':<13} ${money(total)}")

    return {
        "tokens": tokens,
        "costs": costs,
        "total": total,
    }
   

print(response.parse().usage)
usage_cost(response.parse().usage)
display(Image(data=image_bytes))
dict(response.headers)

AI prompted to choose the best ~15 token idea, under 220 total.

With n=6 for multiple images (which are billed as though they were requested separately: no cache discount, no shared-input benefit), we can also see how often “the pattern” symptom shows up, and to what degree, across images from the same input and API call.

Prompt and one alternate chosen of six

You are competing in an art contest. You have themes with minimal descriptions as options you can choose from. You must select the most promising description from those below. The resulting image shall be dazzling and beyond the expectations of judges, so consider the composition of these possible candidates and go forward with robust and fulfilling presentation of the best idea you want to make as a hyper-realism image.


Massive canyon city carved into red cliffs, bridges, waterfalls, sunset

Venice carnival on another planet, floating masks, purple canals, twin suns

Dreamlike rainforest temple, colossal flowers, hummingbirds, hidden golden staircase

Shipwreck cathedral on the ocean floor, sunbeams, sharks, pearl altar

Glass desert with mirrored dunes, lone rider, enormous fractured moon

Castle above the clouds, dragon shadows, dawn trumpets, waterfalls falling into sky

Ancient observatory atop a colossal tortoise, constellations reflected in lake

Moonlit samurai duel in cherry blossom blizzard, giant koi spirits

In my experience, it seems to occur more often in highly stylized fantasy images.

The complete idiot’s guide to the cheapest images on API

  • The amazing inscrutable list of resolution combos…

quality:low

The cheapest area you can have delivered is at aspect ratios approaching 3:1.
The cheapest of all is 1440x480 = 54 tokens of output (or 480x1440).
Value per area improves at larger sizes, though; whether the quality is worth your tokens needs evaluation.
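To compare value for area, a rough sketch can rank the size and token pairs reported in this thread by output cost per megapixel. It assumes every pair was generated at quality low and billed at $30 per 1M output tokens; the helper name `dollars_per_megapixel` is hypothetical, and sizes not mentioned in the thread are left out rather than guessed.

```python
IMAGE_OUTPUT_PRICE = 30  # dollars per 1M output tokens

# (width, height, output tokens) as reported in this thread,
# assumed to be quality "low" in every case.
REPORTED = [
    (1440, 480, 54),
    (1152, 576, 86),
    (976, 704, 129),
    (960, 720, 130),
    (1024, 1024, 196),
]


def dollars_per_megapixel(width, height, tokens):
    """Output cost in dollars divided by image area in megapixels."""
    cost = tokens * IMAGE_OUTPUT_PRICE / 1_000_000
    return cost / (width * height / 1_000_000)


ranked = sorted(REPORTED, key=lambda row: dollars_per_megapixel(*row))
for width, height, tokens in ranked:
    rate = dollars_per_megapixel(width, height, tokens)
    print(f"{width}x{height}: {tokens} tokens, ${rate:.5f} per megapixel")
```

This only measures price per pixel; it says nothing about how good the pixels look at each size.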

Not as bad an aspect ratio as my spreadsheet image: match the dimension row and column to see the tokens of an image. The cheapest gives us:

QUARTER-PENNY CHALLENGE is:

$0.0025 - $0.00162 for those 54 tokens = $0.00088

So you have $0.00088 left for input tokens @ 480x1408px

At 0.000005 per input token:

$$\frac{0.00088}{0.000005} = 176$$

After subtracting the 6-token overhead, you can use 170 input tokens within the remaining budget.

Exactly that (can be zoomed, 1408px height):

The remarkable thing is the incredible amount of time required for an API call at low cost…are they running this thing on CPUs??

“Unexpected beauty of reality in 780x640px”

quality: low

size: auto

text input: 16t

output: 186t

That would be an invalid size: 780 is not divisible by 16. You can’t ask ChatGPT to play.
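A quick way to check a size before spending tokens is sketched below. It tests only the divisible-by-16 rule mentioned in this thread; any other limits the API may enforce (minimum or maximum dimensions, aspect ratio) are not covered, and `check_size` is a hypothetical helper name.

```python
def check_size(size, multiple=16):
    """Return (is_valid, nearest_size) for a 'WIDTHxHEIGHT' string.

    Only checks that both sides are multiples of `multiple`.
    """
    width, height = (int(part) for part in size.lower().split("x"))
    # Round each side to the nearest multiple using integer arithmetic.
    snapped = [
        max(multiple, (side + multiple // 2) // multiple * multiple)
        for side in (width, height)
    ]
    is_valid = [width, height] == snapped
    return is_valid, f"{snapped[0]}x{snapped[1]}"


print(check_size("780x640"))
print(check_size("976x704"))
```

The second element suggests the closest size that passes the check, which may or may not be accepted by the API for other reasons.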

Ask-an-AI time…

Input cost:

$$16 \times 0.000005 = 0.00008$$

Output cost:

$$186 \times 0.00003 = 0.00558$$

Total cost:

$$0.00008 + 0.00558 = 0.00566$$

So the request would cost $0.00566, which is over the $0.005 (half-cent) budget by:

$$0.00566 - 0.005 = 0.00066$$

So: No, it would not meet the budget.

Maybe you got billed that for the language, but not the tool use?

:face_with_raised_eyebrow:

And the image supplied is 1384x1136 anyway

I think that’s probably it, because that’s what I’ve encountered.


@sergeliatko it seems that you generated the image in the official OpenAI API playground?

Because here you can see the real size of your image:

Yes, I counted tokens, just wanted to try (pavan actually followed the rules probably lol)

I was asking, because I’ve also tested there without any success. I just managed to burn tokens :woman_facepalming:

That’s the tricky thing: if you use the Responses API and the image tool, not the “generate” endpoint, you only see token usage for the chat, and not the extra billing for images or the double billing for a chat’s image input to the chat model. The AI gets complete freedom over the cost unless you lock it down to one specific size and quality with tool parameters.
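Locking the tool down might look like the sketch below, which only builds the request arguments and makes no API call. It assumes the Responses API image tool accepts the same `size` and `quality` values as the images endpoint used elsewhere in this thread, including 976x704, which has not been verified here; the chat model and input are left for you to fill in.

```python
# Pin size and quality so nothing is left on "auto" for the model to decide.
# Assumption: the image tool accepts this arbitrary size like the
# images "generate" endpoint does.
image_tool = {
    "type": "image_generation",
    "size": "976x704",
    "quality": "low",
}

request_kwargs = {
    "tools": [image_tool],
    # Force the image tool to be used instead of leaving the choice open.
    "tool_choice": {"type": "image_generation"},
}

# Hypothetical usage, with your own chat model and prompt:
# client.responses.create(model=..., input=..., **request_kwargs)
print(request_kwargs)
```

Even with these parameters pinned, the chat model's own input and output tokens are billed on top of the image.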

From what I can see in the official OpenAI Playground, for me the usage is reported under Responses and Chat Completions for everything I’ve ever tested, including image generation. So even when I lock size/quality for gpt-image-2, the dashboard still shows that activity there. (This is what it shows for me on my dashboard, not claiming anything universally)

In Images in OpenAI playground, you can choose between these tool parameters (plus advanced settings):

If you choose size:auto, apparently the model can choose something else even when the prompt includes a resolution. In @sergeliatko’s example, the prompt asked for 780x640px, but the supplied image was 1384x1136, probably because size was still auto.


I tested in the official OpenAI Playground with:

  • model: gpt-image-2

  • size: 1024x1024

  • quality: low

  • prompt: Sparkling shell with a pearl

The Usage dashboard reported it under Responses and Chat Completions, with:

  • input: 12 tokens

  • output: 196 tokens

Using the listed gpt-image-2 pricing:

  • text input: 12 × $5 / 1M = $0.00006

  • image output: 196 × $30 / 1M = $0.00588

Total: about $0.00594, so basically $0.006. (No, this does not meet the $0.005 budget, but that’s the closest I can come with the official OpenAI Playground.)

what an interesting game!

subject + epic world descriptor + artistic gimmick

unfortunately everything i send from the web has 2 miles of context related to my ‘interests’ i think.

Estimated cost breakdown

Input text 155 tokens × $5/1M = $0.000_775_000
Input image 0 tokens × $8/1M = $0.000_000_000
Output tokens 54 tokens × $30/1M = $0.001_620_000
Total $0.002_395_000

3D viewer here

Cheap 360 image (1152x576)

Input text 16 tokens × $5/1M = $0.000_080_000
Input image 0 tokens × $8/1M = $0.000_000_000
Output tokens 86 tokens × $30/1M = $0.002_580_000
Total $0.002_660_000

Okay …so that was both smart and impressive :raising_hands: