GPT-Image-1.5 image creator and editor app - Python, desktop UI

GPT-Image Desktop Editor: a desktop GUI for OpenAI image generation and vision-guided edits

A Christmas present from me to you - an image playground beyond OpenAI’s offerings.

This is a developer-oriented but user-friendly single-file Python Tkinter application for generating images from prompts and editing images with prompt-guided inpainting/outfilling. It provides a canvas workspace (with transparency), an optional mask painting UI, and (for GPT-Image edits) multi-image reference inputs so you can say things like “add the penguin from image 2 into the scene”. It uses standard lib + httpx for a lighter-weight surface than the OpenAI SDK.

Desktop app: The image canvas UI is a fixed 1024x1024 px area, within which larger resolutions are scaled down. A 1080p+ desktop resolution is recommended to see the full UI height.

An idea factory: not a sellable surface.


Supported Models and Resolutions

1) Generation models (text → image)

GPT-Image models (generate + edits):

  • chatgpt-image-latest
  • gpt-image-1.5
  • gpt-image-1-mini
  • gpt-image-1

Generate output sizes you can request (GPT-Image):

  • 1024x1024
  • 1536x1024 (landscape)
  • 1024x1536 (portrait)
  • auto (model chooses; exposed in code via "auto" option)

DALL-E 3 (generate only, shutdown 2026-05-12):

  • dall-e-3

Generate output sizes you can request (DALL-E 3):

  • 1024x1024
  • 1792x1024
  • 1024x1792

In the UI, model selection automatically updates the valid size and option choices.
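As a minimal sketch of the request shape behind this (a hypothetical `build_generate_body` helper, not the app’s actual code; field names follow the public Images API):

```python
# Valid size choices per model family, mirroring the tables above.
GPT_IMAGE_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}

def build_generate_body(model: str, prompt: str, size: str, n: int = 1) -> dict:
    """Build a JSON body for POST /v1/images/generations, applying the
    same per-model size validation the UI performs."""
    if model == "dall-e-3":
        valid, n = DALLE3_SIZES, 1          # DALL-E 3 is forced to n=1
    else:
        valid = GPT_IMAGE_SIZES
    if size not in valid:
        raise ValueError(f"{size!r} is not a valid size for {model}")
    return {"model": model, "prompt": prompt, "size": size, "n": n}

# Sending would look roughly like (httpx, as the app uses):
# resp = httpx.post("https://api.openai.com/v1/images/generations",
#                   headers={"Authorization": f"Bearer {api_key}"},
#                   json=build_generate_body("gpt-image-1", "a penguin", "auto"),
#                   timeout=300)
```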


2) Edit models (image → edited image)

Edits are supported only on GPT-Image models in this program:

  • chatgpt-image-latest
  • gpt-image-1.5
  • gpt-image-1-mini
  • gpt-image-1

Edit output size you can request:

  • "match canvas" (UI convenience option; sends the canvas size when supported, otherwise auto)
  • 1024x1024
  • 1536x1024
  • 1024x1536
  • auto

Special parameter support:

  • input_fidelity = high|low is exposed only for gpt-image-1 (checkbox: “High input fidelity”). It appears to be a mandatory cost on gpt-image-1.5.

See the earlier program version for DALL-E 2-based “strict mask” editing using single-image alpha transparency.


Key Features and Capabilities

1) Image generation from prompts

  • Pick a model and output size.
  • Enter a prompt and click Generate.
  • GPT-Image models support generating multiple images (n = 1–10) in one request (DALL-E 3 is forced to n=1).
  • Generated outputs are stored in Generations history and the first image is also placed onto the canvas.


2) Loading an existing image file onto the canvas (and reframing it)

Click Load Image to bring in an existing file (PNG/JPG/etc.). The app opens a specialized Image Loading/Resizing Options dialog.

In this dialog you can:

  • Choose a target canvas size (1024x1024, 1536x1024, 1024x1536).
  • Use a slider to interpolate between:
    • Outfill / “fit inside” (image shrinks to fit, leaving empty transparent canvas)
    • Crop / “cover” (image fills canvas, cropping edges)
  • Drag the preview to position the image on the canvas.
  • Optionally allow extra shrink/zoom ranges.

When you press OK, the app produces:

  • a new RGBA canvas image at the chosen resolution
  • a default user mask
  • an outfill mask that marks where “new empty canvas” exists

This dialog is also reachable via “resize”, to reframe the current canvas: resize the current contents, or change the dimensions.
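A minimal Pillow sketch of the fit↔cover interpolation this dialog performs (a hypothetical `place_on_canvas` helper; the app’s actual dialog logic may differ):

```python
from PIL import Image

def place_on_canvas(img: Image.Image, canvas_size: tuple[int, int],
                    t: float = 0.0, offset: tuple[int, int] = (0, 0)) -> Image.Image:
    """t=0.0 -> 'fit inside' (shrink, leaving transparent outfill);
    t=1.0 -> 'cover' (fill canvas, cropping edges); values between
    interpolate the scale. offset drags the image on the canvas."""
    cw, ch = canvas_size
    iw, ih = img.size
    fit = min(cw / iw, ch / ih)          # whole image visible
    cover = max(cw / iw, ch / ih)        # canvas fully covered
    scale = fit + (cover - fit) * t
    nw, nh = max(1, round(iw * scale)), max(1, round(ih * scale))
    resized = img.convert("RGBA").resize((nw, nh), Image.LANCZOS)
    canvas = Image.new("RGBA", (cw, ch), (0, 0, 0, 0))  # transparent RGBA canvas
    x = (cw - nw) // 2 + offset[0]
    y = (ch - nh) // 2 + offset[1]
    canvas.paste(resized, (x, y), resized)
    return canvas
```

The outfill mask mentioned above would then be derived from where this result stays fully transparent.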


3) Mask drawing interface (optional edit hinting)

On the right-side canvas:

  • You can paint a mask using a brush size slider.
  • The mask is displayed as a semi-transparent red overlay.
  • Right-click on the canvas shows quick actions:
    • Draw brush / Erase brush
    • Reset mask
    • Brush size presets

Important: in GPT-Image edits, the mask is treated as a hint for the vision model, not a strict “hard allowed region” enforcement.


4) Additional images (multi-image edits with references)

In the Edit tab, you can attach additional images. These are sent alongside the main canvas image as extra vision inputs for the model (depicted in the screenshot above). They do not need to match the canvas in size or aspect ratio.

Typical uses:

  • “Add the logo from the second image to the shirt.”
  • “Use the character from image 2, but place them into this scene.”
  • “Use these product images to compose a gift basket (boring).”

The UI allows:

  • Add…
  • Remove Selected
  • Clear

Pre-upload resizing rule: each additional image is downscaled so its largest side is at most 1024 px, then converted to PNG for upload. (GPT-Image-family vision is actually documented as limited to 512 px on the shorter side, and the resolution used for “input_fidelity” is unknown.)

(An array of images is allowed, but only one prompt to describe what to do with each.)
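The pre-upload rule could be sketched roughly like this (a hypothetical `prepare_reference` helper illustrating the stated downscale-and-PNG behavior):

```python
from io import BytesIO
from PIL import Image

def prepare_reference(img: Image.Image, max_side: int = 1024) -> bytes:
    """Downscale so the largest side is at most max_side, then return
    PNG bytes ready for multipart upload."""
    w, h = img.size
    scale = min(1.0, max_side / max(w, h))   # never upscale
    if scale < 1.0:
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    buf = BytesIO()
    img.convert("RGBA").save(buf, format="PNG")
    return buf.getvalue()
```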


5) History: Generations (outputs) and Requests (inputs)

The app keeps two separate histories for the current session:

  • Generations: images returned by the API (generate/edit outputs)
  • Requests: the input settings and (for edits) a snapshot of the canvas + mask state sent

From history you can:

  • restore an output back to the canvas
  • restore a past request’s settings (including edit prompt, drawn mask, and additional images)
  • delete or clear entries, and save multiple image generations


6) Saving and clipboard

  • Save the current canvas image as PNG.
  • On Windows, the canvas image can be copied to the clipboard with transparency preserved, or flattened.


Developer Notes: what gets transmitted (and what the user doesn’t directly see)

This section describes the hidden part: how the application packages and preprocesses image + mask data to maximize multimodal “vision understanding” during prompt-based edits. I tried to make “outfill” and “edits” much better understood so there is at least moderate mask-following.

A) What is transmitted on single-image edits

When you click Send Edit, the request includes:

  1. image[] (first image): the canvas PNG (RGBA)
    • Always the current canvas at full resolution.
    • If outfill exists and a mask is included, the canvas is preprocessed to make outfill visually legible (see below).
  2. mask (optional): a separate RGBA PNG mask image
    • Only included if:
      • the user painted a mask (mask_modified), or
      • the canvas includes outfill/empty regions (loaded_outfill)
    • This is not the legacy “alpha-only enforcement” mask; it is a vision-guided mask image optimized for model interpretation.
  3. Text fields / params
    • model, prompt, plus size, quality, background, n
    • input_fidelity only when model is gpt-image-1

B) What is transmitted on multi-image edits

Everything above, plus:

  • Additional image[] parts appended after the canvas
  • The mask applies only to the first image (the canvas)
  • Additional images can be different sizes (the app downsizes them to max side 1024 for speed/bandwidth)

Unique preprocessing beyond legacy DALL-E 2 “alpha = repaint allowed”

Legacy DALL-E 2-style edits treated alpha strictly as the programmatic editable region: nothing could be created or edited unless you drew a mask area. GPT-Image edits are vision-driven, so this app sends clearer visual cues.

1) Outfill transparency is made “visible” to the model (checkerboard RGB + alpha=0)

If a mask is being included and outfill exists, the app modifies the canvas image before upload:

  • In outfill pixels (where the outfill mask indicates empty canvas):
    • RGB is painted with a subtle checkerboard pattern
    • alpha is forced to 0 (still truly transparent)

Why: fully transparent pixels carry no RGB information; the checkerboard is a strong conventional cue that tells the vision model “this is intentionally empty/outfill space”.

It still helps to describe: “fill to edges”, “expand background replacing transparent sides”, etc.
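A Pillow sketch of this preprocessing (a hypothetical `paint_outfill_checkerboard` helper; the tile size and shades are illustrative, not the app’s values):

```python
from PIL import Image

def paint_outfill_checkerboard(canvas: Image.Image, outfill_mask: Image.Image,
                               tile: int = 16) -> Image.Image:
    """Where outfill_mask is white (empty canvas), paint a subtle checkerboard
    into RGB while forcing alpha to 0, so the region stays truly transparent
    but carries a visible 'intentionally empty' cue."""
    canvas = canvas.convert("RGBA")
    w, h = canvas.size
    # Build a full-size checkerboard layer with alpha = 0 everywhere.
    board = Image.new("RGBA", (w, h))
    px = board.load()
    for y in range(h):
        for x in range(w):
            shade = 200 if ((x // tile + y // tile) % 2 == 0) else 230
            px[x, y] = (shade, shade, shade, 0)   # RGB cue, alpha stays 0
    # Swap in the checkerboard only where the outfill mask marks empty canvas.
    return Image.composite(board, canvas, outfill_mask.convert("L"))
```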


2) The mask is a separate “vision-guided mask image”, not just a binary alpha cutout

When included, the app constructs a dedicated RGBA mask image:

  • RGB: sepia-tinted grayscale of the canvas (preserves structure, reduces color noise)
  • Alpha:
    • alpha = 0 where edits are intended (user mask or outfill)
    • alpha = 255 elsewhere
  • Extra cue for user-painted regions:
    • user-drawn editable zones are filled with light gray in RGB

Why: it gives the model both:

  • contextual scene information (in simplified grayscale)
  • a clean “transparent means edit here” intent signal

This “mask as communication” approach is tuned for multimodal understanding, rather than for strict enforcement the API does not deliver.
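The construction could be sketched like this (a hypothetical `build_vision_mask` helper; the sepia tones and gray value are illustrative, not the app’s exact numbers):

```python
from PIL import Image, ImageChops, ImageOps

def build_vision_mask(canvas: Image.Image, user_mask: Image.Image,
                      outfill_mask: Image.Image) -> Image.Image:
    """RGB = sepia-tinted grayscale of the canvas; alpha = 0 where edits are
    intended (user mask OR outfill), 255 elsewhere; user-painted zones get a
    flat light-gray RGB cue on top."""
    gray = canvas.convert("L")
    sepia = ImageOps.colorize(gray, black=(40, 26, 13), white=(255, 240, 192))
    rgba = sepia.convert("RGBA")
    um = user_mask.convert("L")
    om = outfill_mask.convert("L")
    # Light-gray cue where the user painted.
    light = Image.new("RGBA", rgba.size, (200, 200, 200, 255))
    rgba = Image.composite(light, rgba, um)
    # Transparent (alpha = 0) in any editable region: user mask or outfill.
    editable = ImageChops.lighter(um, om)
    rgba.putalpha(ImageOps.invert(editable))
    return rgba
```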


Requirements

  • Python 3.10+ with typical desktop install (tkinter)
  • Pillow: pip install pillow (PIL image editing library)
  • httpx: pip install httpx (you’d have this if you installed openai module)
  • Funded ID-verified OpenAI account; set OPENAI_API_KEY in OS environment variables, or enter it at startup when prompted.
  • (Tolerance for lots of vibe-coding that is not reusable)

Download latest revision: https://od.lk/d/MjRfNzEzNjMwNDNf/ai_image_editor_2026-01-06.py

Usage notes:

  • A Python script started with the typical OS launcher opens a console window alongside the UI window. There, the program print()s both SDK-parameter-style API request bodies (with file contents elided) and the API responses, pretty-printed (or logs the API traffic to a file, without using the logging module).
  • Rename the program extension to “.pyw” to avoid this additional console window.

Sharing is caring. Thanks so much!


That is amazing! Thanks for putting this much effort into this.


Awesome share and great to learn from. Is this something you will be putting (or already have) on GitHub?

Thanks for sharing.


Github, schmithub.

Here’s your PR for the day. Why a weekend of code improvement lasted more than the weekend.

Here’s the remaining 10% that is actually visible, on top of good-looking, organized, typed, reusable code.

New features surfaced

These should further satisfy expectations.

Status: Pricing calculated for a single request; copyable
(Screenshot: a gray interface bar displaying statistics, including text in 720, image in 81732, output 18333 (4 images), and a computed price value.)

Generation logging: more of what you want to know:

Tired of losing an image you didn’t press “save” on? Besides the history you can save from, there is now Autosave (configure the directory path in code; persistent settings are intentionally omitted).

(Screenshot: a new checkbox labeled 'Autosave'.)

Autosave uses automatic naming, and other file-saving locations are now pre-populated with a default generated name as well.

Or get the images out directly from the canvas: Copy now works to almost any paste destination, and lets you choose the destination format:

The “checkerboard” background option, in particular, gives a copy where I can show you that transparency from the AI model works - but only as “cut-out” shapes.

Paste to the canvas launches the same dialog for loading and resizing files.

Oh, the canvas? It’s now arbitrarily “stretchy” when I feel like releasing the UI window size. Like DALL-E 3 generation: accurate mask-pen drawing over its 1792-wide generations, edited while displayed within 1024.

Easter egg: an “AI Rewrite” entry in the prompt’s right-click menu, left for you to find and code up into usefulness.

There’s some chat API support and a self-contained example in the app for you…


(the AI replaced what was highlighted which you can still improve directly here; I broke the highlighting preview rather than writing more guard code there)

Sorry I don’t have more to show, like the popup error dialog that now lets you copy request IDs (so you can spam OpenAI about over-refusals), pretty console logging of calls with fields like "openai-processing-ms", and a logfile option instead of printing (see globals). But like I said, most of it is under the hood, or issues you won’t encounter any more… :laughing:


Download (+updated top post) https://od.lk/d/MjRfNzEzNjMwNDNf/ai_image_editor_2026-01-06.py


Very cool. Thanks again for sharing with us.


10 of 10

thanks, master squidly!


Very nice. Great features all around. Really like the prompt rewriter.

Oh, the canvas? It’s now arbitrarily “stretchy” when I feel like releasing the UI window size.

Since you can get out the images directly from the canvas, what about aspect ratio when you do this? Or did I misunderstand something?

Copying & saving

Generated images are preserved internally as a pixel buffer at the same dimensions as received. Saving or clipboard-copying, either way, you get the pixel-accurate contents of the underlying lossless PNG API response, not resized. Or the canvas state. Is the question about this?

What I report is that the framework is there for more window resizing, and to get arbitrary UI area for the canvas image display container. Currently, you can widen the window and just get more control input area.

Display only: after determining a display scale, the underlying image is resized by the Pillow image library to fit the size-adapting tk.Canvas and rendered for display only, along with the user-drawn mask and an outfill sentinel wherever a loaded image left transparency.
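A sketch of the display-scale mapping described here (hypothetical helpers, assuming a 1024x1024 view area; the app’s actual math may differ):

```python
def display_scale(img_w: int, img_h: int,
                  view_w: int = 1024, view_h: int = 1024) -> float:
    """Display-only scale factor so the full image fits the fixed view."""
    return min(view_w / img_w, view_h / img_h)

def view_to_image(vx: float, vy: float, scale: float) -> tuple[int, int]:
    """Map a mouse position on the view back to underlying image pixels,
    e.g. for mask brush strokes."""
    return int(vx / scale), int(vy / scale)
```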

Why the current canvas choices when loading and sending “edit”?

When loading your file with the image loading dialog, I offer only the “generate” sizes the default model supports. The app is made for edit iteration (which is why there are no parallel workflows), and matching API input/output sizes when masking gives good revision adherence.

Different canvas choices, or even custom typed-in sizes, would also be possible with a patch, changing what you send while breaking nothing:

Again, why a 1024x1024 image UI, borderless?

Consistent experience.

Fixed 1:1 is good for 1024x1024 images: you see pixel accuracy. The resulting 2:3 display for wide 1536 images also gives accurate position drawing for mouse-cursor-to-underlying-image mapping (perfect, or 0.5 px off). Good brush strokes. The API model is not even as precise as the app.

Arbitrary canvas size sending?

Sending images in the same aspect ratio as the model is set to generate is not a requirement, unlike with DALL-E 2, which has to be pixel-accurate. Both on the API and in the app, you can call the API with a “size” different from canvas image 0, and “size”: “auto” can even have the AI decide to return a different size than you sent.

The input image is a “prompt”: vision that uses 512 px tiles, with 512 px also the maximum for the shorter dimension, to which your input is downsized. Then there is “input_fidelity”, which bills like a patch-based model, but without direct connection to the size you send, only the aspect ratio.
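Under the tiling behavior described above, the downsizing and tile math sketch out like this (an assumption carried over from documented GPT-4-style vision tiling; image-model ingestion itself is undocumented):

```python
import math

def shortest_side_downsize(w: int, h: int, max_short: int = 512) -> tuple[int, int]:
    """Downscale so the shorter side is at most max_short (the stated rule)."""
    scale = min(1.0, max_short / min(w, h))
    return round(w * scale), round(h * scale)

def tile_count(w: int, h: int, tile: int = 512) -> int:
    """Number of 512 px tiles covering the (downsized) image."""
    return math.ceil(w / tile) * math.ceil(h / tile)
```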

Input fidelity (mandatorily billed “high” on 1.5) has an undocumented maximum resolution or maximum “seeing” token size. The mask, too: it is unclear whether its prompting use relies on ‘alpha’ exclusively or on RGB as well. All of this is completely lacking in technical documentation; a “prompting guide” (what to send) or an application programming guide needs a foundation of deep understanding of image ingestion that OpenAI is unwilling to provide. So, if you want to dig deep, you can try to improve the “visual prompting” currently done.


Considering the previous thoughts, I’ve made a little push you can try:

Arbitrary canvas size

In the image loading/paste/resize dialog: type whatever canvas resolution you want and press Enter. (Type something too big or too small, and the same aspect ratio is rewritten to sizes within what is documented for vision/tiles.)
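A sketch of such aspect-preserving clamping (MIN_SIDE and MAX_SIDE are illustrative placeholders, since the real limits are not documented):

```python
# Illustrative bounds only - not documented API limits.
MIN_SIDE, MAX_SIDE = 256, 2048

def clamp_canvas(w: int, h: int) -> tuple[int, int]:
    """Rewrite an out-of-range size to the same aspect ratio within bounds."""
    scale = 1.0
    if max(w, h) > MAX_SIDE:
        scale = MAX_SIDE / max(w, h)      # shrink oversized input
    elif min(w, h) < MIN_SIDE:
        scale = MIN_SIDE / min(w, h)      # grow undersized input
    return round(w * scale), round(h * scale)
```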

The program already suggested the closest “generate” aspect ratio canvas when loading.

You can also choose in the dropdown (clamped to maximum usefulness):

  • the original image size
  • some added smaller sizes: what the gpt-image-1 API actually does to your tile-based input images

The UI behavior of changing canvas sizes, repeatedly dragging images around, and changing sizes and scaling again should all fall within expectations.

Result

Send to the AI model any resolution or aspect as your first image. Control billing, or see the results of “playground” play - what a developer should be able to experiment with.

Download this version:
https://od.lk/d/MjRfNzEzODQwNzJf/ai_image_editor_2026-01-07.py

Notes

There are optimizations one could do to additional images in a request to save money on forced “input_fidelity”: make rectangular images closer to square, pad the extra with transparency, and allow side cropping there as well. Not done, because transparency that is meant to be ignored is likely contrary to training.


Thanks again for sharing with us.