GPT-Image Desktop Editor: a desktop GUI for OpenAI image generation and vision-guided edits
A Christmas present from me to you - an image playground beyond OpenAI’s offerings.
This is a developer-oriented but user-friendly single-file Python Tkinter application for generating images from prompts and editing images with prompt-guided inpainting/outfilling. It provides a canvas workspace (with transparency), an optional mask painting UI, and (for GPT-Image edits) multi-image reference inputs so you can say things like “add the penguin from image 2 into the scene”. It uses standard lib + httpx for a lighter-weight surface than the OpenAI SDK.
Desktop app: the image canvas UI is a fixed 1024x1024 px area, within which larger resolutions are scaled down. A 1080p or higher desktop resolution is recommended to see the full UI height.
An idea factory: not a sellable surface.
Supported Models and Resolutions
1) Generation models (text → image)
GPT-Image models (generate + edits):
- chatgpt-image-latest
- gpt-image-1.5
- gpt-image-1-mini
- gpt-image-1
Generate output sizes you can request (GPT-Image):
- 1024x1024
- 1536x1024 (landscape)
- 1024x1536 (portrait)
- auto (model chooses; exposed in code via the "auto" option)
DALL-E 3 (generate only, shutdown 2026-05-12):
dall-e-3
Generate output sizes you can request (DALL-E 3):
- 1024x1024
- 1792x1024
- 1024x1792
In the UI, model selection automatically updates the valid size and option choices.
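That model-driven updating of choices can be sketched as a simple lookup. A minimal sketch; the names here are illustrative, not the program's actual identifiers:

```python
# Sketch of a model -> valid-size mapping a UI like this might use.
GPT_IMAGE_MODELS = (
    "chatgpt-image-latest", "gpt-image-1.5", "gpt-image-1-mini", "gpt-image-1",
)

def valid_generate_sizes(model: str) -> list[str]:
    """Return the size strings a Generate request accepts for a model."""
    if model == "dall-e-3":
        return ["1024x1024", "1792x1024", "1024x1792"]
    if model in GPT_IMAGE_MODELS:
        return ["1024x1024", "1536x1024", "1024x1536", "auto"]
    raise ValueError(f"unsupported model: {model}")
```

Centralizing the mapping keeps the size dropdown, request validation, and canvas sizing in agreement when a new model is added.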
2) Edit models (image → edited image)
Edits are supported only on GPT-Image models in this program:
- chatgpt-image-latest
- gpt-image-1.5
- gpt-image-1-mini
- gpt-image-1
Edit output size you can request:
- "match canvas" (UI convenience option; sends the canvas size when supported, otherwise auto)
- 1024x1024
- 1536x1024
- 1024x1536
- auto
Special parameter support:
input_fidelity = high|low is exposed only for gpt-image-1 (checkbox: “High input fidelity”). It appears to be a mandatory cost on gpt-image-1.5.
See earlier program version for dall-e-2 based “strict mask” editing based on single-image alpha transparency.
Key Features and Capabilities
1) Image generation from prompts
- Pick a model and output size.
- Enter a prompt and click Generate.
- GPT-Image models support generating multiple images (n = 1–10) in one request (DALL-E 3 is forced to n=1).
- Generated outputs are stored in Generations history, and the first image is also placed onto the canvas.
2) Loading an existing image file onto the canvas (and reframing it)
Click Load Image to bring in an existing file (PNG/JPG/etc.). The app opens a specialized Image Loading/Resizing Options dialog.
In this dialog you can:
- Choose a target canvas size (1024x1024, 1536x1024, 1024x1536).
- Use a slider to interpolate between:
- Outfill / “fit inside” (image shrinks to fit, leaving empty transparent canvas)
- Crop / “cover” (image fills canvas, cropping edges)
- Drag the preview to position the image on the canvas.
- Optionally allow extra shrink/zoom ranges.
When you press OK, the app produces:
- a new RGBA canvas image at the chosen resolution
- a default user mask
- an outfill mask that marks where “new empty canvas” exists
This UI can also be reached via “Resize” to edit the current canvas: rescale the existing contents or change the canvas dimensions.
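The outfill/crop slider boils down to interpolating a single scale factor. A sketch of the assumed math (not the program's exact code), where t = 0.0 gives “fit inside” and t = 1.0 gives “cover”:

```python
# Sketch of the fit <-> cover slider interpolation.
def placement_scale(img_w: int, img_h: int, canvas_w: int, canvas_h: int,
                    t: float) -> float:
    """Linearly interpolate the image scale factor between fit and cover."""
    fit = min(canvas_w / img_w, canvas_h / img_h)    # whole image visible
    cover = max(canvas_w / img_w, canvas_h / img_h)  # canvas fully covered
    return fit + t * (cover - fit)
```

When image and canvas share an aspect ratio, fit equals cover and the slider has no effect, which matches the intuition that such an image needs neither outfill nor cropping.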
3) Mask drawing interface (optional edit hinting)
On the right-side canvas:
- You can paint a mask using a brush size slider.
- The mask is displayed as a semi-transparent red overlay.
- Right-click on the canvas shows quick actions:
- Draw brush / Erase brush
- Reset mask
- Brush size presets
Important: in GPT-Image edits, the mask is treated as a hint for the vision model, not a strict “hard allowed region” enforcement.
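Brush painting onto a mask kept as a Pillow “L” image can be sketched as follows; this is an assumed minimal implementation (255 = marked for editing), not the app's actual code:

```python
# Sketch of stamping a round brush onto a grayscale mask image.
from PIL import Image, ImageDraw

def paint_brush(mask: Image.Image, x: int, y: int, radius: int,
                erase: bool = False) -> None:
    """Stamp a filled circle onto the mask at canvas coordinates (x, y)."""
    draw = ImageDraw.Draw(mask)
    value = 0 if erase else 255   # erase brush clears back to "keep"
    draw.ellipse((x - radius, y - radius, x + radius, y + radius), fill=value)
```

Calling this on every mouse-motion event along the drag path produces the continuous stroke; the red overlay is then just this mask composited onto the canvas at partial opacity.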
4) Additional images (multi-image edits with references)
In the Edit tab, you can attach additional images. These are sent alongside the main canvas image as extra vision inputs for the model (depicted in screenshot above). They do not need size or aspect ratio conformance to the canvas.
Typical uses:
- “Add the logo from the second image to the shirt.”
- “Use the character from image 2, but place them into this scene.”
- “Use these product images to compose a gift basket (boring).”
The UI allows:
- Add…
- Remove Selected
- Clear
Pre-upload resizing rule: each additional image is downscaled so its largest side is at most 1024 px, then converted to PNG for upload. (Vision input for the gpt-image-1 family is actually documented as limited to 512 px on the shorter side, and the resolution used under “input_fidelity” is unknown.)
(An array of images is allowed, but only one prompt to describe what to do with each.)
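The pre-upload rule above can be sketched like this (an assumed implementation, not the program's exact code):

```python
# Sketch: cap the longest side at 1024 px, then re-encode as PNG bytes.
import io

from PIL import Image

def prepare_reference(img: Image.Image, max_side: int = 1024) -> bytes:
    """Downscale proportionally so max(w, h) <= max_side; return PNG bytes."""
    w, h = img.size
    scale = max_side / max(w, h)
    if scale < 1.0:  # never upscale small references
        img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()
```

Re-encoding to PNG keeps the upload format uniform regardless of what file type the user picked, at the cost of some bandwidth versus JPEG.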
5) History: Generations (outputs) and Requests (inputs)
The app keeps two separate histories for the current session:
- Generations: images returned by the API (generate/edit outputs)
- Requests: the input settings and (for edits) a snapshot of the canvas + mask state sent
From history you can:
- restore an output back to the canvas
- restore a past request’s settings (including edit prompt, drawn mask, and additional images)
- delete or clear entries, save multiple image generations
6) Saving and clipboard
- Save the current canvas image as PNG.
- On Windows, the canvas image can be copied to the clipboard with transparency preserved, or flattened.
Developer Notes: what gets transmitted (and what the user doesn’t directly see)
This section describes the hidden part: how the application packages and preprocesses image + mask data to maximize multimodal “vision understanding” during prompt-based edits. I tried to make “outfill” and “edits” much better understood so there is at least moderate mask-following.
A) What is transmitted on single-image edits
When you click Send Edit, the request includes:
image[] (first image): the canvas PNG (RGBA)
- Always the current canvas at full resolution.
- If outfill exists and a mask is included, the canvas is preprocessed to make outfill visually legible (see below).
mask (optional): a separate RGBA PNG mask image
- Only included if:
  - the user painted a mask (mask_modified), or
  - the canvas includes outfill/empty regions (loaded_outfill)
- This is not the legacy “alpha-only enforcement” mask; it is a vision-guided mask image optimized for model interpretation.
Text fields / params
- model, prompt, plus size, quality, background, n
- input_fidelity only when model is gpt-image-1
B) What is transmitted on multi-image edits
Everything above, plus:
- Additional image[] parts appended after the canvas
- The mask applies only to the first image (the canvas)
- Additional images can be different sizes (the app downsizes them to max side 1024 for speed/bandwidth)
Unique preprocessing beyond legacy DALL-E 2 “alpha = repaint allowed”
Legacy DALL-E 2 style edits treated alpha strictly as the programmatic editable region: nothing could be created or edited outside the drawn mask area. GPT-Image edits are vision-driven, so this app sends clearer visual cues instead.
1) Outfill transparency is made “visible” to the model (checkerboard RGB + alpha=0)
If a mask is being included and outfill exists, the app modifies the canvas image before upload:
- In outfill pixels (where the outfill mask indicates empty canvas):
- RGB is painted with a subtle checkerboard pattern
- alpha is forced to 0 (still truly transparent)
Why: fully transparent pixels carry no RGB information; the checkerboard is a strong conventional cue that tells the vision model “this is intentionally empty/outfill space”.
It still helps to describe: “fill to edges”, “expand background replacing transparent sides”, etc.
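A sketch of this checkerboard preprocessing; the tile size and gray values here are assumptions, and a per-pixel loop is used for clarity rather than speed:

```python
# Sketch: paint checkerboard RGB into outfill pixels while keeping alpha 0.
from PIL import Image

def mark_outfill(canvas: Image.Image, outfill: Image.Image,
                 tile: int = 16) -> Image.Image:
    """Return a copy where outfill pixels get checkerboard RGB and alpha 0."""
    out = canvas.convert("RGBA")
    px, fill = out.load(), outfill.load()
    for y in range(out.height):
        for x in range(out.width):
            if fill[x, y]:  # nonzero in the outfill mask = empty canvas
                g = 200 if ((x // tile) + (y // tile)) % 2 else 160
                px[x, y] = (g, g, g, 0)  # visible RGB, still transparent
    return out
```

Because alpha stays 0, any downstream consumer that respects transparency still sees empty canvas; only a model that reads the raw RGB channels sees the checkerboard cue.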
2) The mask is a separate “vision-guided mask image”, not just a binary alpha cutout
When included, the app constructs a dedicated RGBA mask image:
- RGB: sepia-tinted grayscale of the canvas (preserves structure, reduces color noise)
- Alpha:
- alpha = 0 where edits are intended (user mask or outfill)
- alpha = 255 elsewhere
- Extra cue for user-painted regions:
- user-drawn editable zones are filled with light gray in RGB
Why: it gives the model both:
- contextual scene information (in simplified grayscale)
- a clean “transparent means edit here” intent signal
This “mask as communication” approach is tuned for multimodal understanding rather than strict enforcement, which GPT-Image edits do not deliver anyway.
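A sketch of how such a mask image could be constructed with Pillow; the sepia endpoint colors and the light-gray value are assumptions, not the program's exact tuning:

```python
# Sketch: build the vision-guided mask image described above.
from PIL import Image, ImageChops, ImageOps

def build_vision_mask(canvas: Image.Image, user_mask: Image.Image,
                      outfill_mask: Image.Image) -> Image.Image:
    """user_mask / outfill_mask: 'L' images, 255 = edit intended there."""
    editable = ImageChops.lighter(user_mask, outfill_mask)  # union of regions
    gray = ImageOps.grayscale(canvas)
    sepia = ImageOps.colorize(gray, black=(40, 26, 13), white=(255, 240, 192))
    mask = sepia.convert("RGBA")
    mask.putalpha(ImageOps.invert(editable))        # alpha 0 = "edit here"
    mask.paste((200, 200, 200, 0), mask=user_mask)  # light-gray cue, user zones only
    return mask
```

Keeping the scene in simplified sepia (rather than blanking it out) lets the model align the mask with the canvas spatially, while the transparent regions carry the edit intent.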
Requirements
- Python 3.10+ with typical desktop install (tkinter)
- Pillow: pip install pillow (PIL image editing library)
- httpx: pip install httpx (you’d have this if you installed the openai module)
- Funded, ID-verified OpenAI account; set OPENAI_API_KEY in OS environment variables, or enter it at startup when prompted.
- (Tolerance for lots of vibe-coding that is not reusable)
Download latest revision: https://od.lk/d/MjRfNzEzNjMwNDNf/ai_image_editor_2026-01-06.py
Usage notes:
- A Python script started with the typical OS launcher will have a console window in addition to the UI window. There, this program will print() both SDK-parameter-style API request bodies (with file contents elided) and a pretty-printed version of the “response” returned from the API (in effect, API logging without use of the logging module).
- Rename the program extension to “.pyw” to avoid this additional console window.