Your DALL-E problems now solved by GPT-4o multimodal image creation in ChatGPT?

Instead of a best-of gallery, we need a separate topic for discussing this product: what it can do, and where it fulfills all your hopes and dreams. So here it is.

I’ve got a lot of mental bookmarks of past topics posted here that highlighted challenges in getting the desired output from DALL-E 3, where it simply resisted every attempt to prompt your way to the needed result.

I’ll have to translate those into forum searches, and will endeavor to use ChatGPT’s release of GPT-4o’s ability to output images* (after nearly a year of teasing) to see whether these cases are now all child’s play, now that the model itself makes the pictures.

What has been something that you could not do before?

(same prompt)


Prompt session

I just attached a picture of a semi truck carrying a tanker trailer, with a colorful stripe. I need a view of the same truck seen directly from the side, in a similar illustration style. The count of all wheels for a typical trailer truck must be correct, and the illustration shows the point of view of an observer directly to the side of the truck. No perspective technique is used in the image. It is simply drawn architecturally, but still stylistically.
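To check a result against the “count of all wheels” requirement, here is the arithmetic as a small Python sketch. It assumes a common five-axle US tractor-trailer (a steer axle with single tires, plus two tractor drive axles and two trailer axles with dual tires); a particular tanker rig may differ. In a true side elevation, each pair of duals overlaps, so only one wheel per axle is visible.

```python
# Sketch: tire arithmetic for an ASSUMED five-axle US tractor-trailer.
# Each entry is (axle name, tires at EACH end of the axle).
AXLES = [
    ("tractor steer axle", 1),    # single tire per side
    ("tractor drive axle 1", 2),  # dual tires per side
    ("tractor drive axle 2", 2),
    ("trailer axle 1", 2),
    ("trailer axle 2", 2),
]


def total_tires(axles):
    """Count every tire on the vehicle: both ends of each axle."""
    return sum(2 * per_side for _, per_side in axles)


def tires_per_side(axles):
    """Tires on one side of the truck, inner duals included."""
    return sum(per_side for _, per_side in axles)


def visible_wheels_side_view(axles):
    """In a pure side elevation the inner dual hides behind the outer one: one wheel per axle."""
    return len(axles)


print(total_tires(AXLES), tires_per_side(AXLES), visible_wheels_side_view(AXLES))  # 18 9 5
```

So a strictly side-on drawing of this configuration should show five wheels, even though the truck has eighteen tires.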

Try 2: shows even more of the front, despite the request to the contrary.

The angle is almost right with the first prompt plus the existing image.


Now a moon-moon


Original prompt with just a few alterations to emphasize the desired effect.


How about if we just want guys that aren’t chiseled-chin magazine models, and we also don’t want words of the prompt depicted randomly throughout the image?

Game night prompt

Image creation request for a photograph: it’s game night, and three gamer guys, looking a bit unkempt from an obsessive gaming session, are gathered around the table in a cramped and cluttered apartment at night. They are obsessing over a role-playing board game laid out on the table, with game pieces and player cards. We see the effects of a long session by these “m’lady” out-of-shape guys: an empty pizza box, soda cans, a messy couch. The darkened apartment also reveals curio cabinets loaded with other board game boxes waiting to be played, as well as figures and trinkets of an obsessed gaming fan. The room is lit by several lamps that illuminate it inadequately. You produce a full depiction of the setting and atmosphere.

(that also covers making more realistic people)

Conclusion

ChatGPT’s image creation now hits all the bases. Having portrait images intelligently employed for the subject (with no random rotation glitch) just scratches the surface of new capabilities and satisfied expectations.


I can’t seem to make images of basic chemical structures, specifically a silicon crystal structure, which should be fairly simple to generate since it’s a repeating pattern. I’m not sure whether it’s just teething issues and the spike in server load from the influx of new users, but it’s currently stuck on generating.

GPT-4o seems to get lost and produces impossible-figure illusions. It defines the external shape well, but the interior then becomes impossibly tangled.

So I must not relent!

Above, it looks like a visual jumble, but if you carefully follow the bonds, there seems to be no large fault in the occlusion or the count.

Let’s try for better presentation.

Latest prompt

You are in image creation mode.
Image: Generate a high-quality, accurate 3D diagram of a silicon crystal structure, suitable for a chemistry or materials science textbook. Show a 3×3×3 array of silicon unit cells arranged in a diamond cubic lattice. Atoms should be represented clearly, and tetrahedral covalent bonds should be accurately depicted. Use isometric perspective and shading appropriate for textbook clarity, not artistic flair. The diagram should suggest continuation of the structure (e.g. via ellipsis, fading, or clipping) beyond the 3×3×3 volume. The layout must be clean, with a neutral background. Ensure atomic positions and connections conform to known crystallographic data for silicon.
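Since the output has to be fact-checked anyway, a reference to compare against is easy to compute. This Python sketch builds a 3×3×3 block of diamond cubic unit cells (8 atoms per conventional cell: an FCC lattice plus the same lattice shifted by (¼, ¼, ¼)) and finds the tetrahedral bonds at the nearest-neighbor distance a√3/4. The 5.431 Å lattice constant is the room-temperature value for silicon. Wrapping bonds across the block faces (periodic boundaries) is my own assumption, chosen so every atom shows its full four bonds; a rendered finite block would show fewer bonds on its surface atoms.

```python
import itertools
import math
from collections import Counter

A_SI = 5.431  # silicon lattice constant in angstroms (room temperature)

# Diamond cubic = FCC lattice plus a copy shifted by (1/4, 1/4, 1/4): 8 atoms per cubic cell.
FCC = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]
BASIS = FCC + [(x + 0.25, y + 0.25, z + 0.25) for x, y, z in FCC]


def diamond_supercell(n=3, a=A_SI):
    """Cartesian positions (angstroms) of every atom in an n x n x n block of unit cells."""
    return [
        ((i + bx) * a, (j + by) * a, (k + bz) * a)
        for i, j, k in itertools.product(range(n), repeat=3)
        for bx, by, bz in BASIS
    ]


def find_bonds(atoms, n=3, a=A_SI, tol=0.1):
    """Pairs of atom indices at the nearest-neighbor distance a*sqrt(3)/4 (about 2.35 A).

    Periodic boundaries (minimum-image convention) are an assumption of this sketch,
    so atoms on one face of the block bond to atoms on the opposite face.
    """
    box = n * a
    bond_length = a * math.sqrt(3) / 4
    bonds = []
    for p, q in itertools.combinations(range(len(atoms)), 2):
        d2 = 0.0
        for u, v in zip(atoms[p], atoms[q]):
            delta = u - v
            delta -= box * round(delta / box)  # wrap into the nearest periodic image
            d2 += delta * delta
        if abs(math.sqrt(d2) - bond_length) < tol:
            bonds.append((p, q))
    return bonds


atoms = diamond_supercell()
bonds = find_bonds(atoms)
coordination = Counter(i for pair in bonds for i in pair)
print(len(atoms), len(bonds), set(coordination.values()))  # 216 432 {4}
```

Every atom coming out four-coordinated is the property the generated images kept getting wrong once the lattice got tangled.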

Suffice it to say that you cannot rely on this to teach; you can only use it to eventually reach something that you fact-check over and over, which I didn’t have the patience to pursue. “Cytosine”, everybody.

Molecule chain-of-thought

Absolutely — let’s carefully plan the 3D rendered textbook-style image of Cytosine, focusing on chemical accuracy and visual clarity using a molecular model style (like the classic ball-and-stick kits used in teaching labs).


:white_check_mark: Step 1: Representation Plan (Color and Size by Element)

We’ll use the standard CPK color scheme (commonly found in academic models and software) with proportionate ball sizes for realism and pedagogical clarity:

| Element | Color | Relative Size | Notes |
|---|---|---|---|
| Carbon (C) | Black or Dark Gray | Medium | Backbone of the molecule |
| Hydrogen (H) | White | Small | Often minimized but shown here |
| Nitrogen (N) | Blue | Medium | Found in ring and functional groups |
| Oxygen (O) | Red | Medium | Key to keto group |
| Bonds | Gray rods | Thin sticks | Cylindrical, accurate angles |

:white_check_mark: Step 2: Cytosine Structure Overview

Cytosine is a pyrimidine derivative, with the following structural characteristics:

  • Six-membered aromatic ring (pyrimidine), with:
    • Two nitrogen atoms at positions 1 and 3.
    • One keto group (=O) at position 2.
    • One amine group (–NH₂) at position 4.
    • One hydrogen attached to N1, and one each on C5 and C6.

Molecular Formula: C₄H₅N₃O

Simplified Structural Layout:

      O
      ||
N1 -- C2 -- N3
|           |
C6          C4 -- NH2
   \      /
      C5
  • Each ring atom is sp² hybridized, so the ring is planar; the ring double bonds are N3=C4 and C5=C6.
  • The amine group at C4 is nearly planar, conjugated with the ring.
  • The keto group is double-bonded to C2.
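The structural plan can also be checked mechanically before asking for a render. This Python sketch encodes cytosine (the amino-oxo form described in Step 2) as an atom list and a bond list, verifies that each atom has its usual valence (C 4, N 3, O 2, H 1), derives the formula in Hill order, and sums standard atomic weights. Labels such as H41 and H42 are illustrative names of my own, not from the post.

```python
from collections import Counter

VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Amino-oxo cytosine; heavy-atom labels follow the ring numbering in Step 2.
ATOMS = {
    "N1": "N", "C2": "C", "O2": "O", "N3": "N", "C4": "C", "N4": "N", "C5": "C", "C6": "C",
    "H1": "H", "H41": "H", "H42": "H", "H5": "H", "H6": "H",
}
BONDS = [  # (atom, atom, bond order)
    ("N1", "C2", 1), ("C2", "O2", 2), ("C2", "N3", 1), ("N3", "C4", 2),
    ("C4", "C5", 1), ("C5", "C6", 2), ("C6", "N1", 1),    # ring closes here
    ("C4", "N4", 1), ("N4", "H41", 1), ("N4", "H42", 1),  # amine at C4
    ("N1", "H1", 1), ("C5", "H5", 1), ("C6", "H6", 1),
]


def valence_errors(atoms, bonds):
    """Return {atom: bond-order sum} for every atom whose total differs from its usual valence."""
    total = Counter()
    for a, b, order in bonds:
        total[a] += order
        total[b] += order
    return {name: total[name] for name, el in atoms.items() if total[name] != VALENCE[el]}


def hill_formula(atoms):
    """Molecular formula in Hill order: C, then H, then other elements alphabetically."""
    counts = Counter(atoms.values())
    order = ["C", "H"] + sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order if el in counts)


def molar_mass(atoms):
    """Sum of standard atomic weights, in g/mol."""
    return sum(ATOMIC_WEIGHT[el] for el in atoms.values())


print(valence_errors(ATOMS, BONDS), hill_formula(ATOMS), round(molar_mass(ATOMS), 2))  # {} C4H5N3O 111.1
```

An empty error dictionary plus the expected C₄H₅N₃O confirms the bond list is chemically consistent, which is exactly what the generated 3D models need to be checked against.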

:white_check_mark: Step 3: Rendering Objectives

  • Isometric or perspective view of the molecule.
  • Clear visibility of bond angles, planarity of the ring, and correct geometry for amine and keto groups.
  • Clean neutral background.
  • Slight shadowing or shading for 3D perception.

:dna: Now generating the 3D textbook-style Cytosine model


DALL-E used to generate images with a tall aspect ratio, but the scenes in those images always looked composed for a wide frame. Now it gets this right.


If you run into problems, I suggest looking at the limitations outlined in the announcement post on the website. It gives some hints about where additional work will be required to achieve a specific result.

https://openai.com/index/introducing-4o-image-generation/


@gatekeeperonni

Same prompt, before and now.
It previously could not create a train without a track or rails, but now it can:

old:

new:


Oh yes, I do agree, but my guess is that this is a frontier for the DALL-E project team, as such a tool would be valuable for many aspects of modern working life, including as a quick generative learning tool for teachers. However, there is some benefit in it coming close without giving you exactly what you want: your mind fills in the blank, and that becomes a more deeply embedded way of learning, because your brain had to work for that connection. This is perhaps where a public version and a private business version may need to differ. I’m curious whether the agreement with Microsoft would eventually bring this new DALL-E to Copilot. Would putting a version that depicts such things more accurately behind a paywall benefit humanity, when it restricts access and denies young people the ability to learn? That is an odd moral question that arises from this.

Until yesterday, it worked more than fine. But since this morning it has unfortunately deteriorated on several levels. Uploading images is much slower. The colours are less intense, even when specifically asked for. The gallery view has disappeared, so each image now has to be clicked on separately. And the system is much less flexible in coming up with new and innovative images/suggestions. I don’t know what has been done in the past 24 hours, but please undo it!


Thanks for starting the thread, @_j !

A few observations…

The wide size is a weird/odd aspect ratio now.

Getting close on polyhedral dice! Haha.






Comic panels and consistency mean my kids’ book might finally be done, haha.


More consistency...



Looks like we’re back to PNG output too?


Not at all—3:2 is actually one of the most classic and widely used aspect ratios in photography. It originated with 35mm film and remains the standard for most DSLR and mirrorless cameras. So if your photo has a 3:2 ratio, it’s in excellent company.
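Reducing a frame size to its ratio is just dividing both sides by their greatest common divisor. A small Python sketch, using the 36×24 mm frame of 35mm film and the 6000x4000 px example from the plan further down; 1920×1080 is included only as a familiar 16:9 check.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce a width x height pair to lowest terms, e.g. 36 x 24 -> '3:2'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


for w, h in [(36, 24), (6000, 4000), (1920, 1080)]:
    print(f"{w}x{h} -> {aspect_ratio(w, h)}")
```

The same function works on any pixel size an image generator hands back, so you can tell at a glance which classic ratio (if any) it corresponds to.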

Repeatedly asking for such a diagram of aspect ratios reveals something else that is misunderstood or out of its grasp: the AI cannot draw positions accurately.

CoT prompt as first message

Image:

Your task: make a camera aspect ratio image.

Let’s step back and deliberately plan how to depict aspect ratios or crop ratios within a clear and educational image. The goal is to visually communicate how different aspect ratios compare in shape and relative dimensions. Here’s a thoughtful, structured plan for how to do that:


:white_check_mark: Objective

Create a clean, educational diagram that visually compares the most common photography aspect ratios (e.g., 3:2, 4:3, 1:1, etc.) in a way that shows both their shape and proportional differences.


:brain: Conceptual Design Plan

1. Layout Structure

  • Use nested rectangles, all aligned to the same anchor point (either top-left or dead-center alignment).
  • Start from largest ratio (widest) to smallest, so each smaller ratio fits inside or overlaps clearly.

2. Aspect Ratios to Include

  • 3:2 – Standard DSLR, 35mm film (e.g. 6000x4000 px)
  • 4:3 – Micro Four Thirds / compact cameras
  • 5:4 – Large format (e.g., 8x10 print)
  • 1:1 – Instagram, medium format
  • 16:9 – Widescreen video, some hybrid still/video formats

3. Colors and Styling

  • Assign each ratio a distinct color (blue, red, orange, green, purple)
  • Use outline-only rectangles with a white background
  • Add text labels in black, aligned inside or near each rectangle
  • Choose a modern sans-serif font for clean readability

4. Additional Design Choices

  • Optionally include a scale or baseline grid (e.g., faint gridlines)
  • Optionally include real-world examples near each label:
    • :camera: 3:2 – DSLR
    • :mobile_phone: 4:3 – iPad Camera
    • :framed_picture: 1:1 – Instagram
    • :television: 16:9 – YouTube
  • Title: “Common Photography Aspect Ratios”

:triangular_ruler: Diagram Arrangement Plan

Imagine a stack like this:

[ 3:2 ] — Blue outermost rectangle
[ 4:3 ] — Red inside that
[ 5:4 ] — Orange, slightly narrower
[ 16:9 ] — Purple, wider and shorter
[ 1:1 ] — Green square in the middle
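Since the model can’t place these rectangles accurately, the geometry can be computed instead. A Python sketch, assuming all frames share one height and are centered horizontally (my choice among the alignments the plan allows). Under that assumption 16:9 comes out widest, so it becomes the outer frame and 1:1 the innermost; the ratios and colors follow the plan, while the 360 px height, padding, and label offsets are arbitrary choices of mine. The output is a plain SVG string.

```python
# Sketch: exact, centered nested frames for common aspect ratios, emitted as SVG.
RATIOS = {"3:2": (3, 2), "4:3": (4, 3), "5:4": (5, 4), "1:1": (1, 1), "16:9": (16, 9)}
COLORS = {"3:2": "blue", "4:3": "red", "5:4": "orange", "1:1": "green", "16:9": "purple"}


def nested_frames(ratios, height=360):
    """Return (label, x, y, width, height) tuples, widest frame first, centered on the widest."""
    sized = [(label, height * w / h) for label, (w, h) in ratios.items()]
    sized.sort(key=lambda item: item[1], reverse=True)  # widest = outermost
    outer_width = sized[0][1]
    return [(label, (outer_width - width) / 2, 0, width, height) for label, width in sized]


def to_svg(frames, pad=20, line_gap=18):
    """Outline-only rectangles on white, with labels staggered vertically so text does not collide."""
    _, _, _, outer_width, height = frames[0]
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{outer_width + 2 * pad:.0f}" height="{height + 2 * pad:.0f}">',
        '<rect width="100%" height="100%" fill="white"/>',
    ]
    for i, (label, x, y, width, h) in enumerate(frames):
        parts.append(
            f'<rect x="{x + pad:.1f}" y="{y + pad:.1f}" width="{width:.1f}" height="{h:.1f}" '
            f'fill="none" stroke="{COLORS[label]}" stroke-width="2"/>'
        )
        parts.append(
            f'<text x="{x + pad + 4:.1f}" y="{y + pad + line_gap * (i + 1):.1f}" '
            f'font-family="sans-serif" font-size="14" fill="black">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


frames = nested_frames(RATIOS)
for label, x, _, width, _ in frames:
    print(f"{label:>5}: width {width:.0f}, x offset {x:.0f}")
```

To save the diagram, write `to_svg(frames)` to a file such as `aspect_ratios.svg`. A shared width instead of a shared height would flip the nesting (1:1 tallest, 16:9 shortest), which is why the plan’s stack order has to commit to one alignment.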


:rocket: Next Step

Image: Now it is your turn to produce the high quality accurate aspect ratio depiction, suitable for instructional internet posting.


(do not use this AI picture)


Touché… It was drag-and-drop at 16x9, but yeah. I do have to wonder about the reasoning behind it. Not a huge deal…

Still hard to mix styles too… Seeing lots of similar images/tones…

What will I do when everything is perfect… :smile:

I hope they are also working on the ‘security.’ We will see when ‘rose’ and ‘snow white’ are okay.
(I will check the mouthies and moons too…)

About 1 minute per image…
I’ve also found that the system has been getting slower for some days, with messages about heavy server load.

It will be interesting to see if enough consistency can be achieved for a story.


So the first question: I’m using a MyGPT to prevent GPT from changing my prompts and to get as exact translations as possible. Is the new model also accessible via MyGPTs? (So far, I haven’t seen any new options…)

I hope I can still see the prompts I’ve sent, otherwise learning will get even harder.

Is seed support included? Or is that one of the secrets behind how consistency is achieved? (In case someone already knows.)

The model is a lot better at instruction following. The workarounds to communicate with the prompt rewriting tool are not needed any longer.
You can use it by simply selecting 4o and choosing the create image option. No GPT needed.
It’s also possible to edit the previous version of the image right in the chat. No more working with an additional interface while leveraging conversation history to get the desired output.

It’s not perfect, but really good.



Currently, custom GPTs don’t have access to the new tool; they still use the “dalle” tool.

I tried preventing it in GPT-4o, and it worked.


Since “Black Panther” was blocked, I tried to bypass it simply by using a different word for “black” that has the same meaning. For example, “black” in another language.
In Spanish, “black” is negro; in Italian it’s nero. So just a simple color.
But…

“Negro Panther”, photo style,
led to this… (translated from the original German):
“Negro” is a historically loaded term.

Umm… no…?! The offensive word is a different one; I will not write it here (but it is the name of an African country). This is only a recent change; before, you simply got a black panther.
So yes, the system is far from perfect. Maybe this word is blocked for Spanish too, so it could be that Spanish speakers can’t generate anything black, but I don’t know.


If you want to find the trigger, send the prompt line by line and tell GPT “do not change the prompt but send it as it is”. That way you may find the trigger.
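The line-by-line search can be written down as a procedure. A sketch in Python: `is_refused` is a hypothetical stand-in for however you check a prompt (in practice, pasting it into ChatGPT yourself with the “send it as it is” instruction and noting whether it is refused); nothing here calls a real API. Sending growing prefixes of the prompt, rather than isolated lines, also pins down the line that completes a trigger which only fires in combination with earlier lines.

```python
from typing import Callable, Optional, Sequence


def first_trigger_line(lines: Sequence[str], is_refused: Callable[[str], bool]) -> Optional[int]:
    """Send growing prefixes of the prompt; return the index of the line whose addition
    first makes the prompt refused, or None if the full prompt passes.

    `is_refused` is a hypothetical callback supplied by the caller.
    """
    for i in range(len(lines)):
        prompt = "\n".join(lines[: i + 1])
        if is_refused(prompt):
            return i
    return None


# Toy stand-in for the real check, purely for illustration.
demo = ["A sleek panther", "photo style", "BLOCKED_TERM on a poster", "night lighting"]
print(first_trigger_line(demo, lambda p: "BLOCKED_TERM" in p))  # 2
```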

As a tip, names are always bad.


GPT-4o in ChatGPT is no longer sending a prompt or writing a prompt. It outputs the image itself; you write the prompt. (That is, unless you still employ a GPT, a past conversation, or the AI invokes the dalle tool despite its built-in ability.)

(correction: it seems like this is still facilitated by a tool call for ChatGPT to make)

You can “thumbs-down” a genuine over-refusal, even one made by a safety system beyond ChatGPT’s own generation, as long as it wasn’t triggered by you trying to infringe upon another’s work.


Actually, every rights violation I’ve ever seen was forced on me, unwanted and unwelcome.

The thing is, I prefer writing prompts in my own language, but past experiments showed that only English is fully supported. I still need to find out whether that’s still the case. I might still need a MyGPT to help with translation… but I’ll figure that out next.

(I’m currently almost done with a big test. Looks like… everything all over again… :smile:)

We definitely have photorealism now… Now I need to find out how to get the fantasy style I always got with “photo style”.
