The varying versions of DALLE3 = different results for same prompt

So I’ve discovered that there seem to be multiple different versions of DALLE3. Some, like the Bing version (which is more advanced), are much better trained at getting results, and a friend just showed me some results from her iPhone app version of DALLE3 and I’m blown away.

From personal experience with the images I do, it seems all this time I have been using the crappiest one, that being the ChatGPT4 version! lol So if you struggle with DALLE3 on the browser ChatGPT version, just try one of the other, better-trained versions and you may get better results with less tweaking and without using up your daily allowance.

Fairly certain it’s all the same model, but different prompts and other things can affect the quality.

Is there a source for there being multiple DALLE3 models, or are you guessing from observations?


This is what ChatGPT told me after I moaned about different results with the same prompt across different apps. Don’t shoot the messenger! lol

Ah, okay, so it was the LLM hallucinating.

As I said, I’m fairly certain there’s only one version of DALLE3. We had a DALLE2.experimental over the summer, but in production there’s usually one main model. That said, different implementations may have different moderation levels, etc.

Sure. It may well be the same version in each place, but perhaps each has been set up with its own temperature and parameter settings, i.e. one version may give more random results (ChatGPT 4) whilst others are more finely tuned to prompts.

Prompt input may have an effect on temperature settings as a conversation evolves, but don’t quote me on that as I don’t have evidence for it. I have gathered from normal text conversations in ChatGPT that temperature can become more… dynamic, so to speak, as a conversation evolves, things are refined, etc.

Did you use the exact same prompt, verbatim, from your friend’s iPhone version? Remember too that even simple language adjustments seem to cause some interesting and unexpected shifts in the output imagery. Also, any prompt with more abstract language will produce much more variance in the results in general.


It was a copy-and-paste job, as obviously changing the prompt would produce a completely different result anyway. In my experience doing this test between ChatGPT and Bing, the results were like night and day. Bing’s images just don’t look fake, and I’ve gotten some really great photorealistic images, whereas ChatGPT with the same prompt struggled on the first request to give me something that didn’t look fake, or rendered the scene as if it just wasn’t trained for certain requests. Why can Bing actually produce it, and photorealistically, but ChatGPT cannot? That is the question! Good luck!

Right, but even if you used the same prompt, if there’s abstract language (like emotions), then it’s going to produce wildly different results. If you provide the prompt we could more easily assess what may cause changes and variance, but keep in mind, there’s always gonna be some fuzziness involved.

Also, I would not compare ChatGPT’s tools and Bing’s tools directly. They may be based on the same models, but what each company does with them is entirely different, including how they manage them. We don’t know if Bing has the resources to more iteratively train or fine-tune their models, and they also get much higher traffic in comparison, so differences between them should be expected and assumed. I can help focus on closing that gap between OpenAI’s apps, but Bing should be treated as a completely separate entity.


Sure… and agreed, maybe they are just managed differently. And that’s OK. It’s no biggie, just a personal observation. I am using Bing for the type of images I know I can rely on it to get right, whilst I will use ChatGPT for the ‘easier stuff’ it can manage, being the BETA office junior that it is! I’m beginning to know its limitations and strengths, and will use it wisely instead of getting frustrated when it uses up my daily allowance lol

That’s because the API endpoint is the same (the infamous image_creator), but Bing uses a different database of knowledge and different rules for what to block (or not).

Also, ChatGPT always “airbrushes” the results (faces, for example) to prevent them from being used as fake news.

Bing is less restrictive and is not “smoothing” skins and faces.

But it’s 99% the prompt that makes the result. Query the JSON data and see whether all the prompts were exactly the same or were altered (because both Bing and ChatGPT will alter them, if you let them).
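
If you can get at the JSON, a rough way to run that check is sketched below. This is only an illustration: the payload shape and the `revised_prompt` field name are assumptions modelled on what an image-generation response might look like, the example data is made up, and whatever ChatGPT or Bing actually expose may be structured differently.

```python
import json


def normalize(text):
    """Collapse whitespace and case so trivial differences are ignored."""
    return " ".join(text.split()).casefold()


def find_altered_prompts(submitted, payload, field="revised_prompt"):
    """Return every prompt in the payload that differs from what was typed.

    `payload` is assumed to be a JSON string shaped like
    {"data": [{"<field>": "..."}]}; that shape is an assumption, not a spec.
    """
    items = json.loads(payload).get("data", [])
    used = [item[field] for item in items if field in item]
    return [p for p in used if normalize(p) != normalize(submitted)]


# Hypothetical payload, invented purely for illustration.
example = json.dumps({
    "data": [
        {"revised_prompt": "A red fox in the snow"},
        {"revised_prompt": "A photorealistic red fox standing in deep snow at dawn"},
    ]
})

altered = find_altered_prompts("a red fox in the snow", example)
print(altered)
```

An empty list would mean the prompt went through unchanged (ignoring case and spacing); anything in the list is a prompt that was rewritten before the image was made, so the two apps were not really given the same input.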

For developers: Bing and ChatGPT are both “C#”, so to speak, but the kind of software they develop with it is completely different (it’s Python, actually).

I mean, “they both speak English but do write totally different books”.

Also, Bing uses far more seeds, whereas ChatGPT has been very restricted for about a week.
