Hello,
I’m getting different outputs when I use the same prompt with GPT-4o in ChatGPT’s temporary chat mode and in the OpenAI Playground.
Has anyone else experienced this issue? Any suggestions to achieve consistent results across both platforms would be appreciated.
Thank you.
3 Likes
What are your parameters? We don’t know what temperature etc. ChatGPT is using, but those settings matter a lot when it comes to getting the same or similar responses.
ChatGPT could also be on a completely different seed or even a different model version (gpt-4o-chatgpt-private or something along those lines), which would make the above moot as well.
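If you’re calling the model through the API, a minimal sketch like this (Python, using the openai package; the parameter values are just examples) pins down the knobs you can actually control. The seed plus the returned system_fingerprint at least tell you when the backend changed underneath you:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0,   # as deterministic as you can request
    top_p=1,
    seed=42,         # best-effort reproducibility, not a hard guarantee
)

print(response.choices[0].message.content)
# Changes whenever the backend configuration changes, which can also
# explain different outputs between runs or platforms.
print(response.system_fingerprint)
```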
Cheers! 
Thank you for your reply!
I’ve actually experimented with different combinations of temperature and top-p settings in the Playground, but I still see inconsistencies compared to ChatGPT’s temporary chat mode.
I also noticed that the Playground offers the chatgpt-4o-latest model, but switching to it didn’t yield more consistent or relevant results. Do you know if this version differs significantly from what ChatGPT is using?
Would love to hear your insights!
Cheers! 
Well, we won’t really know until OpenAI says for sure, but I do believe they are using a different model.
If not, then they will for sure be on a different seed than the models available via API.
They also run moderation on their model, which might change the output.
Sadly, we don’t know enough about things like the system prompt, whether or not they modify the prompts we send, and much more. Even a single-character difference in the system prompt can lead to completely different outputs.
I don’t think the models available via the API differ significantly from the ones in ChatGPT, at least not in their capabilities. If anything, the API models may be more capable, since little to no moderation or prompt manipulation happens there compared to ChatGPT.
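To make that concrete, here is a rough sketch of what I mean. The system prompt below is completely made up (nobody outside OpenAI knows ChatGPT’s real one), but running the same user prompt with and without it will usually give noticeably different answers, even at temperature 0:

```python
from openai import OpenAI

client = OpenAI()

def ask(system_prompt, user_prompt):
    # Build the message list; the system message is optional so we can
    # compare "no system prompt" against a ChatGPT-like one.
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        temperature=0,
        seed=42,
    )
    return resp.choices[0].message.content

prompt = "Explain recursion in two sentences."
# Hypothetical system prompt - ChatGPT's real one is not public.
chatgpt_like = "You are ChatGPT, a large language model trained by OpenAI."

print(ask(None, prompt))
print("---")
print(ask(chatgpt_like, prompt))
```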
Are you trying to achieve something specific, or were you just curious about the potential difference in models? If it’s the former, maybe I can help.
1 Like
Thank you for your insights.
I have ensured that both ChatGPT and the OpenAI Playground are utilizing the same GPT-4o model. However, due to the lack of explicit information from OpenAI regarding ChatGPT’s default parameters—such as temperature, top-p, and max tokens—I’ve left these settings at their defaults in the Playground.
I understand that system prompts and moderation filters can influence the model’s responses. However, even with identical prompts and settings, the discrepancies persist.
My goal is to achieve consistent outputs between ChatGPT and the Playground for my project. If you have any further suggestions or insights on how to address this issue, I would greatly appreciate your assistance.
Thank you.
Hi @nicolas.i and welcome to the community!
Do you have memory enabled in ChatGPT? This also seems to matter a lot, since it’s passed into the context behind the scenes, on every chat session.
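If you want to see how much that kind of hidden context can shift a response, you could simulate it on the API side by injecting memory-style facts as an extra system message. The format below is purely hypothetical (how ChatGPT actually stores and injects memories isn’t public), it just illustrates the effect:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical "memory" entries - the real format ChatGPT uses
# internally is not public.
memories = [
    "User prefers concise answers.",
    "User is working on a Python project.",
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Known facts about the user:\n- " + "\n- ".join(memories)},
        {"role": "user", "content": "Suggest a name for my new library."},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```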
1 Like
Thank you for your question.
I am using ChatGPT’s temporary chat mode, where memory is disabled. Despite this, I notice discrepancies between responses from ChatGPT and the OpenAI Playground, even with identical prompts and default settings.
2 Likes