ChatGPT web version vs gpt-3.5-turbo API

I have a text classification task that I've been exploring with ChatGPT (the web version at https://chat.openai.com/chat) with reasonable success. When I try to replicate my results using the gpt-3.5-turbo API, the classification predictions are wrong more often. I understand there is some inherent stochasticity at play here that can cause individual results to differ.

What I'm looking for here is best-practice recommendations I can follow to close the gap between the two as much as possible. For example, I'd like to make sure the underlying tunable parameters (temperature, top_p, etc.) are the same. Is it known what values ChatGPT uses for those parameters? Does anyone have any other insight/advice? Thanks!
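For what it's worth, here is a minimal sketch of pinning the sampling parameters explicitly in an API request. The system prompt, labels, and temperature value are placeholders (ChatGPT's actual settings aren't published); substitute your own task description:

```python
# Build a Chat Completions request with the sampling parameters set
# explicitly, instead of relying on API defaults.
import json

payload = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,  # low temperature for more repeatable classification; ChatGPT's real value is unknown
    "top_p": 1,
    "messages": [
        {
            "role": "system",
            "content": "You are a text classifier. Reply with exactly one label: positive or negative.",
        },
        {"role": "user", "content": "I loved this product."},
    ],
}

# To actually send it (requires the `openai` package and an API key):
# import openai
# response = openai.ChatCompletion.create(**payload)
# label = response["choices"][0]["message"]["content"].strip()

print(json.dumps(payload, indent=2))
```

Setting temperature to 0 won't make the model match ChatGPT, but it removes most run-to-run variation so you can compare prompts fairly.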


What does your system prompt look like? And your user/assistant prompts? What settings are you using?

We don't know ChatGPT's settings or system message for sure, which can make it difficult to replicate exactly. But if ChatGPT can handle the task, there's a good chance you can do it with the API too, with a little work.
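One starting point is to mimic the system message. The exact text isn't published, but a community-reported approximation has circulated; treat the wording below as a guess to experiment with, not the real prompt:

```python
from datetime import date

# Community-reported approximation of ChatGPT's system message. The real
# prompt is not published, so this is only an assumption for experimentation.
chatgpt_like_system = (
    "You are ChatGPT, a large language model trained by OpenAI. "
    "Answer as concisely as possible. "
    f"Knowledge cutoff: 2021-09. Current date: {date.today().isoformat()}."
)

# Use it as the system message in your API calls, followed by your task.
messages = [
    {"role": "system", "content": chatgpt_like_system},
    {"role": "user", "content": "Classify the sentiment of: 'Great service!'"},
]

print(messages[0]["content"])
```

Even if the wording is off, experimenting with system messages like this often narrows the gap more than tuning temperature alone.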