I noticed inconsistent responses between the API and the website version with the exact same prompt, on both GPT-3.5-turbo and GPT-4. I emailed the OpenAI team and was advised to post the issue here. Does anyone have a solution? My conversation with the team is below:
My question is: does the API call the exact same model as the one used on the website? Or are the hyperparameters different? If so, what are the defaults in the website version and what are the defaults in the API version?
Hello, We understand the discrepancies you’ve observed between the API responses and the website responses. We appreciate your feedback regarding our products. Rest assured, your observations have been documented and will undergo internal evaluation. While both the API and our website access the same underlying models, the potential discrepancies in output quality could be attributed to differing default hyperparameters or configurations set for each platform. The website’s defaults are typically designed for a generalized user experience, whereas the API offers more customizable parameters.
If so, can we developers have access to the parameters used in the website version? We often test prompts on the website. It's OK not to disclose the exact numbers, but at least give us an API parameter that applies the website's hyperparameters directly.
We understand the convenience of having consistent results between our website and the API, especially for developers who frequently test prompts on our platform. At the moment, specific hyperparameters used on the OpenAI website are optimized for general usage and may not always be disclosed due to various reasons. However, we recognize the value in your request. While we can’t share the exact parameters, we’re working on a feature to allow developers to easily replicate the website experience via the API.
If anyone has similar issues, please share them with us.
That is a super interesting answer from OpenAI; thanks for the post. I'm assuming there might be a DevDay update (6 November) where perhaps a ChatGPT flag can be set in an API call to get it to act in a similar way, or some other mechanism.
At the moment you add a system prompt to define its persona, tell it what its knowledge cutoff is and what today's date is, and then set a temperature of 0.7 to 1; that gives a fairly typical ChatGPT-like response, roughly like the sketch below.
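For illustration, here is a minimal sketch of that approach using the openai Python library (pre-1.0 style). The system prompt wording, cutoff date, and temperature value are assumptions modelled on what's been reported for ChatGPT, not confirmed OpenAI defaults:

```python
# Sketch: approximating ChatGPT-like behaviour via the API.
# The system prompt text and temperature are assumed values, not
# OpenAI's actual website configuration.
import datetime
import openai

openai.api_key = "sk-..."  # your API key

today = datetime.date.today().strftime("%Y-%m-%d")
system_prompt = (
    "You are ChatGPT, a large language model trained by OpenAI. "
    "Answer as concisely as possible.\n"
    "Knowledge cutoff: 2021-09-01\n"
    f"Current date: {today}"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0.7,  # 0.7 to 1.0 gives a fairly typical ChatGPT-like feel
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Explain what a knowledge cutoff is."},
    ],
)
print(response["choices"][0]["message"]["content"])
```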
You’re talking about the Playground, not ChatGPT, right?
If you’re talking about ChatGPT, one solution would be to use a third-party front end for the API, such as bettergpt. There, you might have more control over your parameters.
If you’re talking about the Playground, I can’t confirm whether you’re right or wrong, but I’ve heard that sometimes, even with temperature 0, the models aren’t deterministic. I’m not sure there’s anything we can (or should) do about this.
The last thing I’ll add is that we’ve entered a new era where the consistency of bits and bytes may no longer be as relevant as it once was. I think we should learn to deal with this new paradigm instead of chasing the old one. But that’s just my opinion at this point in time.
Sounds like they are generating nonsense answers with AI.
Hyperparameters are specifically those for machine learning training when fine-tuning an AI model.
The better answer: “We are trialing different fine-tune and sub-architecture models to ChatGPT users than are currently available via the API. This can be demonstrated by different competencies in answering knowledge questions posed to GPT-4 about events around the cutoff date, even when extracting and reusing the same ChatGPT system prompt that advances that cutoff date. This is part of our efforts and goal of making our own ChatGPT product and those of our closest investors and partners more competent and feature-rich than any product you can develop yourself with our API.”
An LLM always outputs a distribution over the next token; by itself it’s never deterministic. We simply choose the token with the highest probability, the so-called greedy method. Temperature does not affect the result under greedy decoding. I assume ChatGPT flips some setting under the hood when you click the “Regenerate” button so it can provide a different response.
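To make that concrete, here is a toy sketch (the logits are made up) showing that temperature reshapes the next-token distribution but never changes the argmax, so greedy decoding ignores it:

```python
# Toy demo: temperature flattens or sharpens the softmax distribution,
# but the argmax (greedy pick) stays the same. Only sampling is affected.
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [2.0, 1.5, 0.3]  # hypothetical scores for three candidate tokens

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: probs={np.round(probs, 3)}, greedy pick=token {probs.argmax()}")
# Probabilities flatten as T rises, but the greedy pick is always token 0.
```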
Back to the topic: simply put, when I said the API and the website give different responses, I mean their quality differs significantly (beyond what randomness alone can explain), comparable to the quality difference you can observe between 3.5 and 4. That’s why I raised the issue with the OpenAI team. I think the reason is the hyperparameters: OpenAI probably has internal values for temperature, top-p, top-k, and so on that produce good results for the general public.
In summary, what I am asking the team for is developer access to those settings, so that the API results are consistent with what we get when we test prompts through the website.
> Hyperparameters are specifically those for machine learning training when fine-tuning an AI model.
The fault is mine; I used the term “hyperparameter” first. However, I have a question for you: what is the correct term for settings like temperature, top-k, top-p, …? What do you call them?
But what is the system prompt of the website version?
I prefer to replicate the website’s replies in my API calls; otherwise, testing prompts on the website is kind of a waste of time. That said, testing prompts on the website is easier because of the interface (and it’s free).
API parameters are just “parameters”, in that they are passed as parameters to a function or object definition, or “settings” if you want to keep it simple (although the messages are also parameters, and those are not exactly a setting).
Hyperparameters are ML-specific language for training settings such as the learning rate; they’re among the parameters you can pass to a fine-tune job invocation (now only a single “epochs” setting is exposed).
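For context, here is a minimal sketch of where the API actually uses the word “hyperparameters”: as a training option on a fine-tuning job (openai pre-1.0 Python library; the file ID is a placeholder):

```python
# Sketch: "hyperparameters" in the API sense are training settings
# on a fine-tune job, not sampling settings like temperature.
import openai

openai.api_key = "sk-..."

job = openai.FineTuningJob.create(
    training_file="file-abc123",      # placeholder ID of an uploaded JSONL file
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 3},  # the one training knob exposed here
)
print(job["id"], job["status"])
```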
{"role": "system", "content": "You are ChatGPT, a large language model trained by OpenAI. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: {CurrentDate}"},
To develop, you should probably use either the platform Playground or a third-party UI that uses the API with parameters you can control.
ChatGPT is a completely different product that apparently uses models, versions, or fine-tunes that may not (perhaps ever) be available on the platform API for general usage.