I’m seeing significant quality differences for the same prompt depending on whether it runs via chatgpt.com (with o1) or through the API (also with o1).
The prompts are identical, and on the API side I set max_completion_tokens to 30,000 to leave room for reasoning. The result obtained via chatgpt.com is far superior.
On the API side, I’m getting results roughly comparable to what I used to get with GPT-4-turbo.
I’m conducting a sectoral analysis based on local data (about 17,000 tokens for the initial prompt).
Do you have any suggestions or insights? Do you know if chatgpt.com performs any pre-processing that differs from a direct API call, or if other parameters could be influencing the results?
Beyond the quality, the length of the completion returned by the API puzzles me: it’s barely a third of what I get from chatgpt.com…
Given that the o1 beta doesn’t expose any sampling parameters (temperature, frequency penalty, etc.), I don’t understand where the difference comes from or how to get closer to the completions chatgpt.com produces.
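For reference, this is roughly what my call looks like; a minimal sketch using the standard openai Python SDK, where the prompt variable is just a placeholder for the text I paste into chatgpt.com:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder for the ~17,000-token sectoral-analysis prompt
# (identical to the text pasted into chatgpt.com).
prompt = "…"

# Bare-bones o1 call: no temperature, top_p, or penalty parameters
# (the o1 beta rejects them) -- just the model, the user message,
# and a combined reasoning + output budget.
response = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": prompt}],
    max_completion_tokens=30_000,
)

print(response.choices[0].message.content)
```

Nothing besides that single user message is sent; if chatgpt.com wraps the same prompt in its own system instructions or formatting guidance, that alone could account for part of the difference in length and depth.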
Am I the only one here?
I’ve seen that some people had a similar issue with DALL-E, where the “revised_prompt” field returned in the response helped identify the differences, but no equivalent exists in the chat completions API.
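The closest thing I’ve found on the completions side is the usage breakdown in the response, which (if I read the API reference correctly) separates the hidden reasoning tokens from the visible answer. A rough sketch, continuing from the response object in the call above:

```python
# Break down how the max_completion_tokens budget was actually spent.
usage = response.usage
details = getattr(usage, "completion_tokens_details", None)
reasoning = getattr(details, "reasoning_tokens", None) if details else None

print("prompt tokens:    ", usage.prompt_tokens)
print("completion tokens:", usage.completion_tokens)  # hidden reasoning + visible answer
print("reasoning tokens: ", reasoning)                # hidden reasoning share, if reported
print("finish reason:    ", response.choices[0].finish_reason)  # "length" means the budget ran out
```

If finish_reason comes back as “length”, the reasoning is consuming the budget before the visible answer finishes, which would at least explain why the completions come back so short.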
I have the same issue. I allocated 32,000 tokens to the o1-preview API, but the quality is far below what I get from the o1-preview I have access to as a ChatGPT Plus user.
The answers are much shorter and don’t go into as much depth.
I haven’t tested other models extensively, since I got API access mainly for o1-preview, but my impression is that the o1-mini answers were also shorter.
I use it mainly for programming, and the o1-preview API’s performance is underwhelming.
I’ve tested it on several platforms, but it never performs the way o1-preview does in OpenAI’s ChatGPT, even with 32,000 max tokens.
Let us know how your tests go, I’m interested.
Have a great day too.