O1 API vs ChatGPT.com: Why is the quality so different? Seeking insights!

Hello,

I’m seeing significant quality differences for the same prompt when it’s run via chatgpt.com (with o1) versus through the API (also with o1).

The prompts are identical, and on the API side I allocated 30,000 tokens for reasoning (`max_completion_tokens`). The result obtained via chatgpt.com is far superior.
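
For reference, here is roughly what my API call looks like (a minimal sketch using the official `openai` Python SDK; the prompt placeholder stands in for my real prompt):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",
    messages=[
        # the o1 beta is restrictive about system/developer roles,
        # so the whole analysis prompt goes into a single user turn
        {"role": "user", "content": "<my ~17,000-token sectoral-analysis prompt>"},
    ],
    # shared budget for the hidden reasoning tokens plus the visible answer
    max_completion_tokens=30_000,
)

print(response.choices[0].message.content)
```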

On the API side, I’m getting results that are more or less similar to those I used to get with GPT-4-turbo.

I’m conducting a sectoral analysis based on local data (about 17,000 tokens for the initial prompt).

Do you have any suggestions or insights? Do you know if chatgpt.com performs any pre-processing that differs from a direct API call, or if other parameters could be influencing the results?

Thanks for your help!

Beyond the quality, the length of the completion provided via the API puzzles me: it’s barely a third of what I get from chatgpt.com.

Knowing that the o1 beta doesn’t accept any sampling parameters (temperature, frequency penalty, etc.), I don’t understand where the difference is coming from or how I can get closer to the completions chatgpt.com produces.
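
One hypothesis I’m checking (my assumption, based on the usage breakdown the API returns for reasoning models): the hidden reasoning tokens are counted inside `max_completion_tokens`, so a large share of the budget may be spent before any visible text is produced. Reusing the `response` object from the snippet in my first post:

```python
# Sketch: split the completion budget into hidden reasoning tokens
# and visible answer tokens, using the usage object returned above.
usage = response.usage
reasoning = usage.completion_tokens_details.reasoning_tokens

print("total completion tokens:", usage.completion_tokens)
print("hidden reasoning tokens:", reasoning)
print("visible answer tokens:  ", usage.completion_tokens - reasoning)
```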

Am I the only one here?

I’ve seen that some people had this issue with DALL-E, and a `revised_prompt` field helped identify the differences, but no equivalent exists in the chat completions API :frowning:

I have the same issue. I allocated 32,000 tokens to the o1-preview API, but the quality is far below what I get from the o1-preview I have access to as a ChatGPT Plus user.

The answers are much shorter and don’t go into as much depth.

Hello @Naemy, do you also see the same issue when comparing other models via the API vs. a ChatGPT subscription?

I have the impression this behaviour is the same regardless of the model.

I will run further tests today and post the results here; I’m planning something along the lines of the sketch below. I also still need to test the Playground vs. the API vs. the ChatGPT subscription…
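
A sketch of what I intend to run (the model list and the 32,000-token budget mirror what was discussed above; the prompt placeholder is hypothetical):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "<the exact prompt I paste into chatgpt.com>"  # placeholder

# Compare answer length and token usage across the o1 beta models
for model in ["o1-preview", "o1-mini"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_completion_tokens=32_000,
    )
    answer = response.choices[0].message.content
    print(f"{model}: {len(answer)} chars, "
          f"{response.usage.completion_tokens} completion tokens")
```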

Does anybody have any advice on how to align the quality of the results?

Have a great day,

Pierre-Jean

Hello,

I haven’t tested other models extensively, as I mainly got API access for o1-preview, but I had the impression that the o1-mini answers were also shorter.

I use it mainly for programming, and the o1-preview API’s performance is underwhelming.
I’ve tested several platforms, but none of them performs the way o1-preview does in OpenAI’s ChatGPT, even with 32,000 max tokens.

Let us know how the tests go, I’m interested.
Have a great day too.