Disparity between o1-preview results in ChatGPT vs API

Hi team,

I am building an app with the o1-preview model, and for the same prompts I am seeing a significant difference in output quality between the API and ChatGPT. ChatGPT nails it every time, but the API does not. Could you please point me in the right direction? Perhaps my settings are incorrect? I have set the temperature to 1, since the model will not accept any other value. Is there anything else I can try?
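For reference, here is roughly how I am building the request (a minimal sketch; the helper name and prompt are mine, and I am assuming the official `openai` Python package). Since o1-preview rejects any temperature other than the default, I omit the parameter entirely rather than passing 1 explicitly:

```python
def build_o1_request(prompt: str) -> dict:
    """Build kwargs for a chat completion against o1-preview.

    o1-preview only accepts the default temperature (1), so the
    parameter is omitted instead of being set explicitly.
    """
    return {
        "model": "o1-preview",
        "messages": [{"role": "user", "content": prompt}],
    }

# Usage (assumes OPENAI_API_KEY is set in the environment):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**build_o1_request("my prompt"))
# print(response.choices[0].message.content)
```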

I also need some guidance on implementing rate limits correctly. I am on Tier 3. I have noticed that requests start timing out after several calls in a row, yet my app never surfaces a timeout error even though I have implemented timeout exception handling, so I am wondering whether I have implemented rate limiting correctly.
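This is the retry approach I am trying: exponential backoff with full jitter around each request (a sketch only; in my real code the `except` clause would catch the client's rate-limit and timeout errors, e.g. `openai.RateLimitError` and `openai.APITimeoutError`, rather than a bare `Exception`). Is this the recommended pattern?

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds,
    so concurrent clients do not all retry at the same instant.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(request_fn, max_attempts: int = 5):
    """Call request_fn, retrying with backoff on failure.

    Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:  # real code: catch specific rate-limit/timeout errors
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Usage would be `call_with_retries(lambda: client.chat.completions.create(**kwargs))`.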