I am noticing a strange effect, and others have noted it as well: ChatGPT results are consistently better than API results using the same model (whether gpt-4 or gpt-3.5-turbo). With no other context and a new conversation initiated on ChatGPT each time, ChatGPT was correct 10/10 times, whereas the API produced a correct response (i.e., generated code that executes and produces a graph) only 2/10 times. So far I've tried leaving all options at their defaults, varying the temperature (0 to 1), and fiddling with other options. I've also tried different system messages (and no system message at all). Same results.
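For reference, this is roughly the comparison loop I'm running against the API. It's a minimal sketch: the prompt shown is a placeholder, not my actual prompt.

```python
# Minimal sketch of the comparison loop, using the openai Python client.
# PROMPT is a placeholder standing in for the plotting request I paste into ChatGPT.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
PROMPT = "Write Python code that loads data.csv and plots column A against column B."

for i in range(10):
    response = client.chat.completions.create(
        model="gpt-4",   # same model selected in the ChatGPT UI
        temperature=0,   # also tried values up to 1
        messages=[
            # also tried omitting the system message entirely
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": PROMPT},
        ],
    )
    print(f"--- run {i + 1} ---")
    print(response.choices[0].message.content)
```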
Pretty frustrating. Either ChatGPT is using a different model than the API, or it has a system prompt that makes all the difference (if so, please tell us what it is, so API results can be just as good).
Has anyone noticed a similar issue? If so, what did you try in order to solve it?