I am trying to use gpt-5-pro in my existing setup, which currently works great using the Chat Completions endpoint. But the API call fails, saying gpt-5-pro cannot use Chat Completions and must use Responses.
So I tried implementing the Responses endpoint for the first time, but it fails because I am using user/assistant pairs as few-shot examples.
The few-shot examples have been critical to getting the output I need from Chat Completions. Without them, the output I get from the API is basically unusable. They are my way of training the model on the kind of output I am expecting, in a way that the prompt alone can't seem to do.
I tried stuffing the few-shot examples into the general prompt and sending Responses one giant prompt, but the output is bad. It's not even close to as good as the older models with Chat Completions plus user/assistant few-shot pairs.
Am I using Responses incorrectly, or is this a limitation of the Responses API, meaning we can no longer take advantage of few-shot examples? Perhaps I need a completely new prompt structure, or maybe there is some way to use few-shot examples that I am just missing?
GPT-5 behaves differently, especially Pro, where you are getting large amounts of internal thinking about a task.
It has internal reasoning, forms its own ideas, and will not "pattern learn" from examples easily. The GPT-5 series is also just poor at inferring intentions and comprehending instructions. It is a chat model and will produce solutions to problems, but often problems it created for itself, at which point it can even lose track of who is talking. It is in a constant state of distrusting the authority of the user and of believing the assistant turns are its own.
Best tip: a large developer message. The model seems to have been trained on large inputs and tool definitions, and doesn't act well on user input until thousands of tokens in.
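To illustrate, a developer message is just another role-tagged item in the Responses `input` array; a minimal sketch, assuming the openai Python SDK (the model name and spec text are placeholders for your own):

```python
# Sketch: front-load a long developer message before the user turn in a
# Responses API request. Model name and spec text are placeholders.
def build_request(dev_spec: str, user_text: str) -> dict:
    """Assemble Responses API kwargs with the developer message up front."""
    return {
        "model": "gpt-5-pro",  # assumed model name
        "input": [
            # Ideally thousands of tokens: role, rules, output format, tools.
            {"role": "developer", "content": dev_spec},
            {"role": "user", "content": user_text},
        ],
    }

request = build_request(
    dev_spec="You are a sentiment classifier. Always answer in JSON...",
    user_text="Classify: 'It works as advertised.'",
)
# Then: client.responses.create(**request)
```

The point of building the kwargs separately is that you can inspect and log exactly what the model sees before spending tokens on it.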
Conversational multi-shot must be real turns with the (encrypted) reasoning preserved, or you will degrade the model by showing it the opposite of what it normally produces to its output channels. So use assistant turns that were actually generated by the model, and that performed successfully, as a warm-up in the subject area.
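Mechanically, such warm-up turns are still just user/assistant pairs in the Responses `input` list; a minimal sketch, assuming the openai Python SDK (the model name, instructions, and example strings are placeholders):

```python
# Sketch: few-shot user/assistant pairs as message turns in the Responses
# `input` array. Examples and model name are placeholders for your own.
import os

# Ideally these assistant turns are outputs the model itself produced earlier.
FEW_SHOT = [
    ("Classify: 'The battery died in an hour.'", '{"label": "negative"}'),
    ("Classify: 'Setup took thirty seconds.'", '{"label": "positive"}'),
]

def build_input(question: str) -> list[dict]:
    """Turn few-shot pairs plus the real question into a Responses input list."""
    messages = []
    for user_text, assistant_text in FEW_SHOT:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.responses.create(
        model="gpt-5-pro",                               # assumed model name
        instructions="Classify sentiment. Answer in JSON.",
        input=build_input("Classify: 'It works as advertised.'"),
    )
    print(response.output_text)
```

Note the `instructions` parameter takes the place of the old system message, while the few-shot turns ride along in `input`.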
You can replicate the Chat Completions turns you were providing on a model like gpt-4.1 at low temperature, to verify that the messages themselves are correct.
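That check can be sketched as re-running the same message list through Chat Completions, assuming the openai Python SDK (the message contents are placeholders; the local role check is a hypothetical helper):

```python
# Sketch: sanity-check few-shot messages by replaying them through Chat
# Completions on gpt-4.1 at low temperature. Messages are placeholders.
import os

MESSAGES = [
    {"role": "system", "content": "Classify sentiment. Answer in JSON."},
    {"role": "user", "content": "Classify: 'The battery died in an hour.'"},
    {"role": "assistant", "content": '{"label": "negative"}'},
    {"role": "user", "content": "Classify: 'It works as advertised.'"},
]

def turns_alternate(messages: list[dict]) -> bool:
    """Cheap local check: non-system turns strictly alternate roles."""
    roles = [m["role"] for m in messages if m["role"] != "system"]
    return all(a != b for a, b in zip(roles, roles[1:]))

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4.1",
        temperature=0.2,  # low temperature for a reproducible comparison
        messages=MESSAGES,
    )
    print(completion.choices[0].message.content)
```

If gpt-4.1 follows the few-shot pattern cleanly here, the message structure itself is sound, and any remaining degradation on gpt-5-pro is down to the model's behavior rather than the payload.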