I’m writing a multi-turn chat app with the Responses API. Besides the “human” chat, each turn also returns a rather large amount of structured information against a JSON schema. I noticed that after a few dozen turns, each turn starts consuming over 1 million tokens, because every request drags in EVERY past response, including all of their input and output tokens.
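For reference, each turn in my app looks roughly like this, chaining responses with `previous_response_id` (simplified; the model name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# One turn of the conversation: chaining via previous_response_id means the
# server re-reads the entire prior conversation on every single request.
def next_turn(user_text: str, prev_id: str | None) -> tuple[str, str]:
    response = client.responses.create(
        model="gpt-4o",                    # placeholder model name
        input=user_text,
        previous_response_id=prev_id,      # this is what drags in every past turn
    )
    return response.output_text, response.id
```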
I can’t find any way to say “limit to the last 5 responses” or “limit to the last 10,000 tokens”. That’s a deal-breaker for the Responses API, so I’m going back to Chat Completions.
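The only workaround I can see is to drop the `previous_response_id` chaining and manage the history myself, something like the sketch below (my own cap, not an API feature), which is basically reimplementing what Chat Completions already gives me:

```python
from openai import OpenAI

client = OpenAI()

MAX_TURNS = 5   # my own cap on how many user/assistant pairs to keep
history: list[dict] = []  # manually maintained message list

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Keep only the last MAX_TURNS user/assistant pairs.
    trimmed = history[-(MAX_TURNS * 2):]
    response = client.responses.create(
        model="gpt-4o",   # placeholder model name
        input=trimmed,
        store=False,      # don't let the server accumulate state for me
    )
    history.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```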