Hello,
I’ve tested the Responses API with background: true to offload long-running, high-reasoning tasks (3–7 min) without blocking client memory but it disappointed me.
Jobs that via the Completions API took 3-5 minutes, under the background mode sit in a queue and only start after ~18 minutes. This is under the default service_tier (full price), not flex mode (lower price per token in exchange of delayed responses).
This queueing decreases so much background mode’s value in my opinion. Background jobs should start immediately by default, and only queue if using the lower-cost flex tier. We shouldn’t have to pay priority tier (which is only for enterprise customers) for this.
OpenAI, please consider this.
Thank you.