Severe Latency Spike / Timeouts with Flagship Models over the Last 3 Days (Looping/Chunked Architecture via PHP)

Hi everyone,

Over the last 72 hours, we have experienced a massive, sudden degradation in response generation speed specifically with flagship reasoning models (gpt-5.2 variants), making our production pipeline completely unstable.

Our Setup & Context:

  • Environment: PHP back-end handling high-volume automated script generation.

  • Architecture: Dual-phase setup. Phase A generates a JSON data structure framework. Phase B loops through that array in sequential, single-threaded batches (using array_chunk with 3 scenes per request) to write the complete content block.

  • Typical Token Payload: ~12k input tokens and ~4k output tokens across all split requests per job.

  • Enforced Settings: temperature is explicitly set to 1 as required for newer reasoning architecture, with a massive timeout limit on our server up to 900 seconds.

The Problem: Up until 3 days ago, this batch loop completed flawlessly and efficiently. Now, individual requests are processing so slowly that a batch of 5 scenes (roughly 1,000 words total payload) takes over 14 minutes, completely exhausting our extended 900-second server runway and throwing cURL error 28 timeouts. Even reducing the chunks to 2 scenes per request takes an abysmal 16 minutes to fully execute 7 consecutive round trips.

OpenAI Prompt Caching is visibly working on our dashboard, meaning our input token costs are practically zero and cached. However, the model’s internal reasoning step/token generation speed has dropped off a cliff.

What we have verified:

  1. the delay is 100% confined to the raw execution time of the flagship model processing output tokens.

  2. Our usage dashboards show normal token limits and no traditional rate-limiting headers or 429 error returns—just pure, extreme generation slowness.

Is anyone else using flagship models in batched loops experiencing an extreme throttle in reasoning/generation speeds over the last few days? Are there any known backend infrastructure maintenance cycles or hidden token generation limits affecting specific API clusters right now?

Any insight or confirmation from the OpenAI team would be greatly appreciated.

Thank you!