Seeking feedback on IsoChron’s approach to traffic spikes

Hi everyone 👋

We’re building IsoChron, a traffic-shaping middleware designed for AI and LLM-powered applications that face sudden traffic spikes and API rate limits.

Instead of rejecting requests when limits are hit, IsoChron does the following (see the sketch after this list):

  • Detects burst traffic using entropy signals
  • Queues and smooths requests dynamically
  • Protects backends from overload and timeouts
  • Helps reduce overprovisioning and cloud costs
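
To make "queue and smooth" a bit more concrete, here's a minimal Python sketch of the general idea: a dispatcher that releases queued calls at a steady pace and backs off harder when a simple entropy heuristic flags a burst. The class, names, and detection logic below are illustrative assumptions on my part, not IsoChron's actual API or signal.

```python
import asyncio
import math
import time
from collections import deque


class SmoothingQueue:
    """Queue requests and release them at a steady rate instead of letting
    bursts hit the backend directly. The entropy heuristic is a stand-in;
    IsoChron's real burst-detection signal may work differently."""

    def __init__(self, rate_per_sec: float, window: int = 50):
        self.interval = 1.0 / rate_per_sec        # steady dispatch spacing
        self.arrivals = deque(maxlen=window)      # recent arrival timestamps
        self.queue: asyncio.Queue = asyncio.Queue()

    def record_arrival(self) -> None:
        self.arrivals.append(time.monotonic())

    def burstiness(self) -> float:
        """Normalised Shannon entropy of arrivals bucketed into 1-second bins.
        Low entropy => arrivals piled into a few bins => likely a burst."""
        if len(self.arrivals) < 2:
            return 1.0
        bins: dict[int, int] = {}
        for t in self.arrivals:
            bins[int(t)] = bins.get(int(t), 0) + 1
        total = len(self.arrivals)
        probs = [c / total for c in bins.values()]
        entropy = -sum(p * math.log2(p) for p in probs)
        max_entropy = math.log2(len(bins)) if len(bins) > 1 else 1.0
        return entropy / max_entropy              # 0 = very bursty, 1 = evenly spread

    async def submit(self, call):
        """Enqueue a backend call and wait for its smoothed execution."""
        self.record_arrival()
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((call, fut))
        return await fut

    async def dispatcher(self):
        """Release queued calls at a fixed pace; slow down further when the
        burstiness signal says traffic is spiky."""
        while True:
            call, fut = await self.queue.get()
            try:
                fut.set_result(await call())
            except Exception as exc:
                fut.set_exception(exc)
            pause = self.interval
            if self.burstiness() < 0.5:           # spiky traffic: back off more
                pause *= 2
            await asyncio.sleep(pause)


async def main():
    sq = SmoothingQueue(rate_per_sec=5)
    asyncio.create_task(sq.dispatcher())

    async def fake_llm_call():
        await asyncio.sleep(0.05)                 # stand-in for an LLM API request
        return "ok"

    # Simulate a burst of 20 near-simultaneous requests.
    results = await asyncio.gather(*(sq.submit(fake_llm_call) for _ in range(20)))
    print(len(results), "requests completed without a 429")


if __name__ == "__main__":
    asyncio.run(main())
```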

We’re currently in the MVP + stress-testing phase, sharing both successes and failures openly while tuning the system for high-concurrency AI workloads.

We’d love feedback from developers who’ve dealt with:

  • 429 Too Many Requests
  • Timeout storms during spikes
  • LLM inference latency under load

Happy to answer questions, share test results, or learn from similar experiences.