Hi everyone!
We’re building IsoChron, a traffic-shaping middleware designed for AI and LLM-powered applications that face sudden traffic spikes and API rate limits.
Instead of rejecting requests when limits are hit, IsoChron:
- Detects burst traffic using entropy signals
- Queues and smooths requests dynamically
- Protects backends from overload and timeouts
- Helps reduce overprovisioning and cloud costs
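To make the entropy-based detection idea concrete, here's a minimal sketch (not IsoChron's actual implementation — the class and threshold values are illustrative assumptions): a spike dominated by a few clients has low Shannon entropy over the client-ID distribution in a sliding window, compared with organic, well-mixed traffic.

```python
import math
from collections import Counter, deque

def shannon_entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

class BurstDetector:
    """Illustrative sketch: flag a burst when entropy of client IDs
    in a sliding window drops below a threshold. Window size and
    threshold here are made-up defaults, not IsoChron's tuning."""

    def __init__(self, window=100, threshold=2.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, client_id):
        """Record one request; return True if traffic looks bursty."""
        self.window.append(client_id)
        counts = Counter(self.window).values()
        return shannon_entropy(counts) < self.threshold
```

With this scheme, 50 requests from a single client give entropy 0 (flagged as a burst), while 50 distinct clients give entropy log2(50) ≈ 5.6 (not flagged) — the detector reacts to concentration, not raw volume, which is what lets smoothing kick in before hard rate limits do.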
We’re currently in the MVP + stress-testing phase, sharing both successes and failures openly while tuning the system for high-concurrency AI workloads.
We’d love feedback from developers who’ve dealt with:
- 429 Too Many Requests
- Timeout storms during spikes
- LLM inference latency under load
Happy to answer questions, share test results, or learn from similar experiences.