Handling OpenAI API Rate Limits Without Breaking User Experience

I’m building an app that occasionally hits OpenAI API rate limits during peak usage.
What are the best ways to queue requests, retry gracefully, or fall back without impacting the end user?
Would love to learn how others solved this in real-world applications.

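Move the API calls into a back-end job queue with automatic retries, e.g. Sidekiq. The user-facing request only enqueues the job, so a rate-limited call is retried in the background instead of failing in front of the user.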
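
To make the queue-and-retry idea concrete, here is a minimal sketch in Python rather than Sidekiq itself. It assumes nothing about the OpenAI client: `RateLimitError`, `call_api`, `run_queue` and `backoff_delay` are all hypothetical names made up for illustration, and `call_api` stands in for whatever function actually sends the request. Rate-limited jobs go back to the end of the queue after an exponentially growing, jittered delay, and jobs that run out of attempts are returned separately so the app can show a fallback.

```python
import random
import time
from collections import deque


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter.

    Returns a random delay between 0 and min(cap, base * 2**attempt) seconds.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))


def run_queue(jobs, call_api, max_attempts=5, sleep=time.sleep):
    """Process jobs in FIFO order, re-queuing rate-limited ones.

    Returns (results, failed): results maps each job to its response,
    failed lists jobs that used up all attempts so the caller can fall back.
    Jobs must be hashable (e.g. strings or IDs) in this sketch.
    """
    queue = deque((job, 0) for job in jobs)
    results, failed = {}, []
    while queue:
        job, attempt = queue.popleft()
        try:
            results[job] = call_api(job)
        except RateLimitError:
            if attempt + 1 >= max_attempts:
                failed.append(job)  # give up; caller decides on the fallback
            else:
                # Note: this blocks the whole loop while waiting; a real job
                # queue would schedule the retry for later instead.
                sleep(backoff_delay(attempt))
                queue.append((job, attempt + 1))
    return results, failed
```

A real job system such as Sidekiq handles the scheduling, persistence and backoff for you, so a sketch like this is mainly useful for seeing what the retry loop does. The `failed` list is where a fallback would plug in, for example a cached answer or a "still working on it" message.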