We currently have a production service used by customers and a development environment in which we run tests and experiments. Because rate limits are applied at the organization level, it is possible that a spike in our development tests and experiments could prevent the production service from operating for our users.
We are considering creating a separate organization so that production is isolated from development with its own rate limit. However, this would add administration work and complexity: billing, configuring endpoints, generating API keys, managing users, etc. It would also make us appear to OpenAI as two separate organizations, each with lower usage and billing history, and potentially with different early access to new features.
- Is there a recommended best practice here: separating production into its own organization, or sharing a single organization?
- Is there any way to get the best of both worlds? Any OpenAI platform features for managing multiple isolated environments within the umbrella of a single organization? Any way to limit the rate limit of a development API key or prioritize the requests of a production API key?
Thanks for any advice!
At the moment, best practice is to use one account for all company operations and try to manage the R&D usage to a level that does not put the prod into jeopardy.
Unfortunately, this is all down to trying to manage the token usage load on the system as a whole, and that means that limits need to be managed on a per organisation basis. Have you tried applying for a rate limit increase exception? (There is a form at the bottom of the rate limit section of your account on the platform web page: https://platform.openai.com/account/limits)
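One way to "manage the R&D usage" on the client side is to put a hard cap on how fast dev code can issue requests, so experiments can never burst into the shared organisation limit. This is just an illustrative sketch of a token-bucket throttle, not an OpenAI platform feature; the rate and burst values are arbitrary:

```python
import time


class DevThrottle:
    """Client-side cap for dev traffic: at most `rate` requests per second,
    with an optional small burst. Call acquire() before each API request.
    (Illustrative sketch only; not an OpenAI feature.)"""

    def __init__(self, rate: float, burst: int = 1):
        self.rate = rate            # sustained requests per second
        self.capacity = burst       # maximum burst size
        self.tokens = float(burst)  # current bucket level
        self.last = time.monotonic()

    def acquire(self) -> None:
        """Block until one request's worth of budget is available."""
        while True:
            now = time.monotonic()
            # Refill the bucket in proportion to elapsed time.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accumulate.
            time.sleep((1 - self.tokens) / self.rate)
```

Wrapping every dev-environment call in `throttle.acquire()` keeps experiments to a predictable fraction of the organisation's limit, whatever prod is doing.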
Thanks for the advice and confirmation of best practice, @Foxalabs! I think we have a sufficient rate limit for prod and development if we are careful.
I don’t love the possibility that, if we are not sufficiently careful, development could affect prod. But hopefully prod traffic will be much higher than the experiments we run in development.
One idea that could push us to create a new organization is if we find that we could get results from development experiments much faster by blasting them out in parallel at the maximum rate limit. The total amount of processing wouldn’t be huge, but it would be very bursty. Perhaps development experiments could monitor the rate limit in order to stay under a threshold, or make use of “extra” capacity that is unused by prod. But I’m speculating; for now this isn’t a huge problem. We’ll see how it goes and adjust or request an exception if needed.
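The "stay under a threshold" idea above can be sketched by inspecting the rate limit headers the API returns on each response (`x-ratelimit-limit-requests`, `x-ratelimit-remaining-requests`, and the token equivalents) and pausing dev traffic when headroom gets low. The header names are the documented ones; the 20% reserve for prod is an arbitrary choice for illustration:

```python
def dev_headroom_ok(headers: dict, reserve_fraction: float = 0.2) -> bool:
    """Return True if a dev experiment may send another request.

    Keeps `reserve_fraction` of the org's request and token budget
    free for production traffic. `headers` is the header mapping from
    the most recent API response.
    """
    for kind in ("requests", "tokens"):
        limit = int(headers.get(f"x-ratelimit-limit-{kind}", 0))
        remaining = int(headers.get(f"x-ratelimit-remaining-{kind}", 0))
        # If remaining budget has dipped into the reserved slice, back off.
        if limit and remaining < limit * reserve_fraction:
            return False
    return True
```

A dev worker would check this between requests and sleep (or drop to a slower pace) whenever it returns False. Note this is best-effort: the headers reflect the whole organisation's usage with some lag, so it reduces but doesn't eliminate the risk of dev crowding out prod.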
The main issue as it stands is one of available compute, and OpenAI’s attempts to manage demand against that hardware. Things should improve with developments like the H200s and any new chips in the pipeline, along with software optimisations and new methodologies, but for now we are all under a limit, and that is suboptimal for dev.