Not 100 in parallel?
Consider the limit as continuous rather than discrete. I would derive a per-millisecond rate from your organization's requests-per-minute and tokens-per-minute limits, then keep a record of the "expense" of each API call so you can queue calls and adjust the depth of a parallel handler.
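Here's a minimal sketch of that idea, assuming example limit values and that you can estimate the token expense of a call up front; it's a continuously-refilling budget rather than any official mechanism:

```python
# Pace calls by a continuous per-millisecond rate derived from per-minute limits,
# instead of a discrete once-per-minute window. Limit values below are examples.
import asyncio
import time

RPM_LIMIT = 300        # requests per minute (example tier value)
TPM_LIMIT = 150_000    # tokens per minute (example tier value)

REQS_PER_MS = RPM_LIMIT / 60_000    # requests that "refill" each millisecond
TOKENS_PER_MS = TPM_LIMIT / 60_000  # tokens that "refill" each millisecond

class ContinuousLimiter:
    """Token-bucket style limiter, refilled continuously at the per-ms rate."""
    def __init__(self):
        self.last = time.monotonic()
        self.req_budget = RPM_LIMIT / 60.0   # start with ~1 second of headroom
        self.tok_budget = TPM_LIMIT / 60.0
        self.lock = asyncio.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed_ms = (now - self.last) * 1000
        self.last = now
        # Budgets grow continuously, capped at one full minute of allowance.
        self.req_budget = min(RPM_LIMIT, self.req_budget + elapsed_ms * REQS_PER_MS)
        self.tok_budget = min(TPM_LIMIT, self.tok_budget + elapsed_ms * TOKENS_PER_MS)

    async def acquire(self, expected_tokens: int):
        """Block until this call's expense (1 request + expected tokens) is affordable."""
        while True:
            async with self.lock:
                self._refill()
                if self.req_budget >= 1 and self.tok_budget >= expected_tokens:
                    self.req_budget -= 1
                    self.tok_budget -= expected_tokens
                    return
            await asyncio.sleep(0.05)

limiter = ContinuousLimiter()

async def guarded_call(expected_tokens: int):
    await limiter.acquire(expected_tokens)
    # ... make the actual API call here; adjust the spent tokens with real usage if you like ...
```

The queue depth of your parallel workers can then be whatever you want, since each task waits on the shared limiter before firing.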
The difficulty is compounded because your own limiter has to work from the input you are about to send, while the server's current limit reflects past completed calls, so there is a delay before your usage shows up. You may also have async parallel calls still in flight from a batch job or on-demand use, and some tiers' per-minute limits are close to allowing only one maximum-context call per minute.
The response headers return various rate-limit stats. For a particular model (or model class that shares a limit), you could start backing off further when the "remaining" values are low, instead of strictly controlling by measuring your input; see the sketch after the headers below.
x-ratelimit-limit-requests: 300
x-ratelimit-limit-tokens: 150000
x-ratelimit-remaining-requests: 299
x-ratelimit-remaining-tokens: 149605
x-ratelimit-reset-requests: 200ms
x-ratelimit-reset-tokens: 157ms
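As a rough sketch of header-driven backoff, assuming the headers arrive as a dict and with threshold fractions that are just guesses to tune per tier:

```python
# Adaptive backoff from the x-ratelimit-* response headers.
# Thresholds and the parse_reset() format handling are assumptions; adjust per tier.
def parse_reset(value: str) -> float:
    """Convert a reset string like '200ms', '157ms', '1s', or '6m0s' to seconds."""
    total, num, i = 0.0, "", 0
    while i < len(value):
        ch = value[i]
        if ch.isdigit() or ch == ".":
            num += ch
            i += 1
        else:
            unit = "ms" if value[i:i + 2] == "ms" else ch
            factor = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}.get(unit, 1)
            total += float(num or 0) * factor
            num = ""
            i += len(unit)
    return total

def backoff_delay(headers: dict) -> float:
    """Return how long to sleep before the next call, based on remaining headroom."""
    limit_req = int(headers.get("x-ratelimit-limit-requests", 1))
    limit_tok = int(headers.get("x-ratelimit-limit-tokens", 1))
    remaining_req = int(headers.get("x-ratelimit-remaining-requests", limit_req))
    remaining_tok = int(headers.get("x-ratelimit-remaining-tokens", limit_tok))
    reset = max(parse_reset(headers.get("x-ratelimit-reset-requests", "0s")),
                parse_reset(headers.get("x-ratelimit-reset-tokens", "0s")))

    # Back off progressively as the remaining fraction shrinks (thresholds are guesses).
    frac = min(remaining_req / max(limit_req, 1), remaining_tok / max(limit_tok, 1))
    if frac < 0.05:
        return reset          # nearly exhausted: wait out the reset
    if frac < 0.25:
        return reset * 0.5    # getting low: slow the queue down
    return 0.0                # plenty of headroom: full speed
```

Feed the headers from each completed response into something like this and use the returned delay to throttle the queue of pending calls.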
Just a simple loop of affordable API calls couldn't make a dent in the "remaining" values for me, because the reset is fast and generation is slow. (I did hit several "500 server error" responses across 3.5 and 4 though, about 15% of calls.)