Assistant API: Run failed Rate limit reached

bkerryk · November 8, 2023, 11:20pm

This error happened in the playground.

Run failed: Rate limit reached for gpt-4-1106-preview in organization org-wprBIxqhHg1PdAdC9cHksYnM on tokens_usage_based per min. Limit: 10000 / min. Please try again in 6ms. Visit https://platform.openai.com/account/rate-limits to learn more.

The MD file in the context is almost 1 MB in size, so fairly big.

I was under the assumption that the MD file would get chunked through OpenAI’s own chunking methods, which maybe it does, but I am wondering whether the file is nonetheless the reason I am hitting some sort of rate limit.

Now that I think about it, I am more confused: since this model has a context window of over 100k tokens, why would I have a rate limit of 10k tokens per minute?

Would love to know if this is an embedding/chunking problem or a bug. If it’s a chunking problem, then I guess I’ll break the MD doc into smaller ones before upload.

I can’t seem to find much in the documentation about how OpenAI goes about chunking and embedding.
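On the context window versus rate limit question: the two limits measure different things. The context window caps the tokens in a single request, while the tokens-per-minute limit caps the total used within a minute, so a request that fits the context window can still exceed the per-minute budget. Below is a rough back-of-the-envelope sketch, assuming about 4 characters per token (a common heuristic for English text, not an exact count). The helper names are hypothetical.

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; a real tokenizer will give a different count."""
    return int(num_chars / chars_per_token)


def minutes_of_budget(tokens: int, tokens_per_minute: int) -> float:
    """How many minutes of a tokens-per-minute budget `tokens` would use up."""
    return tokens / tokens_per_minute


# A file of roughly 1 MB of plain text, measured against the 10,000 / min
# limit quoted in the error message.
file_tokens = estimate_tokens(1_000_000)
print(file_tokens)
print(minutes_of_budget(file_tokens, 10_000))
```

Under these assumptions a 1 MB file comes to about 250,000 tokens, or 25 minutes of a 10,000 tokens-per-minute budget. How much of the file retrieval actually places into a single run is not something this thread establishes.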

bkerryk · November 9, 2023, 12:10am

The big MD file is the problem. I’ll still need to test by breaking the MD file into smaller files, but this shouldn’t be the case if chunking is done right…
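For anyone trying the same workaround, here is a minimal sketch of breaking a large Markdown file into smaller pieces before upload. It assumes splitting at heading lines is acceptable and uses a character limit as a stand-in for a token limit. The function name and the default size are hypothetical and not anything OpenAI documents.

```python
import re


def split_markdown(text: str, max_chars: int = 50_000) -> list[str]:
    """Split Markdown into pieces of at most max_chars, preferring heading boundaries."""
    # Split just before every line that starts with a Markdown heading marker.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    current = ""
    for section in sections:
        if not section:
            continue
        # A single section larger than the limit is cut at fixed offsets.
        while len(section) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(section[:max_chars])
            section = section[max_chars:]
        # Start a new chunk when adding this section would exceed the limit.
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks
```

Each returned piece can then be written to its own file, for example `part_001.md`, `part_002.md`, and uploaded separately. Joining the pieces back together reproduces the original text, so nothing is dropped in the split.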

nikunj · November 9, 2023, 11:38pm

Hi there – we’ve just increased rate limits for all users.

You should be able to make at least a couple of large context size requests to GPT-4 Turbo now. As your usage of the API increases, you will move to higher Usage Tiers and you’ll be able to make more large requests to the model.

You can view the new rate limits for GPT-4-Turbo here: OpenAI Platform. You can also view rate limits for your specific account in your account settings here: OpenAI Platform

Really appreciate your patience here as we work through building capacity for the new models.

_j · November 9, 2023, 11:45pm

It seems like a wait and retry internally is needed regardless.
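Until something like that exists on the server side, a client can approximate it. The sketch below shows a generic wait-and-retry wrapper with exponential backoff and jitter. `RateLimitError` here is a hypothetical stand-in for whatever your client raises or reports when a run fails on a rate limit, and `fn` is any callable that starts the work and raises that error on failure. None of these names come from the OpenAI SDK.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the rate-limit failure your client reports."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Give up after the last allowed attempt.
            if attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay, plus a random jitter.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
```

The `sleep` parameter is injectable only so the wrapper can be exercised in tests without real waiting. For the Assistants API, `fn` would have to create a new run and raise when that run ends in a rate-limit failure, which is an assumption about how you structure your own client code.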

Also needed are “max iterations”, “max retrieval”, and “max thread tokens” parameters:

The documentation also states that the context window will be maximized with retrieval, and the same with threads. Does this mean we should expect 100k+ tokens of retrieval in context, and then iteration over more tools and functions while carrying that costly input along, to the tune of $1 per 100k tokens?
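To put a number on that concern, here is a small arithmetic sketch. It takes the $1 per 100k input tokens figure from the post above and assumes, as the worst case being asked about, that the full context is resent on every tool or function step. Whether the API actually behaves this way is the open question, and the step count is a made-up example.

```python
def carried_input_cost(context_tokens: int, steps: int, usd_per_100k: float = 1.0) -> float:
    """Input cost in USD if context_tokens are resent on each of `steps` model calls."""
    return context_tokens * steps / 100_000 * usd_per_100k


# 100k tokens of context carried through 5 hypothetical tool iterations.
print(carried_input_cost(100_000, 5))  # 5.0
```

Under those assumptions a single run with five iterations would cost about $5 in input tokens alone, which is why a cap on iterations or retrieved tokens would matter.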
