Assistants API: Run failed, rate limit reached

This error happened in the playground.

Run failed: Rate limit reached for gpt-4-1106-preview in organization org-wprBIxqhHg1PdAdC9cHksYnM on tokens_usage_based per min. Limit: 10000 / min. Please try again in 6ms. Visit https://platform.openai.com/account/rate-limits to learn more.

The Markdown (MD) file in the context is almost 1 MB in size, so fairly big.

I was under the assumption that the MD file would get chunked by OpenAI’s own chunking methods. Maybe it does, but I am wondering whether that is the reason I am hitting some sort of rate limit.

Now that I think about it, I am more confused: this model has a context window of over 100k tokens, so why would I have a rate limit of 10k tokens per minute?

Would love to know if this is an embedding/chunking problem or a bug. If it’s a chunking problem, then I guess I’ll break the MD doc down into smaller ones before upload.

I don’t seem to be able to find much in the documentation about how OpenAI goes about chunking and embedding.

The big MD file is the problem. I’ll still need to test by breaking the MD file into smaller files, but this shouldn’t be the case if chunking is done right…
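For anyone who wants to try the same workaround, here is a minimal sketch of splitting a large Markdown file before upload. The function name `split_markdown` and the `max_chars` threshold are made up for illustration, and this says nothing about how OpenAI chunks files internally; it only cuts at heading lines so that each part stays readable on its own.

```python
def split_markdown(text: str, max_chars: int = 100_000) -> list[str]:
    """Split Markdown text into parts, cutting only before heading lines.

    A part may exceed max_chars if no heading appears in time; the
    threshold is a soft target, not a hard limit (assumption for this sketch).
    """
    parts: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Start a new part at a heading once adding it would pass the target.
        if line.startswith("#") and current and size + len(line) > max_chars:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append("".join(current))
    return parts
```

Each part can then be written to its own file and uploaded separately. Note that a line starting with `#` inside a fenced code block would also be treated as a heading by this sketch.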

Hi there – we’ve just increased rate limits for all users.

You should be able to make at least a couple of large context size requests to GPT-4 Turbo now. As your usage of the API increases, you will move to higher Usage Tiers and you’ll be able to make more large requests to the model.

You can view the new rate limits for GPT-4-Turbo here: OpenAI Platform. You can also view rate limits for your specific account in your account settings here: OpenAI Platform

Really appreciate your patience here as we work through building capacity for the new models.

It seems like an internal wait-and-retry is needed regardless, along with “max iterations”, “max retrieval”, and “max thread tokens” parameters.
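Until something like that exists on the server side, a client-side version of wait-and-retry could look roughly like the sketch below. `RateLimitHit` and `call_with_backoff` are hypothetical names, not part of any OpenAI library. Since the error above shows up as a failed run rather than a raised exception, the function you pass in would have to check the run's status itself and raise `RateLimitHit` when it sees a rate-limit failure.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitHit(Exception):
    """Hypothetical stand-in for whatever signals a rate-limit failure."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff plus jitter on RateLimitHit."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitHit:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            # Wait 1x, 2x, 4x, ... the base delay, plus random jitter so
            # parallel clients do not all retry at the same moment.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise RuntimeError("unreachable")
```

The `max_retries` cap plays the role of a “max iterations” setting here: without it, a request that can never fit under the per-minute limit would retry forever.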

The documentation also states that the context window will be maximized with retrieval, and the same with threads. Does this mean we should expect 100k+ tokens of retrieval in context, with the run then iterating over more tools and functions while carrying that costly input along, to the tune of $1 per 100k tokens?
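To put a number on that worry, here is a back-of-the-envelope sketch. It assumes the worst case the question describes, namely that the full retrieved context is resent as input on every tool or function step, and it uses the $1 per 100k input tokens figure from above. Whether the API really behaves this way is exactly what is being asked, so treat `estimate_input_cost` as an illustration only.

```python
def estimate_input_cost(
    context_tokens: int, steps: int, usd_per_100k: float = 1.0
) -> float:
    """Estimate input cost if the whole context is resent on every step."""
    total_input_tokens = context_tokens * steps
    return total_input_tokens / 100_000 * usd_per_100k


# 100k tokens of context carried through 5 tool/function steps.
print(estimate_input_cost(100_000, 5))  # 5.0
```

Under that assumption, a single run with five steps would cost about $5 in input tokens alone, which is why a cap on retrieval or thread tokens would matter.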