How to move up in Usage Tiers

Hi

My assistants response time is below 1s in the playground but when I send the same query over API my Netlify environment kills the connection after 26s (max timeout level). The API has been degrading slowly over time since I first tried it. I guess the reason is given in this post /t/assistants-api-too-slow-for-realtime-production/493627/2. How much do I need to spend to get a decent response time sustained? 10s would be ok-ish.

Thanks and regards

Welcome.

You can find more here…

### Usage tiers

You can view the rate and usage limits for your organization under the limits section of your account settings. As your usage of the OpenAI API and your spend on our API goes up, we automatically graduate you to the next usage tier. This usually results in an increase in rate limits across most models.

Tier Qualification Usage limits
Free User must be in an allowed geography $100 / month
Tier 1 $5 paid $100 / month
Tier 2 $50 paid and 7+ days since first successful payment $500 / month
Tier 3 $100 paid and 7+ days since first successful payment $1,000 / month
Tier 4 $250 paid and 14+ days since first successful payment $5,000 / month
Tier 5 $1,000 paid and 30+ days since first successful payment $10,000 / month

Select a tier below to view a high-level summary of rate limits per model.

https://platform.openai.com/docs/guides/rate-limits/usage-tiers?context=tier-free

Note, though, even at Tier 5 you might run into snags with the API (including low latency…) However, once you get to a certain point, you can likely contact sales, but there’s a lot of companies spending a lot of money right now, so you might have to work with the lower levels…

Hope this helps!

It is always the function call that causes pain. I just switched to GPT4 turbo but not getting better. I am a single user cause I am still developing the app thus I will not create huge amount of spending yet. But as long as I stay on Tier 1 I can’t onboard any users since the application is rendered useless. Users won’t forgive the timeouts. Can I use the Chat Completion instead of an Assistant and still get a response that can reliably trigger a function call?

Hey guys were you able to figure this out? I cannot find information in all docs about how to get access to 128k context window, I need it for adding a custom language reference (Verse programming, a DSL by Epic games for unreal) alongside my requests in the API, but currently capped at 8k tokens, I’m happily a paying customer, but I need to know at what point I’ll get access to 32k and 128k of gpt4 models (any of them really) otherwise it’s a random guess and I’ll throw 250$ into the balance (no problem with that of course, I will use it one day even if it’ll not work for my current idea about Verse coding) but will not get the ability to run the 32k context at least (as a minimum, will put the short version of language reference into it, might still work for some of my queries) it’s a waste :sweat_smile: could anyone point me to where we can see those actual access tiers? :pray: in case you figured it out

Hm. You only need to be in Tier 1 to get access to GPT-4-Turbo models which have the 128k context window. Which tier are you in currently?

I’m on 1, but these models are limited to 8k in my case in the Assistants api, that’s what I use. So I thought the token limits are in addition to the name of models, and the name doesn’t mean it’ll actually allow sending more to the input. In my case they don’t allow, errors out with “you sent more than N tokens”.

ChatOpenAI(model='gpt-4-0125-preview')
ChatOpenAI(model='gpt-4-1106-preview')

Not sure I fully understand.

Are you explicitly using gpt-4-turbo models for the Assistant’s API or the original GPT-4 model? GPT-4 or GPT-4-0613 are by default restricted to 8k context window. For GPT-4-turbo models you should have higher context window for Assistants. Personally I am not aware of a restriction in the access to GPT-4 turbo models under the Assistants API due to a lower tier (other than the free tier).

aha! so here’s what happens in the assistants api in playground:

image

but in the Chat it does not show me such error. I wonder if the one in “Chat” actually can consume all documentation of 50k tokens that I paste, will try now to see if it only reads part of it, or has access to all of it in one same request.

32k that it shows here is about characters, so it’s roughly the 8k tokens unfortunately, that’s why I’m trying to understand what is it and at which tier I can expect this to have a full context window. This error is for all gpt4 turbo model names, I tried all 3 (preview, 0125, 1106, all same error in assistants menu in playground). and the old gpt4 has a “per minute 10k tokens” so I can’t even feed a few requests to it, but that’s fine, it’s old and powerful and we’re not supposed to use it since it’s expensive on the hardware. The current per minute for ‘turbo’ is 300k for me, but this error about single context window size in request is what blocking me for now.

EDIT:
I think I found answer, so Assistants API specifically is limited despite the model does allow 128k Test new 128k window on gpt-4-1106-preview - #30 by nikunj

didn’t see anything about it in documentation but probably it’s buried somewhere :man_shrugging: good to know.

I get it now. Yeah, that’s a known limitation at this point. It has nothing to do with your tiers. See also discussion here:

1 Like