Request for context_total parameter under usage

The OpenAI API should provide a parameter that reports the model's running context window to the client. This makes sense given the dynamic behavior across LLMs that use the same OpenAI API but have different context window limits. Please create a `context_total` parameter under `usage`. This parameter would change dynamically on, say, a switch between gpt-3.5 and gpt-4.

I’m really confused. Can’t you just set whatever context limit you want by simply not passing in more context?

Yes, but this behavior should be dynamic based on the context size the inference server has set for a specific LLM. That way clients don’t have to guess (or maintain their own lookup tables of context sizes) what the inference server is doing for each LLM it’s serving. The context size differs across LLMs. This is an argument for clients that are designed to work with many OpenAI-compatible LLMs.

pricing is also different across models, as is the behavior, max attention, etc., etc. :person_shrugging:

I don’t disagree that it would be nice to have, but I can see why they don’t. They didn’t even have usage with streaming until recently, so maybe they’ll get to it eventually.

So, your argument is you want OpenAI to truncate your context for you?

While I’m sure they could, I don’t know how many developers would want such a thing. Personally, I wouldn’t want to give up control of managing the context myself, because there are much better context-management solutions than truncation.

No, I want to get a parameter from the server that sets the context size dynamically for the LLM. This allows smooth transitions for the client when switching between LLMs. The price is irrelevant.

for you, at the moment :thinking:

so to clarify: you just want an informational endpoint that delivers the model’s max context window, correct?

and ideally, for your case, you’d want the info to be delivered in usage, but that’s more of an implementation detail. Did we get this right?


Yes, it doesn’t matter where it is. I was only suggesting usage given its relevance to tokens. It would be a usage parameter for the client.

Yeah, people have been requesting that for a while now. For now you just have to maintain the table yourself, unfortunately.
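A minimal sketch of the kind of client-side table this forces you to keep. The window sizes below are the published limits for a few OpenAI models as of this writing; treat them as assumptions to verify against current documentation:

```python
# Client-side fallback table of context windows, since the API does not
# report them. The values are assumptions to check against current docs.
CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 16_385,
    "gpt-4": 8_192,
    "gpt-4-turbo": 128_000,
    "gpt-4o": 128_000,
}

def context_window(model: str, default: int = 4_096) -> int:
    """Look up a model's context window, falling back to a safe default."""
    if model in CONTEXT_WINDOWS:
        return CONTEXT_WINDOWS[model]
    # Longest-prefix match handles dated snapshots like "gpt-4o-2024-05-13".
    for known in sorted(CONTEXT_WINDOWS, key=len, reverse=True):
        if model.startswith(known):
            return CONTEXT_WINDOWS[known]
    return default
```

The prefix match is sorted longest-first so that, e.g., a `gpt-4-turbo` snapshot doesn’t fall through to the plain `gpt-4` entry.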

don’t know if it’s best practice but I’ve had this for a while.


omni complicates this even more by requiring a different tokenizer, so I understand the pain.

I think the more appropriate place for this information would be in `/v1/models/{model}`.
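If the spec ever grew this field, the natural shape would be an extra key on the model object returned by the models endpoint. A sketch of the client side, assuming a hypothetical `context_window` field that the real API does not return today:

```python
import json

# Hypothetical response body for GET /v1/models/{model}. The real API
# returns id/object/created/owned_by but NOT context_window today.
sample = json.loads("""
{
  "id": "gpt-4o",
  "object": "model",
  "created": 1715367049,
  "owned_by": "system",
  "context_window": 128000
}
""")

def context_from_model_object(model_obj: dict, default: int = 4096) -> int:
    """Read the hypothetical context_window field, with a fallback."""
    return int(model_obj.get("context_window", default))
```

With that one field, the client-side lookup tables above become a fallback rather than the source of truth.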


This has been discussed here before.

Indeed, it’s so exhausting lol

The inference server has this information. I don’t know why it can’t just relay it.

That’s not the model names?

probably lack of developers and perpetual crunch tbh - and no one at OAI really cares about devs atm, so there’s that too :laughing:

Well thank you for responding.

The confusion is that you’ve used the word “parameter”, while a parameter is something you’d specify and send.

Instead, you want to receive information, available before you make an AI model call.

Essentially, this:
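For illustration, a hypothetical `usage` block carrying the requested field (`context_total` is not a real field in the API response):

```python
# Hypothetical usage object with the proposed field; "context_total"
# does not exist in the real API response today.
usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 350,
    "total_tokens": 1550,
    "context_total": 128000,  # the model's max context window (proposed)
}

# What a client could then do without any lookup table:
remaining = usage["context_total"] - usage["total_tokens"]
```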


Well, it’s a parameter from the perspective of the inference server. Your model_metadata.json is exactly the thing I’m trying to avoid, and it only lists OpenAI LLMs, but thank you for responding and giving me a feel for where this problem stands. The context-length metadata should be a standard API parameter in the OpenAI API spec on the server side, given that this API standard is widely used across many inference servers and organizations.