Easy way to get a context window for a model

As far as I understand, the OpenAI API does not provide any method for getting the size of the context window of the model used for a chat completion. Without it, a client can either overload a model with more tokens than it accepts or underutilize the context that is available.

Admittedly, some model names are clear enough (like gpt-4-32k), but most of them must be checked manually against the corresponding tables at https://platform.openai.com/docs/models.
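
As a stopgap, the size hint can be parsed out of the names that carry one. This is only a minimal sketch: `context_from_name` is a hypothetical helper (not part of the OpenAI library), it assumes the hint follows the `-<N>k` pattern, and it assumes `k` means 1024 tokens, which is an approximation of the documented limits.

```python
import re
from typing import Optional

def context_from_name(model_name: str) -> Optional[int]:
    """Return an approximate context size parsed from a '-<N>k' hint, or None."""
    match = re.search(r"-(\d+)k\b", model_name)
    if match is None:
        return None  # the name carries no size hint; check the docs table instead
    # Assumption: 'k' stands for 1024 tokens; real limits may differ slightly.
    return int(match.group(1)) * 1024

print(context_from_name("gpt-4-32k"))           # 32768
print(context_from_name("gpt-4-1106-preview"))  # None
```

Names without a hint still return None, so this does not remove the need for the manual lookup.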

It would be great to fix this inconsistency by any or all of these measures:

  1. Include the size of the context window in the name of each model.
  2. Extend the model object (https://platform.openai.com/docs/api-reference/models/object) with the size of the context window.

Am I missing something? Are there any existing methods?

2 Likes

Yes, some have made their own methods…

def init_price_classes(self):
    # Method of a larger class (not shown); values are maintained by hand.
    self.price_classes = {
        # language gpt-3.5, 4 (limits are base)
        "gpt4": {"price_in": 0.03, "price_out": 0.06, "price_train": -1, "limit_ktpm": 10, "limit_rpm": "200"},
        "gpt4-32": {"price_in": 0.06, "price_out": 0.12, "price_train": -1, "limit_ktpm": 10, "limit_rpm": "200"},
        "turbo": {"price_in": 0.0015, "price_out": 0.002, "price_train": 0.0080, "limit_ktpm": 90, "limit_rpm": "3500"},
        "instruct": {"price_in": 0.0015, "price_out": 0.002, "price_train": 0.0080, "limit_ktpm": 250, "limit_rpm": "3000"},
        "ft-turbo": {"price_in": 0.012, "price_out": 0.016, "price_train": 0.0080, "limit_ktpm": 90, "limit_rpm": "3500"},
        "turbo-16": {"price_in": 0.003, "price_out": 0.004, "price_train": -1, "limit_ktpm": 180, "limit_rpm": "3500"},
        # ... (further entries omitted in the original post)
    }

def init_model_list(self):
    # Method of the same class; each model points at a price class defined above.
    self.model_list = {
        'ft:gpt-3.5-turbo':         {'price_class': 'ft-turbo',   'endpoint': 'chat',  'tokenizer': 'cl100k_base',  'context': 4097, 'cutoff': '2021-09', 'retire_date': '', 'tune': 'tune'},
        'gpt-3.5-turbo-0301':       {'price_class': 'turbo',      'endpoint': 'chat',  'tokenizer': 'cl100k_base',  'context': 4097, 'cutoff': '2021-09', 'retire_date': '2024-06-13'},
        'gpt-3.5-turbo-0613':       {'price_class': 'turbo',      'endpoint': 'chatf', 'tokenizer': 'cl100k_base',  'context': 4097, 'cutoff': '2021-09', 'retire_date': ''},
        'gpt-3.5-turbo-16k':        {'price_class': 'turbo-16',   'endpoint': 'chatf', 'tokenizer': 'cl100k_base', 'context': 16385, 'cutoff': '2021-09', 'retire_date': ''},
        # ... (further entries omitted in the original post)
    }

It would be a piece of cake for OpenAI to add this to the models endpoint, but the last time they touched that endpoint, they removed metadata…
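
A hand-maintained table like this can be queried with a small lookup. The sketch below is not the poster's code: `CONTEXT_BY_MODEL` and `lookup_context` are hypothetical names, the context sizes are copied from the entries shown above, and matching by longest key prefix is an assumption made so that snapshot or fine-tune suffixes still resolve.

```python
from typing import Dict, Optional

# Context sizes copied from the table in this thread; extend by hand as needed.
CONTEXT_BY_MODEL: Dict[str, int] = {
    "ft:gpt-3.5-turbo": 4097,
    "gpt-3.5-turbo-0301": 4097,
    "gpt-3.5-turbo-0613": 4097,
    "gpt-3.5-turbo-16k": 16385,
}

def lookup_context(model: str, table: Dict[str, int] = CONTEXT_BY_MODEL) -> Optional[int]:
    """Return the context size of the longest table key that prefixes `model`."""
    candidates = [key for key in table if model.startswith(key)]
    if not candidates:
        return None  # unknown model: the table needs a manual update
    return table[max(candidates, key=len)]

print(lookup_context("gpt-3.5-turbo-16k-0613"))  # 16385
print(lookup_context("some-unknown-model"))      # None
```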

2 Likes

Thank you, @_j. Unfortunately, that is the same manual method I am trying to avoid…

+1 for adding an API for programmatically discovering model properties. This could be an entirely separate metadata API instead of being embedded in other API responses, and it could also distinguish between model properties (e.g. context window size, training data cut-off) and price info.
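
To make the split concrete, here is a rough sketch of what such a metadata record might look like. Everything in it is hypothetical: no such API exists as far as this thread knows, the class and field names are invented for illustration, and the sample values are taken from the tables earlier in the thread, with prices assumed to be USD per 1k tokens.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ModelProperties:
    context_window: int   # maximum context size in tokens
    training_cutoff: str  # e.g. "2021-09"

@dataclass(frozen=True)
class PriceInfo:
    input_per_1k: float   # assumed unit: USD per 1k prompt tokens
    output_per_1k: float  # assumed unit: USD per 1k completion tokens

@dataclass(frozen=True)
class ModelMetadata:
    model_id: str
    properties: ModelProperties
    price: Optional[PriceInfo] = None  # price info kept separate and optional

example = ModelMetadata(
    model_id="gpt-3.5-turbo-16k",
    properties=ModelProperties(context_window=16385, training_cutoff="2021-09"),
    price=PriceInfo(input_per_1k=0.003, output_per_1k=0.004),
)
print(example.properties.context_window)  # 16385
```

Keeping `price` optional lets a client fetch stable model properties without depending on pricing data that changes more often.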

1 Like

Easiest way to get a context window length for a model?

The hard way:

import re

from openai import APIError, OpenAI

# Sample of the error text the API returns when the context is exceeded
example_error_msg = ("This model's maximum context length is 128000 tokens. However, "
  "your messages resulted in 262151 tokens. Please reduce the length of the messages.")

def get_context_error(err_msg):
    """Extract the integer that follows 'maximum context length is'."""
    match = re.search(r'maximum context length is (\d+)', err_msg)
    if not match:
        raise ValueError("No value found matching context length.")
    max_context_length = int(match.group(1))
    # Sanity check: reject values outside the range of known models
    if not 1000 <= max_context_length <= 200000:
        raise ValueError("Extracted context length outside the range 1000-200000.")
    return max_context_length

def get_context_len(modelparam="gpt-3.5-turbo"):
    """Probe for OpenAI chat completion model context length limit;
    API request with input and max_tokens bigger than any model"""
    cl = OpenAI(timeout=30)
    bigdata = "!@" * 2**17  # 262,144 characters of filler
    try:
        response = cl.chat.completions.create(
            model=modelparam, max_tokens=265000, top_p=0.01,
            messages=[{"role": "system", "content": bigdata}]
        )
    except APIError as err:
        # Only the context-length error carries the number we are after
        if err.code == 'context_length_exceeded':
            return get_context_error(err.body['message'])
        raise ValueError(err) from err
    # Reaching this point means the oversized request was unexpectedly accepted
    raise ValueError(f"Context len: $$NO ERROR!$$:\n"
                     f"{response.choices[0].message.content}")


if __name__ == "__main__":
    model = "gpt-4-1106-preview"  # just chat completion models
    context_len = get_context_len(model)  # use model
    print(f"{model} context length: {context_len} tokens.")
1 Like
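
Since each probe sends a very large request, it seems worth remembering the answer per model. A minimal sketch, assuming a plain in-memory dict is enough; `cached_context_len` is a hypothetical wrapper, and `fake_probe` stands in for a real probe such as `get_context_len` so the example runs without an API key.

```python
from typing import Callable, Dict, List

def cached_context_len(model: str,
                       probe: Callable[[str], int],
                       cache: Dict[str, int]) -> int:
    """Return the cached context length, calling `probe` only on a cache miss."""
    if model not in cache:
        cache[model] = probe(model)
    return cache[model]

# Stand-in for a real (slow, billable) probe; records how often it is called.
calls: List[str] = []

def fake_probe(model: str) -> int:
    calls.append(model)
    return 128000

cache: Dict[str, int] = {}
first = cached_context_len("gpt-4-1106-preview", fake_probe, cache)
second = cached_context_len("gpt-4-1106-preview", fake_probe, cache)
print(first, second, len(calls))  # 128000 128000 1
```

The cache could just as well be written to a file, so the probe runs once per model rather than once per process.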

It would be nice to have some endpoints to get this metadata, i.e., all the details about a model.