Quota management for OpenAI models

Is there a way to programmatically check the cost of using the models and then, if the cost goes over a threshold, suspend the API key?

Thanks

Yes, you can set a limit on how many tokens are used. You can find more information here.

Hi,

Sure, you can count the tokens sent and received with tiktoken and report them from your API relay server. Your end-user application should authenticate with the relay so that each client is uniquely identified.

OAuth or similar is suitable for this. That way you can build a usage database that records each transaction with the model used and the tokens sent and received, and then run a billing cycle from that data on your preferred schedule.
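As a rough illustration of the usage database described above, here is a minimal sketch using only Python's built-in sqlite3 module. The table layout and the function names (open_ledger, record_usage, usage_by_client) are my own assumptions, not part of any OpenAI tooling. The token counts are passed in by the relay, which would obtain them from tiktoken or from the usage data returned with each API response.

```python
import sqlite3

def open_ledger(path=":memory:"):
    """Create (or open) the usage database and return a connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS usage (
               id INTEGER PRIMARY KEY,
               client_id TEXT NOT NULL,
               model TEXT NOT NULL,
               tokens_in INTEGER NOT NULL,
               tokens_out INTEGER NOT NULL,
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn

def record_usage(conn, client_id, model, tokens_in, tokens_out):
    """Store one relayed request for an authenticated client."""
    conn.execute(
        "INSERT INTO usage (client_id, model, tokens_in, tokens_out) "
        "VALUES (?, ?, ?, ?)",
        (client_id, model, tokens_in, tokens_out),
    )
    conn.commit()

def usage_by_client(conn):
    """Return {client_id: {model: (tokens_in, tokens_out)}} for a billing run."""
    rows = conn.execute(
        "SELECT client_id, model, SUM(tokens_in), SUM(tokens_out) "
        "FROM usage GROUP BY client_id, model"
    )
    totals = {}
    for client_id, model, t_in, t_out in rows:
        totals.setdefault(client_id, {})[model] = (t_in, t_out)
    return totals
```

The relay would call record_usage once per forwarded request, and the billing job would read usage_by_client on whatever schedule you choose.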

Sorry, I probably did not ask the question correctly. Currently I can go to my account page and see how much I have spent so far. I was looking for a programmatic way to get that information instead of logging in to my account to check. I am trying to create a resource monitor that will run every 5 minutes (for example) to see whether any threshold has been breached.

There is currently no official billing API; you would need to use the website dashboard for that.

https://help.openai.com/en/articles/6614209-how-do-i-check-my-token-usage
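Without a billing API, a monitor like the one described would have to work from spend that you track yourself. The following is only a sketch of the polling loop: get_spend_usd and on_breach are hypothetical callables you would supply (for example, a query against your own usage log, and a function that makes your relay stop forwarding requests). It does not call any OpenAI endpoint.

```python
import time

def check_threshold(get_spend_usd, threshold_usd, on_breach):
    """Run one check; call on_breach(spend) and return True if over threshold."""
    spend = get_spend_usd()
    if spend > threshold_usd:
        on_breach(spend)
        return True
    return False

def monitor(get_spend_usd, threshold_usd, on_breach,
            interval_s=300, max_checks=None, sleep=time.sleep):
    """Poll every interval_s seconds until a breach or until max_checks runs out."""
    checks = 0
    while max_checks is None or checks < max_checks:
        if check_threshold(get_spend_usd, threshold_usd, on_breach):
            return True
        checks += 1
        sleep(interval_s)
    return False
```

The default interval of 300 seconds matches the 5-minute example in the question. The sleep parameter exists only so the loop can be exercised without actually waiting.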


If you are logging the inputs and outputs, you can also count the tokens your usage consumes and calculate the cost yourself.
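A minimal sketch of that calculation, assuming you already have token counts per request in your log. The prices are USD per 1,000 tokens as quoted in this thread and will likely change, so treat the table as a placeholder; the names PRICES_PER_1K, request_cost, and total_cost are illustrative only.

```python
# USD per 1,000 tokens; figures as quoted in this thread, check current pricing.
PRICES_PER_1K = {
    "gpt-4": {"in": 0.03, "out": 0.06},
    "gpt-3.5-turbo": {"in": 0.0015, "out": 0.002},
}

def request_cost(model, tokens_in, tokens_out, prices=PRICES_PER_1K):
    """Cost in USD of one request, priced separately for input and output tokens."""
    p = prices[model]
    return tokens_in / 1000 * p["in"] + tokens_out / 1000 * p["out"]

def total_cost(log, prices=PRICES_PER_1K):
    """Sum the cost of logged (model, tokens_in, tokens_out) records."""
    return sum(request_cost(m, t_in, t_out, prices) for m, t_in, t_out in log)
```

Feeding total_cost the records from your own log gives a running spend figure that a threshold check can compare against.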


I have a library I've been plugging away at with the goal of returning any model metadata one could desire. Besides answering queries like the token cost of your logged database or the context length of selected models, it could be exposed as an AI function so that an AI could answer such questions, or used as a website backend. Thanks to scope creep, it is now part of a library, openaiutils (so that it can never get done). Some of the tables (just snippets):

def init_price_classes(self):
    # Prices are USD per 1K tokens; limits are base rate limits
    # (limit_ktpm = thousand tokens per minute, limit_rpm = requests per minute).
    self.price_classes = {
        # language gpt-3.5, 4
        "gpt4": {"price_in": 0.03, "price_out": 0.06, "price_train": -1, "limit_ktpm": 10, "limit_rpm": 200},
        "gpt4-32": {"price_in": 0.06, "price_out": 0.12, "price_train": -1, "limit_ktpm": 10, "limit_rpm": 200},
        "turbo": {"price_in": 0.0015, "price_out": 0.002, "price_train": 0.0080, "limit_ktpm": 90, "limit_rpm": 3500},
        "instruct": {"price_in": 0.0015, "price_out": 0.002, "price_train": 0.0080, "limit_ktpm": 250, "limit_rpm": 3000},
        "ft-turbo": {"price_in": 0.012, "price_out": 0.016, "price_train": 0.0080, "limit_ktpm": 90, "limit_rpm": 3500},
        # ... (snippet truncated)
    }

def init_model_list(self):
    self.model_list = {
        'gpt-3.5-turbo-16k':           {'price_class': 'turbo-16', 'endpoint': 'chatf',    'tokenizer': 'cl100k_base', 'context': 16385, 'cutoff': '2021-09', 'retire_date': ''},
        'gpt-3.5-turbo-16k-0613':      {'price_class': 'turbo-16', 'endpoint': 'chatf',    'tokenizer': 'cl100k_base', 'context': 16385, 'cutoff': '2021-09', 'retire_date': ''},
        'gpt-3.5-turbo-instruct':      {'price_class': 'instruct', 'endpoint': 'complete', 'tokenizer': 'cl100k_base', 'context': 4097,  'cutoff': '2021-09', 'retire_date': ''},
        'gpt-3.5-turbo-instruct-0914': {'price_class': 'instruct', 'endpoint': 'complete', 'tokenizer': 'cl100k_base', 'context': 4097,  'cutoff': '2021-09', 'retire_date': ''},
        'gpt-4':                       {'price_class': 'gpt4',     'endpoint': 'chatf',    'tokenizer': 'cl100k_base', 'context': 8192,  'cutoff': '2021-09', 'retire_date': ''},
        'gpt-4-0314':                  {'price_class': 'gpt4',     'endpoint': 'chat',     'tokenizer': 'cl100k_base', 'context': 8192,  'cutoff': '2021-09', 'retire_date': '2024-06-13'},
        'gpt-4-0613':                  {'price_class': 'gpt4',     'endpoint': 'chatf',    'tokenizer': 'cl100k_base', 'context': 8192,  'cutoff': '2021-09', 'retire_date': ''},
        # ... (snippet truncated)
    }


def init_urls(self):
    self.endpoint_base = "https://api.openai.com"
    self.endpoint_url = {
        "chat": "/v1/chat/completions",
        "chatf": "/v1/chat/completions",  # model supports functions
        "tune": "/v1/fine_tuning/jobs",
        "oldtune": "/v1/fine-tunes",
        "embed": "/v1/embeddings",
        "complete": "/v1/completions",
        "mod": "/v1/moderations",
        # ... (snippet truncated)
    }
...
    self.embed_alias = [
        {'ada-search-document': 'text-search-ada-doc-001', 'reported': 'text-search-ada:001', 'dim': -0.03518218919634819},
        {'curie-search-document': 'text-search-curie-doc-001', 'reported': 'text-search-curie:001', 'dim': -0.02822701632976532},
        {'curie-search-query': 'text-search-curie-query-001', 'reported': 'text-search-curie:001', 'dim': -0.027747787535190582},
        {'ada-similarity': 'text-similarity-ada-001', 'reported': 'text-similarity-ada:001', 'dim': -0.01897580549120903},
        {'curie-similarity': 'text-similarity-curie-001', 'reported': 'text-similarity-curie:001', 'dim': -0.010900501161813736},
        {'davinci-search-document': 'text-search-davinci-doc-001', 'reported': 'text-search-davinci:001', 'dim': -0.00792695488780737},
        # ... (snippet truncated)
    ]

def init_deprecation_replacements(self):
    self.upgrade_auto = {
        'ada': 'babbage-002',
        'babbage': 'babbage-002',
        'curie': 'davinci-002',
        'davinci': 'davinci-002',
        'gpt-3.5-turbo-0301': 'gpt-3.5-turbo',
        # ... (snippet truncated)
    }

    self.stable_alias = [
        {'gpt-3.5-turbo': 'gpt-3.5-turbo-0613', 'reported': 'gpt-3.5-turbo-0613'},
        {'gpt-3.5-turbo-16k': 'gpt-3.5-turbo-16k-0613', 'reported': 'gpt-3.5-turbo-16k-0613'},
        {'gpt-4': 'gpt-4-0613', 'reported': 'gpt-4-0613'},
        {'gpt-4-32k': 'gpt-4-32k-0613', 'reported': 'gpt-4-32k-0613'},
        # ... (snippet truncated)
    ]
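
To illustrate how tables like these might be joined into one answer, here is a small standalone sketch. It is not the library's actual API: describe_model is a hypothetical helper, the tables are cut down to two models, and the alias table is flattened into a plain dict for brevity.

```python
# Cut-down copies of the tables above, for illustration only.
PRICE_CLASSES = {
    "gpt4": {"price_in": 0.03, "price_out": 0.06},
    "instruct": {"price_in": 0.0015, "price_out": 0.002},
}
MODEL_LIST = {
    "gpt-4-0613": {"price_class": "gpt4", "endpoint": "chatf"},
    "gpt-3.5-turbo-instruct": {"price_class": "instruct", "endpoint": "complete"},
}
STABLE_ALIAS = {"gpt-4": "gpt-4-0613"}  # simplified from the list-of-dicts form
ENDPOINT_BASE = "https://api.openai.com"
ENDPOINT_URL = {"chatf": "/v1/chat/completions", "complete": "/v1/completions"}

def describe_model(name):
    """Resolve an alias, then join the model entry with its price class and URL."""
    resolved = STABLE_ALIAS.get(name, name)
    entry = MODEL_LIST[resolved]          # raises KeyError for unknown models
    prices = PRICE_CLASSES[entry["price_class"]]
    return {
        "model": resolved,
        "url": ENDPOINT_BASE + ENDPOINT_URL[entry["endpoint"]],
        "price_in": prices["price_in"],
        "price_out": prices["price_out"],
    }
```

Calling describe_model("gpt-4") resolves the alias to the dated model name first, so the price and endpoint lookups only ever need entries for concrete model versions.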

This is great. Do you intend to make this library open source?

Thanks