API models endpoint with model features, pricing, context length? (yes - with this Python code)

I’ve posted this to the GitHub Q&A:
Similar to client.models.list(), is there any way to list available/valid models at the respective capability level, like client.chat.models.list() or client.embeddings.models.list()?

If we pass model="text-embedding-ada-002", which is not suitable for chat.completions.create, we get an error:

NotFoundError: Error code: 404 - {'error': {'message': 'This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}

That is going to need some additional metadata to be returned from a self-curated version of the API. Let’s see what I can whip up for you…

First, we’ll need some extensive data about all the models that has been rigorously verified.

Model metadata, 2024-05-22

modelmeta.json
{
  "name": "model_metadata",
  "description": "OpenAI API model pricing, context length, maximum outputs, endpoint mapping, stable alias mapping, tier rate limits, features, etc (unofficial)",
  "notes": "Tier limits can be account-specific and changing without announcement - verify with headers. Unverified limits are +1. Some shutoff dates are 'at the earliest'. max_tokens_max are practical settings.",
  "version": "2024-05-22",
  "schema_version" : "0.2",
  "author": "_j",
  "data":
    [
    {"id": "gpt-3.5-turbo-0125", "context": 16385, "max_tokens_max": 4096, "price_in": 0.50, "price_out": 1.50, "price_train": 8.00,"ft_price_in": 3.00, "ft_price_out": 6.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "tools", "assistants", "retrieval", "ft_assistants", "logprobs", "logit_bias", "ft"]},
    {"id": "gpt-3.5-turbo-0301", "context": 4096, "max_tokens_max": 4088, "price_in": 1.50, "price_out": 2.00, "units": "Mtokens", "shutoff": "2024-06-13", "replacement": "gpt-3.5-turbo-1106", "endpoints": ["/v1/chat/completions"], "features": ["chat", "logprobs", "logit_bias"]},
    {"id": "gpt-3.5-turbo-0613", "context": 4096, "max_tokens_max": 4088, "price_in": 1.50, "price_out": 2.00, "price_train": 8.00,"ft_price_in": 3.00, "ft_price_out": 6.00, "units": "Mtokens", "shutoff": "2024-06-13", "replacement": "gpt-3.5-turbo-1106", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "assistants", "logprobs", "logit_bias", "ft"]},
    {"id": "gpt-3.5-turbo-1106", "context": 16385, "max_tokens_max": 4096, "price_in": 1.00, "price_out": 2.00, "price_train": 8.00,"ft_price_in": 3.00, "ft_price_out": 6.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "tools", "assistants", "retrieval", "logprobs", "logit_bias", "ft"]},
    {"id": "gpt-3.5-turbo-16k-0613", "context": 16385, "max_tokens_max": 16377, "price_in": 3.00, "price_out": 4.00, "units": "Mtokens", "shutoff": "2024-06-13", "replacement": "gpt-3.5-turbo-1106", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "assistants", "logprobs", "logit_bias"]},
    {"id": "gpt-4-0125-preview", "context": 128000, "max_tokens_max": 4096, "price_in": 10.00, "price_out": 30.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "tools", "assistants", "retrieval", "logprobs", "logit_bias"]},
    {"id": "gpt-4-0314", "context": 8192, "max_tokens_max": 4088, "price_in": 30.00, "price_out": 60.00, "units": "Mtokens", "shutoff": "2024-06-13", "replacement": "gpt-4-0613", "endpoints": ["/v1/chat/completions"], "features": ["chat", "assistants", "logprobs", "logit_bias"]},
    {"id": "gpt-4-0613", "context": 8192, "max_tokens_max": 4088, "price_in": 30.00, "price_out": 60.00, "price_train": 1e9,"ft_price_in": 1e9, "ft_price_out": 1e9, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "assistants", "logprobs", "logit_bias", "ft"]},
    {"id": "gpt-4-1106-preview", "context": 128000, "max_tokens_max": 4096, "price_in": 10.00, "price_out": 30.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "tools", "assistants", "retrieval", "logprobs", "logit_bias"]},
    {"id": "gpt-4-1106-vision-preview", "context": 128000, "max_tokens_max": 4096, "price_in": 10.00, "price_out": 30.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "vision", "functions", "tools", "logprobs", "logit_bias"]},
    {"id": "gpt-4-32k-0314", "context": 32768, "max_tokens_max": 32760, "price_in": 60.00, "price_out": 120.00, "units": "Mtokens", "shutoff": "2024-06-13", "replacement": "gpt-4-32k-0613", "endpoints": ["/v1/chat/completions"], "features": ["chat", "assistants", "logprobs", "logit_bias"]},
    {"id": "gpt-4-32k-0613", "context": 32768, "max_tokens_max": 32760, "price_in": 60.00, "price_out": 120.00, "units": "Mtokens",  "endpoints": ["/v1/chat/completions"], "features": ["chat", "functions", "assistants", "logprobs", "logit_bias"]},
    {"id": "gpt-4-turbo-2024-04-09", "context": 128000, "max_tokens_max": 4096, "price_in": 10.00, "price_out": 30.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "vision", "functions", "tools", "assistants", "retrieval", "logprobs", "logit_bias"]},
    {"id": "gpt-4o-2024-05-13", "context": 128000, "max_tokens_max": 4096, "price_in": 5.00, "price_out": 15.00, "units": "Mtokens", "endpoints": ["/v1/chat/completions"], "features": ["chat", "vision", "functions", "tools", "assistants", "retrieval", "v2", "logprobs", "logit_bias"]},
    {"id": "babbage-002", "context": 16384, "max_tokens_max": 16383, "price_in": 0.40, "price_out": 0.40, "price_train": 0.40,"ft_price_in": 1.60, "ft_price_out": 1.60, "units": "Mtokens", "endpoints": ["/v1/completions"], "features": ["completions", "logprobs", "logit_bias", "ft"]},
    {"id": "davinci-002", "context": 16384, "max_tokens_max": 16383, "price_in": 2.00, "price_out": 2.00, "price_train": 6.00,"ft_price_in": 12.00, "ft_price_out": 12.00, "units": "Mtokens", "endpoints": ["/v1/completions"], "features": ["completions", "logprobs", "logit_bias", "ft"]},
    {"id": "gpt-3.5-turbo-instruct", "context": 4096, "max_tokens_max": 4095, "price_in": 1.50, "price_out": 2.00, "units": "Mtokens", "endpoints": ["/v1/completions"], "features": ["completions", "logprobs", "logit_bias", "instruct"]},
    {"id": "text-embedding-3-large", "context": 1e9, "price_in": 0.13, "units": "Mtokens", "dimensions_max": 3072, "array_max": 2048, "endpoints": ["/v1/embeddings"], "features": ["embeddings", "dimensions"]},
    {"id": "text-embedding-3-small", "context": 1e9, "price_in": 0.02, "units": "Mtokens", "dimensions_max": 1536, "array_max": 2048, "endpoints": ["/v1/embeddings"], "features": ["embeddings", "dimensions"]},
    {"id": "text-embedding-ada-002", "context": 8192, "price_in": 0.10, "units": "Mtokens", "dimensions_max": 1536, "array_max": 2048, "endpoints": ["/v1/embeddings"], "features": ["embeddings"]},
    {"id": "dall-e-2", "context": 1000, "price_in": {"standard": {"1024x1024": 0.02, "512x512": 0.018, "256x256": 0.016}}, "units": "images", "n_max": 10, "extensions": ["png"], "endpoints": ["/v1/images/generations", "/v1/images/edits", "/v1/images/variations"], "features": ["images", "edits", "variations"]},
    {"id": "dall-e-3", "context": 4000, "price_in": {"standard": {"1024x1024": 0.04, "1024x1792": 0.08, "1792x1024": 0.08}, "hd": {"1024x1024": 0.08, "1024x1792": 0.12, "1792x1024": 0.12}}, "units": "images", "n_max": 1, "extensions": ["png"], "endpoints": ["/v1/images/generations"], "features": ["images", "hd"]},
    {"id": "tts-1-1106", "context": 4096, "price_in": 15.00, "units": "characters", "extensions": ["mp3", "opus", "aac", "flac", "pcm"], "endpoints": ["/v1/audio/speech"], "features": ["tts", "speech"]},
    {"id": "tts-1-hd-1106", "context": 4096, "price_in": 30.00, "units": "characters", "extensions": ["mp3", "opus", "aac", "flac", "pcm"], "endpoints": ["/v1/audio/speech"], "features": ["tts", "speech"]},
    {"id": "whisper-1", "price_in": 0.0001, "units": "seconds", "extensions": ["flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"], "endpoints": ["/v1/audio/transcriptions", "/v1/audio/translations"], "features": ["whisper", "transcriptions", "translations"]},
    {"id": "text-moderation-stable", "context": 1e9, "price_in": 0, "price_in_app": 0, "units": "Mtokens", "endpoints": ["/v1/moderations"], "features": ["moderations"]},
    {"id": "text-moderation-latest", "context": 1e9, "price_in": 0, "price_in_app": 0, "units": "Mtokens", "endpoints": ["/v1/moderations"], "features": ["moderations"]}
    ],
  "aliases":
    [
      {"id": "gpt-3.5-turbo", "alias_to": "gpt-3.5-turbo-0125"},
      {"id": "gpt-3.5-turbo-16k", "alias_to": "gpt-3.5-turbo-16k-0613"},
      {"id": "gpt-4o", "alias_to": "gpt-4o-2024-05-13"},
      {"id": "gpt-4", "alias_to": "gpt-4-0613"},
      {"id": "gpt-4-32k", "alias_to": "gpt-4-32k-0613"},
      {"id": "gpt-4-turbo", "alias_to": "gpt-4-turbo-2024-04-09"},
      {"id": "gpt-4-turbo-preview", "alias_to": "gpt-4-0125-preview"},
      {"id": "gpt-4-vision-preview", "alias_to": "gpt-4-1106-vision-preview"},
      {"id": "gpt-3.5-turbo-instruct", "alias_to": "gpt-3.5-turbo-instruct-0914"},
      {"id": "tts-1", "alias_to": "tts-1-1106"},
      {"id": "tts-1-hd", "alias_to": "tts-1-hd-1106"}
    ],
  "tiers":
    [
    {
        "class": "gpt4",
        "id": [
        "gpt-4-0314",
        "gpt-4-0613"
        ],
        "rpm": [0, 500, 5000, 5000, 10000, 10000],
        "rpd": [0, 10000, 1e9, 1e9, 1e9, 1e9],
        "tpm": [0, 10000, 40000, 80000, 300000, 300000],
        "qpd": [0, 100000, 200000, 5000000, 30000000, 45000000],
        "encoder": "cl100k_base"
    },
    {
        "class": "gpt432",
        "id": [
        "gpt-4-32k-0314",
        "gpt-4-32k-0613"
        ],
        "rpm": [0, 1001, 1001, 1000, 1000, 1000],
        "rpd": [0, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [0, 150000, 150000, 150000, 150000, 150000],
        "qpd": [0, 1500001, 1500001, 1500001, 1500001, 1500000],
        "encoder": "cl100k_base"
    },
    {
        "class": "gpt4o",
        "id": [
        "gpt-4o-2024-05-13"
        ],
        "rpm": [0, 500, 5000, 5000, 10000, 10000],
        "rpd": [0, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [0, 30000, 450000, 600000, 800000, 10000000],
        "qpd": [0, 90000, 1350000, 40000000, 80000000, 1500000000],
        "encoder": "o200k_base"
    },
    {
        "class": "gpt4turbo",
        "id": [
        "gpt-4-turbo-2024-04-09",
        "gpt-4-0125-preview",
        "gpt-4-1106-preview"
        ],
        "rpm": [0, 500, 5000, 5000, 10000, 10000],
        "rpd": [0, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [0, 30000, 450000, 600000, 800000, 2000000],
        "qpd": [0, 90000, 1350000, 40000000, 80000000, 300000000],
        "encoder": "cl100k_base"
    },
    {
        "class": "gpt4vision",
        "id": [
        "gpt-4-1106-vision-preview"
        ],
        "rpm": [0,  80,  100, 120, 300, 3000],
        "rpd": [0, 500, 1000, 1500, 2000, 1e9],
        "tpm": [0, 10000, 20000, 40000, 150000, 300000],
        "qpd": [0, 120000001, 120000001, 120000001, 120000001, 120000000],
        "encoder": "cl100k_base"
    },
    {
        "class": "gpt35turbo",
        "id": [
        "gpt-3.5-turbo-0301",
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-1106",
        "gpt-3.5-turbo-0125",
        "gpt-3.5-turbo-16k-0613"
        ],
        "rpm": [3,  3500, 3500, 3500, 10000, 10000],
        "rpd": [200, 10000, 1e9, 1e9, 1e9, 1e9],
        "tpm": [40000, 60000, 80000, 160000, 1000000, 2000000],
        "qpd": [200000, 200000, 400000, 10000000, 100000000, 300000000],
        "encoder": "cl100k_base"
    },
    {
        "class": "embedding",
        "id": [
        "text-embedding-3-large",
        "text-embedding-3-small",
        "text-embedding-ada-002"
        ],
        "rpm": [3000, 3000, 5000, 5000, 10000, 10000],
        "rpd": [200, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [1000000, 1000000, 1000000, 5000000, 5000000, 10000000],
        "qpd": [3000000, 3000000, 20000000, 1000000000, 1500000000, 4000000000],
        "encoder": "cl100k_base"
    },
    {
        "class": "moderation",
        "id": [
        "text-moderation-latest",
        "text-moderation-stable"
        ],
        "rpm": [3, 1000, 1000, 1000, 1000, 1000],
        "rpd": [200, 10000, 1e9, 1e9, 1e9, 1e9],
        "tpm": [150000, 150000, 150000, 150000, 150000, 150000]
    },
    {
        "class": "instruct",
        "id": [
        "gpt-3.5-turbo-instruct-0914"
        ],
        "rpm": [3, 3500, 3500, 3500, 3500, 3500],
        "rpd": [200, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [90000, 90000, 90000, 90000, 90000, 90000],
        "qpd": [200000, 200000, 200000, 200000, 200000, 200000],
        "encoder": "cl100k_base"
    },
    {
        "class": "dalle2",
        "id": [
        "dall-e-2"
        ],
        "rpm": [5, 5, 50, 100, 100, 500]
    },
    {
        "class": "dalle3",
        "id": [
        "dall-e-3"
        ],
        "rpm": [1, 5, 7, 7, 15, 50]
    },
    {
        "class": "whisper",
        "id": [
        "whisper-1"
        ],
        "rpm": [3, 50, 50, 100, 100, 500],
        "rpd": [200, 1e9, 1e9, 1e9, 1e9, 1e9]
    },
    {
        "class": "other",
        "id": [
        "babbage-002",
        "davinci-002"
        ],
        "rpm": [3, 3000, 3000, 3000, 3000, 3000],
        "rpd": [200, 1e9, 1e9, 1e9, 1e9, 1e9],
        "tpm": [150000, 250000, 250000, 250000, 250000, 250000],
        "encoder": "cl100k_base"
    }
]
}

Since the metadata is relational, with separate “data” (matching the model API), “tiers” (your limits), and “aliases” (a stable model map), and because you may also have fine-tuned models, we need some Python code that expands it into full per-model metadata, then emulates the API: it retrieves the model results and supplements them with the additional metadata.

Python script to retrieve models endpoint and supplement metadata

model_metadata.py
import json, copy
from datetime import datetime
from typing import Any, Dict, List, Optional

from jsonschema import validate, ValidationError
from openai import OpenAI

json_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "notes": {"type": "string"},
        "version": {"type": "string"},
        "schema_version": {"type": "string"},
        "author": {"type": "string"},
        "data": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "context": {"type": "integer"},
                    "max_tokens_max": {"type": "integer"},
                    "price_in": {
                        "oneOf": [
                            {"type": "number"},
                            {
                                "type": "object",
                                "additionalProperties": {
                                    "oneOf": [
                                        {"type": "number"},
                                        {
                                            "type": "object",
                                            "additionalProperties": {"type": "number"},
                                        },
                                    ]
                                },
                            },
                        ]
                    },
                    "price_out": {"type": "number"},
                    "price_train": {"type": "number"},
                    "ft_price_in": {"type": "number"},
                    "ft_price_out": {"type": "number"},
                    "units": {"type": "string"},
                    "shutoff": {"type": "string"},
                    "replacement": {"type": "string"},
                    "endpoints": {"type": "array", "items": {"type": "string"}},
                    "features": {"type": "array", "items": {"type": "string"}},
                    "dimensions_max": {"type": "integer"},
                    "array_max": {"type": "integer"},
                    "n_max": {"type": "integer"},
                    "extensions": {"type": "array", "items": {"type": "string"}},
                    "price_in_app": {"type": "number"},
                    "class": {"type": "string"},
                    "rpm": {"type": "array", "items": {"type": "number"}},
                    "rpd": {"type": "array", "items": {"type": "number"}},
                    "tpm": {"type": "array", "items": {"type": "number"}},
                    "qpd": {"type": "array", "items": {"type": "number"}},
                    "encoder": {"type": "string"},
                },
                "required": ["id"],
            },
        },
        "aliases": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "alias_to": {"type": "string"},
                },
                "required": ["id", "alias_to"],
            },
        },
        "tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "class": {"type": "string"},
                    "encoder": {"type": "string"},
                    "id": {"type": "array", "items": {"type": "string"}},
                    "rpm": {"type": "array", "items": {"type": "integer"}},
                    "rpd": {"type": "array", "items": {"type": "number"}},
                    "tpm": {"type": "array", "items": {"type": "integer"}},
                    "qpd": {"type": "array", "items": {"type": "integer"}},
                },
                "required": ["class", "id"],
            },
        },
    },
    "required": [
        "name",
        "description",
        "version",
        "schema_version",
        "author",
        "data",
        "aliases",
        "tiers",
    ],
}
model_obj = None

class ModelMetadata:
    """A class to manage model metadata."""

    def __init__(self, meta_file: str = "modelmeta.json") -> None:
        """Initialize the class and prepare the metadata."""
        self.model_meta = self.load_metadata(meta_file)
        self.validate_metadata()
        self.prepare_metadata()

    def load_metadata(self, meta_file: str) -> dict:
        """Load the JSON metadata from a file."""
        with open(meta_file, "r") as f:
            model_meta_str = f.read()

        try:
            model_meta = json.loads(model_meta_str)
        except json.JSONDecodeError as e:
            print(f"Invalid JSON: {e}")
            raise

        return model_meta

    def validate_metadata(self) -> None:
        """Validate the metadata using a JSON schema."""
        try:
            validate(instance=self.model_meta, schema=json_schema)
        except ValidationError as e:
            print(f"Validation error: {e}")
            raise

    def prepare_metadata(self) -> None:
        """Prepare the metadata by adding tiers and aliases."""
        # Adding tier information to models
        for tier_item in self.model_meta["tiers"]:
            for model_id in tier_item["id"]:
                model_item = self.find_dict(self.model_meta["data"], "id", model_id)
                if model_item:
                    for key in ["rpm", "rpd", "tpm", "qpd", "encoder"]:
                        if key in tier_item:
                            model_item[key] = tier_item[key]

        # Handling aliases to point to actual model data
        for alias_item in self.model_meta["aliases"]:
            original_item = self.find_dict(
                self.model_meta["data"], "id", alias_item["alias_to"]
            )
            if original_item:
                new_item = copy.deepcopy(original_item)
                new_item["id"] = alias_item["id"]
                new_item["alias_to"] = alias_item["alias_to"]
                self.model_meta["data"].append(new_item)

    @staticmethod
    def find_dict(lst: list, key: str, value: Any) -> Optional[dict]:
        """Find a dictionary in a list by key, matching a scalar exactly or by list membership."""
        for dic in lst:
            entry = dic.get(key)
            if entry == value or (isinstance(entry, (list, tuple)) and value in entry):
                return dic
        return None

    def save_metadata(self, output_file: str) -> None:
        """Save the prepared metadata to a new JSON file."""
        with open(output_file, "w") as f:
            json.dump(self.model_meta, f, indent=4)


class ModelMetadataAPI(ModelMetadata):
    """returns API response with metadata"""

    def __init__(self, meta_file: str = "modelmeta.json", api_key: str = None) -> None:
        super().__init__(meta_file)
        self.api_key = api_key

    def api_models(self):
        global model_obj  # external diags
        client = OpenAI(api_key=self.api_key)
        try:
            model_obj = client.models.list()  # API call
        except Exception as err:
            raise ValueError(f"Model listing API call failed: {err}") from None
        return model_obj

    def api_retrieve(self, target: str):
        client = OpenAI(api_key=self.api_key)
        try:
            retrieve_obj = client.models.retrieve(target)  # API call
        except Exception as err:
            raise ValueError(f"Model retrieve API call failed: {err}") from None
        return retrieve_obj

    def listlist(self) -> List[dict]:
        """Return the API model list as plain dicts, each merged with its metadata."""
        model_pydantic_object = self.api_models()  # make API request
        model_dicts = model_pydantic_object.model_dump().get("data", [])

        for model in model_dicts:
            metadata = self.find_dict(self.model_meta["data"], "id", model["id"])
            if metadata is not None:
                for key, value in metadata.items():
                    if key != "id":
                        model[key] = value
        return sorted(model_dicts, key=lambda x: x["id"])

    def find_dict(self, data: List[Dict[str, Any]], key: str, value: Any):
        """Utility method to find a dictionary by key and value in a list of dictionaries."""
        return next((item for item in data if item.get(key) == value), None)

    def list(self):
        # Make API request to get the Pydantic object containing the models
        model_pydantic_object = self.api_models()

        # Directly access and modify the data list of Model objects
        for model in model_pydantic_object.data:
            # Check if the model ID indicates a fine-tuning model
            if model.id.startswith("ft:"):
                # Extract the base model ID; format is "ft:base_model_id:orgname::uniquepart"
                base_model_id = model.id.split(":")[1]
                # Adjust the behavior for fine-tuning models
                fine_tuning = True
            else:
                base_model_id = model.id
                fine_tuning = False

            # Find metadata for the base model ID
            metadata = self.find_dict(self.model_meta["data"], "id", base_model_id)
            if metadata is not None:
                # Deep copy to ensure all nested data is correctly copied
                deep_copied_metadata = copy.deepcopy(metadata)
                # If it's a fine-tuning model, adjust the pricing information
                if fine_tuning:
                    if "ft_price_in" in metadata:
                        deep_copied_metadata["price_in"] = metadata["ft_price_in"]
                    if "ft_price_out" in metadata:
                        deep_copied_metadata["price_out"] = metadata["ft_price_out"]
                # Inject additional metadata directly into the model object
                model.additional_metadata = deep_copied_metadata
        return model_pydantic_object

    def retrieve(self, model_id: str):
        # Make API request to retrieve a specific model by its ID
        model_pydantic_object = self.api_retrieve(model_id)

        # Determine if the model is a fine-tuning model
        fine_tuning = model_id.startswith("ft:")
        if fine_tuning:
            # Extract the base model ID for fine-tuning models
            base_model_id = model_id.split(":")[1]
        else:
            base_model_id = model_id

        # Find metadata for the base model ID
        metadata = self.find_dict(self.model_meta["data"], "id", base_model_id)
        if metadata is not None:
            # Deep copy the metadata to ensure all nested data is correctly copied
            deep_copied_metadata = copy.deepcopy(metadata)
            # If it's a fine-tuning model, adjust the pricing information
            if fine_tuning:
                if "ft_price_in" in metadata:
                    deep_copied_metadata["price_in"] = metadata["ft_price_in"]
                if "ft_price_out" in metadata:
                    deep_copied_metadata["price_out"] = metadata["ft_price_out"]
            # Inject additional metadata directly into the model object
            model_pydantic_object.additional_metadata = deep_copied_metadata

        return model_pydantic_object


class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.strftime('%Y-%m-%d')
        return super().default(obj)
    ''' Why this subclass util? Suppose you want to do math on dates by altering the memory model_meta
    # Convert "shutoff" strings to datetime objects
    for item in data:
        shutoff_str = item['shutoff']
        shutoff_date = datetime.strptime(shutoff_str, '%Y-%m-%d')
        item['shutoff'] = shutoff_date
    # Later, any reason for saving full data to JSON after making shutoff dates a date:
    with open('your_file.json', 'w') as f:
        json.dump(data, f, indent=4, cls=DateTimeEncoder)
    '''

def get_model_by_id(model_list: List[dict], model_id: str) -> dict:
    for model in model_list:
        if model['id'] == model_id:
            return model
    return None

That’s just the class definitions, which could be saved to a class library you import, such as openai_modelmeta.py

Now to use it, you might have used the models endpoint like this:

client = OpenAI()
client.models.retrieve("gpt-3.5-turbo-0125")

The replacement .list and .retrieve methods are used the same way, with the OpenAI client call made internally:

models = ModelMetadataAPI()
models.retrieve("gpt-3.5-turbo")

Or better, answer your questions with a formatted print:

models = ModelMetadataAPI()
response = models.retrieve("gpt-3.5-turbo")
print(json.dumps(response.model_dump(), indent=2))

The printout is the per-model-type result:

{
  "id": "gpt-3.5-turbo",
  "created": 1677610602,
  "object": "model",
  "owned_by": "openai",
  "additional_metadata": {
    "id": "gpt-3.5-turbo",
    "context": 16385,
    "max_tokens_max": 4096,
    "price_in": 0.5,
    "price_out": 1.5,
    "price_train": 8.0,
    "ft_price_in": 3.0,
    "ft_price_out": 6.0,
    "units": "Mtokens",
    "endpoints": [
      "/v1/chat/completions"
    ],
    "features": [
      "chat",
      "functions",
      "tools",
      "assistants",
      "ft_assistants",
      "logprobs",
      "logit_bias",
      "ft"
    ],
    "class": "gpt35turbo",
    "rpm": [
      3,
      3500,
      3500,
      3500,
      10000,
      10000
    ],
    "rpd": [
      200,
      10000,
      1000000000.0,
      1000000000.0,
      1000000000.0,
      1000000000.0
    ],
    "tpm": [
      40000,
      60000,
      80000,
      160000,
      1000000,
      2000000
    ],
    "qpd": [
      200000,
      200000,
      400000,
      10000000,
      100000000,
      300000000
    ],
    "alias_to": "gpt-3.5-turbo-0125"
  }
}

Since each model has both an endpoint, and features that echo the endpoint’s name, you can write code that only employs models with "embeddings" as a feature keyword:

response = models.retrieve("text-embedding-ada-002")

{
  "id": "text-embedding-ada-002",
  "created": 1671217299,
  "object": "model",
  "owned_by": "openai-internal",
  "additional_metadata": {
    "id": "text-embedding-ada-002",
    "context": 8192,
    "price_in": 0.1,
    "units": "Mtokens",
    "dimensions_max": 1536,
    "array_max": 2048,
    "endpoints": [
      "/v1/embeddings"
    ],
    "features": [
      "embeddings"
    ],
    "class": "embedding",
    "rpm": [
      3,
      500,
      500,
      5000,
      10000,
      10000
    ],
    "rpd": [
      200,
      10000,
      1000000000.0,
      1000000000.0,
      1000000000.0,
      1000000000.0
    ],
    "tpm": [
      150000,
      1000000,
      1000000,
      5000000,
      5000000,
      10000000
    ]
  }
}
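The feature filter described above can be sketched with a small helper (filter_by_feature is a hypothetical name; it assumes the merged dicts returned by listlist(), with the "features" key shown in the metadata):

```python
from typing import Any, Dict, List

def filter_by_feature(models: List[Dict[str, Any]], feature: str) -> List[str]:
    """Return the ids of models whose metadata lists the given feature keyword."""
    return sorted(m["id"] for m in models if feature in m.get("features", []))

# A tiny sample in the same shape as the merged listlist() output:
sample = [
    {"id": "text-embedding-ada-002", "features": ["embeddings"]},
    {"id": "gpt-3.5-turbo-0125", "features": ["chat", "tools"]},
    {"id": "whisper-1"},  # no features key at all is tolerated
]
print(filter_by_feature(sample, "embeddings"))  # ['text-embedding-ada-002']
```

The same pattern works for any other feature string ("chat", "tools", "ft", …) or for matching "endpoints" instead.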

(I didn’t just whip this up for you :stuck_out_tongue_closed_eyes:, but I’ve been plugging away for weeks)


This looks super helpful _j, thanks!! I haven’t used the models API endpoint yet, but are you saying there’s information in what you just provided that’s not available in any way through the models endpoint, or have you just wrapped up a more convenient way to access it?

Regardless, I know everyone will appreciate this code!

That’s really awesome @_j !!

Could you tell us what the following fields mean?

  • rpm
  • rpd
  • tpm
  • qpd

Any number of queries are possible using the data within the class above, but I have built it basically as an API replacement, giving the same pydantic return but with an additional_metadata key that holds a per-model-type schema.

Examples of a few methods I did include:

Save out an expanded, formatted version of the source JSON:

meta = ModelMetadata()
meta.save_metadata('allmeta3.json')

Use the API to get your available models, but receive a list of dictionaries (like the return JSON) with the metadata included at the same level:

meta = ModelMetadataAPI()
model_list = meta.listlist()
#print(model_list)  # this will be huge

A usage example just using the metadata (the non-class function get_model_by_id in the code is also an example):

model_id_to_find = 'babbage-002'
model_data = get_model_by_id(model_list, model_id_to_find)
print(model_data)

What I haven’t implemented is a method to set your tier, and then obtain only your results. The tier metadata is the default per-minute, per-day, or batch queue usage limit, with indexes 0-5 running from free tier to tier 5.

 ...       "class": "embedding",
        "id": [
        "text-embedding-3-large",
        "text-embedding-3-small",
        "text-embedding-ada-002"
        ],
        "rpm": [3,  500, 500, 5000, 10000, 10000],
        "rpd": [200, 10000, 1e9, 1e9, 1e9, 1e9],
        "tpm": [150000, 1000000, 1000000, 5000000, 5000000, 10000000]
    },

(A peculiar thing is that embeddings has batch limits in site listings, but batch is only described as working with chat completions…)

Default rate limits (which OpenAI can change without notice, and your account may differ)

rpm - requests per minute for model class
rpd - requests per day for model class
tpm - tokens per minute for model class
qpd - batch queue depth per day
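Given that index convention (0 = free, 5 = tier 5), collapsing the per-tier arrays of a merged model record down to one tier’s scalar values is straightforward; limits_for_tier below is a hypothetical helper using only the field names from the JSON above:

```python
from typing import Any, Dict

TIER_NAMES = ["free", "tier1", "tier2", "tier3", "tier4", "tier5"]

def limits_for_tier(model: Dict[str, Any], tier_index: int) -> Dict[str, Any]:
    """Collapse the per-tier arrays (rpm/rpd/tpm/qpd) to one tier's scalar values."""
    out = {"tier": TIER_NAMES[tier_index]}
    for key in ("rpm", "rpd", "tpm", "qpd"):
        if key in model:  # not every model class carries every limit
            out[key] = model[key][tier_index]
    return out

# Example using the gpt35turbo class values from the JSON above:
record = {
    "id": "gpt-3.5-turbo-0125",
    "rpm": [3, 3500, 3500, 3500, 10000, 10000],
    "rpd": [200, 10000, 1e9, 1e9, 1e9, 1e9],
    "tpm": [40000, 60000, 80000, 160000, 1000000, 2000000],
}
print(limits_for_tier(record, 1))
# {'tier': 'tier1', 'rpm': 3500, 'rpd': 10000, 'tpm': 60000}
```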

Your fine-tune models from the models endpoint are also annotated with metadata and the fine-tune inference prices (although I don’t have GPT-4 fine-tuning prices)

The models endpoint itself includes only very basic information fields, like a date added; not that useful except to see whether you have a model available.


:pray: thanks much for the detailed reply @_j
couple of Qs regarding modelmeta.json

  • will this get added to the API or needs to be maintained locally?
  • based on the initial observation, for the rpm, rpd, tpm, qpd properties
    • why do we need an array of values as the max limit for each of these props?
    • are these max values dependent on the id property of the model node?
      If so, can they be made into a child node inside the model?
{
        "class": "gpt4turbo",
        "versions":[
             {"id": "gpt-4-turbo-2024-04-09",
               "rpm":<value>,"rpd":<value>, "tpm": <value>, "qpd": value
             },
            {"id": "gpt-4-0125-preview",
               "rpm":<value>,"rpd":<value>, "tpm": <value>, "qpd": value
             }
        ]
    }
- valid set of `features` could be added to the respective child-node under specific `versions`.

(in continuation of my earlier reply)
What I haven’t implemented is a method to set your tier, and then obtain only your results
Can we encapsulate versions inside the tiers prop? Like:
tiers: [{"name": "tier1", "versions":[...]}]
tiers could become the container node in the config, and based on the current user’s tier, the respective tier-level versions and their limits could be captured.

The last update OpenAI did to the models endpoint was removing information…

The data is solely curated by me poring over every bit of published data, getting feedback from forum regulars about tiers not documented, and making API calls across models to find the sometimes one-token-off specification errors, etc.

I will make this forum reply the main “publication” point for updates, and you can DM me if you notice changes are needed. I don’t want to “push” updates.


OpenAI has tiers depending on your past payment history - how trusted you are - and sets the rate limits on your account from this. My entries correspond to [free, tier1, tier2, tier3, tier4, tier5].

Your tier number is in your organization’s rate limits page, and not available by API except indirectly by sampling some request headers.
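Sampling those headers can be done with the raw-response variant of the client. This is a sketch: the x-ratelimit-* header names are what the API returns today, and guess_tier is only a heuristic that compares a header value against the tpm tables above:

```python
from typing import Dict, List, Optional

def sample_rate_limit_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Make one cheap request and return the x-ratelimit-* response headers."""
    from openai import OpenAI  # deferred import so guess_tier() works without the SDK
    client = OpenAI(api_key=api_key)
    raw = client.chat.completions.with_raw_response.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        max_tokens=1,
    )
    # e.g. x-ratelimit-limit-requests, x-ratelimit-limit-tokens, remaining, reset
    return {k: v for k, v in raw.headers.items() if k.startswith("x-ratelimit")}

def guess_tier(tpm_limit: int, tpm_table: List[int]) -> int:
    """Index (0=free .. 5=tier5) of the tier whose tpm matches the header value, else -1."""
    return tpm_table.index(tpm_limit) if tpm_limit in tpm_table else -1

# With the gpt35turbo tpm table, a 160000 tokens/min header implies tier 3:
print(guess_tier(160000, [40000, 60000, 80000, 160000, 1000000, 2000000]))  # 3
```

Remember the note in the metadata: account limits can be changed without announcement, so the header is authoritative and the table is only the documented default.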


The data is formatted to make updating easier and more compact. The class init method reads the file, populates all the tier info onto the models in “data”, and duplicates aliases with notation. (The subclass methods for listings create fine-tune entries for all your ft: models, with the ft_ prices.)

There is a ModelMetadata() method save_metadata(output_file) if you want a processed per-model JSON (looking like the example usage in post 2). You can see how illogical or useful the final version of the data is.

A subclass could be created to set stateful information on class objects, such as a desired tier, with a new method to produce the data desired.
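A minimal sketch of that subclass idea, with hypothetical names (TieredMetadata, my_limits); to stay self-contained it takes the merged list directly rather than subclassing ModelMetadata, but the collapsing logic would be identical:

```python
import copy
from typing import Any, Dict, List

class TieredMetadata:
    """Sketch: hold a tier index as state and collapse per-tier arrays to scalars."""

    TIER_KEYS = ("rpm", "rpd", "tpm", "qpd")

    def __init__(self, data: List[Dict[str, Any]], tier: int = 1) -> None:
        self.data = data    # merged per-model metadata, as in save_metadata() output
        self.tier = tier    # 0 = free tier .. 5 = tier 5

    def my_limits(self, model_id: str) -> Dict[str, Any]:
        """Return one model's record with only this account tier's limit values."""
        for item in self.data:
            if item.get("id") == model_id:
                out = copy.deepcopy(item)
                for key in self.TIER_KEYS:
                    if key in out:
                        out[key] = out[key][self.tier]
                return out
        raise KeyError(model_id)

meta = TieredMetadata([{"id": "dall-e-3", "rpm": [1, 5, 7, 7, 15, 50]}], tier=5)
print(meta.my_limits("dall-e-3"))  # {'id': 'dall-e-3', 'rpm': 50}
```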

Updated the model metadata code and JSON format, May 16:

  • included the “encoder” key, where o200k_base vs cl100k_base is distinguished on those models that present tokens;
  • gpt-4o model alias and true model;
  • new rate limits for batching embeddings and completions, changes seen in other tier rates;

I had retired a token encoder type from the metadata with the retirement of the old gpt-3 models, but it’s back with a new version!


JSON Updated May 22:

  • Higher rates for tier 5; lower rates for tier 1 on gpt-4-turbo, gpt-4o
  • gpt-4-1106-vision-preview will now take tools, logprobs, logit_bias (but does not get the parallel tool injected for file_search, and fails in assistants)
  • new “features” string “retrieval” for supporting models (should also match file_search ability and having parallel tool calls), mirroring the assistants_models API used by the playground.
  • new “features” string “v2”, only added for gpt-4o. The model cannot be used with assistants beta-v1 headers, so this value can act as a block.