I'm getting the error "gpt-4-0613 does not exist". I'm confused.

Or you can just hit the API with a massive request.

This should get you $0.50 of usage in one call (unless the API throws an empty delta into the stream and crashes the openai library): 125,000 tokens × $0.004 per 1K tokens = $0.50.

import openai
import time
import datetime

openai.api_key = "sk-HpSfjaoiaffnaxxxxxx"
# Raise the library-level timeout (pre-1.0 SDK) so a long stream isn't cut off.
openai.api_requestor.TIMEOUT_SECS = 3600

start_time = time.time()
print(datetime.datetime.fromtimestamp(start_time).strftime('%Y-%m-%d %H:%M:%S'))

# n=100 completions x max_tokens=1250 = up to 125,000 billed completion tokens.
response = openai.ChatCompletion.create(
    messages = [{"role": "system", "content": "JSON List each Joyo Kanji."}],
    temperature = 0.0, n = 100, max_tokens = 1250,
    model="gpt-3.5-turbo-16k", stream=True, request_timeout = 3600)

# Consume the stream, printing a progress dot every 20 delta chunks.
for idx, delta in enumerate(response):
    if idx % 20 == 0:
        print(".", end="", flush=True)

elapsed_time = time.time() - start_time
print(f"\nAPI completion: {elapsed_time:.1f} seconds")

If you request close to or over 180,000 tokens, calculated as max_tokens × n, you get a tokens-per-minute rate-limit error before generation even starts, even though the output would take many minutes to receive.
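A minimal sketch of that pre-check as I understand it from the errors; request_token_budget is an illustrative helper, not anything in the OpenAI SDK, and 180,000 is just the tokens-per-minute figure my requests were bouncing off:

# Hypothetical illustration of the up-front rate-limit check (not OpenAI code):
# the request is rejected immediately if max_tokens * n approaches the TPM limit.
TPM_LIMIT = 180_000  # assumed tokens-per-minute limit on my account tier

def request_token_budget(max_tokens: int, n: int) -> int:
    """Worst-case completion tokens a single request could bill."""
    return max_tokens * n

print(request_token_budget(1250, 100))  # 125,000 -> accepted by the API
print(request_token_budget(1800, 100))  # 180,000 -> RateLimitError before any generation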

Specify no max_tokens and no such rate-limit calculation blocks you, leaving the possibility of a massive bill from half an hour of streamed data, with a potential 1.6 million tokens (n=100 × the 16k context). A single n=1 run of this prompt was returning 10,000+ tokens before hitting the API timeout. With the timeout increased it could possibly reach 16,000, but an apparent model behavior change now makes 5,000-6,000 tokens per run more likely.
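If you go the no-max_tokens route, a client-side guard is the only brake you have. This is my own sketch, not an API feature: count streamed deltas (each content delta is roughly one token) and stop reading past a budget. Note that tokens already generated by the time you break are still billed.

import openai

TOKEN_BUDGET = 20_000  # assumed cap; abort once roughly this many tokens arrive

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-16k",
    messages=[{"role": "system", "content": "JSON List each Joyo Kanji."}],
    temperature=0.0, stream=True)  # note: no max_tokens, no pre-check

received = 0
for chunk in response:
    if chunk["choices"][0]["delta"].get("content"):
        received += 1  # rough proxy: one content delta ~ one token
    if received >= TOKEN_BUDGET:
        break  # stop consuming the stream; output generated so far is still billed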

Just a bit of experience here with the 0613 series.

gpt-3.5-turbo-0613 is far superior in terms of response times, so unless you really need the power of gpt-4, I would rely on that model instead. Your costs will be significantly lower to boot.

Especially when calling the LLM multiple times before generating a response (as from within an agent), the wait times for GPT-4 can be far, far longer.
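If you want to see the gap for yourself, a quick harness like the one below times the same short prompt against both 0613 models; treat the numbers as a measurement you run yourself, not data, since latency varies a lot with load:

import time
import openai

for model in ("gpt-3.5-turbo-0613", "gpt-4-0613"):
    start = time.time()
    openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
        max_tokens=20)
    print(f"{model}: {time.time() - start:.1f}s")  # wall-clock time per call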
