What's the best way to benchmark tokens/sec of fine-tuned model?

According to the docs, fine-tuning a model can result in lower-latency requests. I have a fine-tuned model, and it is indeed faster. However, I would like to calculate the tokens per second (tokens/sec), and I am not sure whether my current approach is the best one:

  • start timer before making the API call
  • API call using fine-tuned model
  • stop timer
  • use the tokenizer provided by the gpt-3-encoder package to estimate the total tokens in the response
  • divide by the time taken

It looks correct, but I would like to know if there is any more “official” way.
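The steps above can be boiled down to a small, self-contained helper. This is just a sketch: `call_fn` and `count_tokens` are placeholder arguments standing in for your actual API call and tokenizer (gpt-3-encoder, tiktoken, or anything else that maps text to a token count).

```python
import time

def tokens_per_second(call_fn, count_tokens):
    """Rough throughput benchmark: time one call, divide token count by elapsed time.

    call_fn      -- zero-argument function that performs the request and
                    returns the completion text
    count_tokens -- function mapping text to a token count (e.g. the length
                    of a tokenizer's encoding of the text)
    """
    start = time.time()            # start timer before making the call
    text = call_fn()               # the fine-tuned model request
    elapsed = time.time() - start  # stop timer
    return count_tokens(text) / elapsed

# Stand-in "API call" with simulated latency, and a naive whitespace tokenizer:
def fake_call():
    time.sleep(0.01)  # pretend network + generation time
    return "one two three four five"

tps = tokens_per_second(fake_call, lambda t: len(t.split()))
```

Swapping in a real API call and a real tokenizer gives you the same number your manual timing produces.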

Hi and welcome to the Developer Forum!

A couple of things to bear in mind: 1) make sure you are using tiktoken with the cl100k_base encoding for GPT-3.5 token counting; 2) there will be a certain amount of pre-processing time to take into account.

One thing you can do is turn streaming on; then you get to see the tokens arrive in real time, albeit with a small overhead for each delta packet.

When you don’t use stream=True, you get the exact token count back in the response’s usage field.
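With streaming you can additionally separate time-to-first-token from total time. A minimal sketch, assuming you pass in the chunk iterator the API returns (demonstrated here with a stand-in generator):

```python
import time

def time_stream(chunks):
    """Consume a stream of text deltas.

    Returns (first_delta_latency, total_time, n_chunks); chunks can be any
    iterable yielding text pieces, e.g. a streamed completion response.
    """
    start = time.time()
    first = None
    n = 0
    for _text in chunks:
        if first is None:
            first = time.time() - start  # latency until the first delta
        n += 1
    total = time.time() - start
    return first, total, n

# Stand-in stream: three deltas with a small delay before each one
def fake_stream():
    for piece in ["Hello", " there", "!"]:
        time.sleep(0.01)
        yield piece

first, total, n = time_stream(fake_stream())
```

Divide a separately obtained token count by `total` (or by `total - first`, if you want to exclude pre-processing) to get tokens/sec.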

I made a little chat loop for completions (useful if you fine-tuned davinci-002 or babbage-002) that reports the total time of the API call.

import time
import openai

openai.api_key = key  # your API key

# Exact token counts come from `usage`, which is only returned when not streaming
stream = False

system = """
An AI assistant replies to user input. It keeps no memory of chat.
assistant: I am a helpful artificial intelligence, capable of many human-like tasks.
"""
user = "Write an introduction a user will see when they first start your chatbot program"
while user not in ["exit", ""]:
    stime = time.time()
    api_out = openai.Completion.create(
        prompt=system + "\n\nuser: " + user + "\nassistant:",
        model="gpt-3.5-turbo-instruct", stream=stream, max_tokens=666)
    if stream:
        # streamed responses carry no usage field; print deltas as they arrive
        for chunk in api_out:
            print(chunk["choices"][0]["text"], end='')
        print(f"\n-- completion: time {round(time.time() - stime, 3)}s --")
    else:
        print(api_out["choices"][0]["text"].strip())
        ctime = round(time.time() - stime, ndigits=3)
        ctokens = int(api_out["usage"]["completion_tokens"])
        tps = round(ctokens / ctime, ndigits=1)
        print(f"-- completion: time {ctime}s, {ctokens} tokens, {tps} tokens/s --")
    user = input("==>")

Output of interactions:

Hello! My name is AI Assistant and I am here to assist you with any tasks or questions you may have. I am constantly learning and improving to provide you with the best experience possible. How may I help you today?
-- completion: time 0.756s, 45 tokens, 59.5 tokens/s --
==>How many cats can happily and healthfully occupy an average home?
The number of cats that can happily and healthfully occupy an average home can vary depending on the size of the home and the individual needs of the cats. It’s best to consult with a veterinarian or animal behaviorist for specific recommendations.
-- completion: time 0.903s, 47 tokens, 52.0 tokens/s --
==>Supply an AI estimation and be decisive: How many cats can happily and healthfully occupy an average home?
The number of cats that can happily and healthfully occupy an average home would vary depending on factors such as space, resources, and individual preferences. However, a general estimation would suggest that 2-3 cats would be a reasonable number for a happy and healthy living environment. It is important to also consider the wellbeing and when making a decision about pet ownership.
-- completion: time 0.734s, 72 tokens, 98.1 tokens/s --