This was under-reporting the tokens, so I asked GPT-4 what was wrong with the function. It immediately noticed that the method doesn't account for punctuation. Here's an updated version.
def estimate_tokens(text, method = "max")
  # method can be "average", "words", "chars", "max", "min"; defaults to "max"
  # "average" is the average of the word and char estimates
  # "words" is the word count divided by 0.75
  # "chars" is the char count divided by 4
  # "max" is the larger of the word and char estimates
  # "min" is the smaller of the word and char estimates
  word_count = text.split(" ").count
  char_count = text.length
  tokens_count_word_est = word_count.to_f / 0.75
  tokens_count_char_est = char_count.to_f / 4.0

  # Include additional tokens for spaces and punctuation marks
  additional_tokens = text.scan(/[\s.,!?;]/).length
  tokens_count_word_est += additional_tokens
  tokens_count_char_est += additional_tokens

  output = 0
  if method == "average"
    output = (tokens_count_word_est + tokens_count_char_est) / 2
  elsif method == "words"
    output = tokens_count_word_est
  elsif method == "chars"
    output = tokens_count_char_est
  elsif method == "max"
    output = [tokens_count_word_est, tokens_count_char_est].max
  elsif method == "min"
    output = [tokens_count_word_est, tokens_count_char_est].min
  else
    # return invalid method message
    return "Invalid method. Use 'average', 'words', 'chars', 'max', or 'min'."
  end
  return output.to_i
end
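
As a quick sanity check, here's a minimal usage sketch; the sample sentence and the loop over methods are my own illustrative additions, not part of the snippet above.

# Minimal usage sketch: the sample sentence is illustrative only.
sample = "Hello, world! How many tokens might this sentence use?"

["average", "words", "chars", "max", "min"].each do |method|
  puts "#{method}: #{estimate_tokens(sample, method)}"
end

# An unrecognized method returns the error string instead of a count.
puts estimate_tokens(sample, "foo")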