For those of you who use Ruby, here is how I use tiktoken in my Ruby code, since there is no Ruby tiktoken gem for the turbo model (yet). It's just a quick hack, and you can make it more elegant if you wish, of course. I just tossed this together today for some testing:
- Create a Python script like this one (after installing tiktoken, of course). The Ruby code below expects it at scripts/tik.py under your Rails root:
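import sys

import tiktoken

def tik(words):
    # gpt-3.5-turbo uses the cl100k_base encoding
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    # encode() returns a list of integer token IDs
    tokens = encoding.encode(words)
    return tokens

# The text to tokenize is passed as the first command-line argument
tmp = str(sys.argv[1])
print(tik(tmp))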
- Then, in your Ruby code, you can do something simple like this (with less code, if you like):
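require "shellwords"

module TikToken
  def self.get_tokens(string=nil, bias=0)
    return nil if string.nil?
    # Shell-escape the text so quotes or $(...) in it can't break (or inject into) the command
    temp_logit = `python3 #{Rails.root}/scripts/tik.py #{Shellwords.escape(string.strip)}`
    puts "temp_logit: #{temp_logit}"   # debug output
    # The script prints a Python list like "[370, 20554]"; strip the brackets and split on commas
    logit_bias_text = temp_logit.strip.gsub("[","").gsub("]","").split(",")
    puts "logit_bias_text: #{logit_bias_text}"   # debug output
    # Map every token ID to the same bias value (keys become symbols like :"370")
    logit_bias_hash = {}
    logit_bias_text.each do |bias_key|
      logit_bias_hash.merge!({"#{bias_key.strip}": bias.to_i})
    end
    logit_bias_hash
  end
end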
Results:
irb(main):002:0> TikToken.get_tokens("abracadabra",10)
=> {:"370"=>10, :"20554"=>10, :"329"=>10, :"45032"=>10}
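The returned hash is shaped for the `logit_bias` parameter of the Chat Completions API, which maps token IDs to a bias value from -100 to 100. Here is a minimal sketch of building a request body from it, assuming the `gpt-3.5-turbo` model; `build_chat_body` is a hypothetical helper, and the actual HTTP call is left out. Symbol keys like `:"370"` come out as plain string keys in JSON, so no conversion is needed.

```ruby
require "json"

# Hypothetical helper: builds a Chat Completions request body with a logit_bias map.
# The API accepts bias values from -100 to 100, so clamp each value to that range.
def build_chat_body(prompt, logit_bias)
  clamped = logit_bias.transform_values { |v| v.to_i.clamp(-100, 100) }
  {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
    logit_bias: clamped
  }
end

# Same shape as the TikToken.get_tokens("abracadabra", 10) result above
bias = { "370": 10, "20554": 10, "329": 10, "45032": 10 }
body = JSON.generate(build_chat_body("Say a magic word.", bias))
puts body
```

Positive values make those tokens more likely and negative values less likely; -100 effectively bans a token.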
Not very elegant, but it works for me in my OpenAI test lab, which I use for testing to help community members here.
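If you want something a little sturdier, one possible variant skips the shell and the bracket-stripping entirely, since the script's printed Python list is also valid JSON. This is a minimal sketch, not a drop-in replacement: it assumes the same scripts/tik.py path and a `python3` on the PATH, and `TikTokenSketch`, `script_path`, `parse_token_ids`, and `logit_bias_for` are hypothetical names. It uses Ruby's standard `Open3.capture2`, which passes the text as its own argument so no shell quoting is involved.

```ruby
require "json"
require "open3"

# Hypothetical module name; this is a sketch, not part of tiktoken or any gem.
module TikTokenSketch
  # Assumed location of the tik.py helper script (same path as in the post).
  def self.script_path
    defined?(Rails) ? Rails.root.join("scripts", "tik.py").to_s : "scripts/tik.py"
  end

  # tik.py prints a Python list such as "[370, 20554]", which is also valid JSON.
  def self.parse_token_ids(output)
    ids = JSON.parse(output.strip)
    unless ids.is_a?(Array) && ids.all?(Integer)
      raise ArgumentError, "unexpected tokenizer output: #{output.inspect}"
    end
    ids
  end

  # Map every token ID (as a string key) to the same bias value.
  def self.logit_bias_for(ids, bias = 0)
    ids.to_h { |id| [id.to_s, bias.to_i] }
  end

  # Run tik.py without a shell: the text is passed as its own argv entry,
  # so quotes, spaces, or $(...) inside it are never interpreted.
  def self.get_tokens(string, bias = 0)
    return nil if string.nil?

    output, status = Open3.capture2("python3", script_path, string.strip)
    raise "tik.py exited with status #{status.exitstatus}" unless status.success?

    logit_bias_for(parse_token_ids(output), bias)
  end
end
```

Calling `TikTokenSketch.get_tokens("abracadabra", 10)` should give the same token IDs as the results above, but with string keys ("370" rather than :"370"); both serialize to the same JSON when sent as `logit_bias`.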
Hope this helps.