Using TikToken in Ruby to Generate `logit_bias` for a String

For those of you who use Ruby, here is how I use TikToken in my Ruby code, since there is no Ruby tiktoken gem for the turbo model (yet). It's just a quick hack, and you can make it more elegant if you wish, of course. I just tossed this salad together today for some testing:

  1. Create a Python script, like this (after installing tiktoken, of course):
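import sys

import tiktoken


def tik(words):
    # Look up the tokenizer used by gpt-3.5-turbo and return the token IDs.
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    return encoding.encode(words)


# The text to tokenize is the first command-line argument.
# The result is printed as a Python list of integers.
print(tik(sys.argv[1]))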

  2. Then in your Ruby code, you can do something easy like this (with less code, if you like):
module TikToken
  # Returns a logit_bias hash that maps each token ID of `string` to `bias`.
  def self.get_tokens(string = nil, bias = 0)
    return nil if string.nil?

    # Pass the arguments as an array so the shell never interprets the string
    # (quotes or backticks in the input would break the backtick version).
    script = Rails.root.join("scripts", "tik.py").to_s
    temp_logit = IO.popen(["python3", script, string.strip], &:read)

    # The script prints a list of integers; pull out each token ID.
    temp_logit.scan(/\d+/).each_with_object({}) do |bias_key, logit_bias_hash|
      logit_bias_hash[bias_key.to_sym] = bias.to_i
    end
  end
end

Results:

irb(main):002:0> TikToken.get_tokens("abracadabra",10)
=> {:"370"=>10, :"20554"=>10, :"329"=>10, :"45032"=>10}

Not very elegant, but it works for me in my OpenAI test lab, where I test things to help community members here.
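One way to tidy up the parsing step, as a rough sketch: the list that the Python script prints is also valid JSON, so `JSON.parse` can stand in for the manual bracket stripping and splitting. The helper name `logit_bias_from_output` is made up for illustration, and it uses string keys rather than symbols, which is an assumption about what is convenient when the hash is later serialized.

```ruby
require "json"

# Hypothetical helper: turns the script's printed list, e.g. "[370, 20554]",
# into a logit_bias hash keyed by token ID strings.
def logit_bias_from_output(output, bias = 0)
  token_ids = JSON.parse(output.strip) # array of integers
  token_ids.to_h { |id| [id.to_s, bias.to_i] }
end

# Example with the token IDs shown in the results above.
p logit_bias_from_output("[370, 20554, 329, 45032]\n", 10)
```

This only replaces the parsing; the call out to `python3` stays the same.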

Hope this helps.


I went ahead and made a simple Ruby binding. It compiles the underlying Rust library (actually a wrapper someone else wrote of the underlying library in Rust, but that isn't terribly important). You can invoke it like this:

require 'tiktoken_ruby'

module TikTokenBias
  def self.get_tokens(string = nil, bias = 0)
    return nil if string.nil?

    encoding = Tiktoken.encoding_for_model("gpt-3.5-turbo")
    tokens = encoding.encode(string)
    tokens.map { |token| [token, bias.to_i] }.to_h
  end
end
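For anyone wondering what to do with the resulting hash, here is a minimal sketch of dropping it into a chat completion request body. The helper name `chat_payload` and the example prompt are made up for illustration, and nothing is sent over the network. Integer keys like the ones returned above become string keys when serialized, and the values are clamped to the -100 to 100 range that the API accepts for `logit_bias`.

```ruby
require "json"

# Hypothetical helper: builds the JSON body for a chat completion request
# that includes a logit_bias map. It does not perform any HTTP call.
def chat_payload(prompt, logit_bias)
  {
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
    # Keep each bias inside the accepted -100..100 range.
    logit_bias: logit_bias.transform_values { |bias| bias.to_i.clamp(-100, 100) }
  }.to_json
end

# Example with integer token IDs as keys; 150 is clamped down to 100.
puts chat_payload("Say a magic word", { 370 => 10, 20554 => 150 })
```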

Some folks made JS and Java bindings for tiktoken, and we built a little Ruby wrapper based on their work.

It can be installed with `gem install tiktoken`.

Still need to document it, but briefly:

enc = Tiktoken::encoding_for_model('gpt2')
enc2 = Tiktoken::get_encoding('p50k_base')
tokens = enc.encode(prompt)
prompt = enc.decode(tokens)