Token counter (Codex) for a back-end service

Hello all!

My question is basically the title. The Playground has a token counter specific to Codex, which can be simulated here: OpenAI API

I managed to build something in Node.js that returns the same token count, but not for Codex. Is the Codex code available? That website provides the encode/decode source code, but not for Codex.

Thank you!


Hi @JulioAlmeida

Welcome to the OpenAI community.

Here is the official tokenizer from OpenAI

In the screenshot below you can see there’s a toggle for switching to codex.

EDIT: I see that you want the source code of the tokenizer for Codex. Let me check whether it's on their GitHub. Even better, let's get someone from OpenAI (@staff) who knows more about this.

Hello @sps

Thank you 🙂. Is it possible to have this feature in code? That is, as a library that I can call?


After some searching, I found that @simonl from the community has already built a tokenizer that works for both GPT-3 and codex.

Here’s the GitHub

Here’s the original post: Codex Tokenizer Logic - #2 by simonl


Thank you so much! That was quite a deep search, no? I looked and looked, but since these tools are still quite bare-bones, they are not easy to find.


I think there just weren't enough people talking about it. I have also implemented a Python version in one of our open-source projects on GitHub. If there is a need, I can make it a package.


The Node.js version looks great, thank you! @simonl have you published a Python package? I was unable to find it.

Hey @glavin001, I haven’t gotten around to making it into a standalone package, but here is one I put together in our serverless project (taken from OpenAI’s gpt-2 project, with Codex token logic added): openai-serverless/functions/encoder at main · botisan-ai/openai-serverless · GitHub

But if you want a more robust tokenizer for Python, you can look into Hugging Face’s tokenizers package. It will be much faster than this original pure-Python implementation.
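
For anyone wondering what "Codex token logic" roughly means: as far as I understand, the main difference is that the Codex vocabulary has extra tokens for runs of whitespace, so indented code costs fewer tokens. Below is a toy sketch of that idea, not the real encoder. The merge table, `bpe_merge`, and `whitespace_merges` are all made up for illustration; a real tokenizer works on bytes, pre-splits text with a regex, and loads its merges from OpenAI's vocabulary files.

```python
from typing import Dict, List, Tuple

# Maps a pair of adjacent symbols to its merge priority (lower merges first).
MergeRanks = Dict[Tuple[str, str], int]


def bpe_merge(symbols: List[str], ranks: MergeRanks) -> List[str]:
    """Simplified BPE: repeatedly merge the adjacent pair with the lowest rank."""
    symbols = list(symbols)
    while len(symbols) > 1:
        # Collect every adjacent pair that has a known merge rule.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break  # nothing left to merge
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def whitespace_merges(max_power: int) -> MergeRanks:
    """Hypothetical merge table: 1+1 spaces, 2+2 spaces, 4+4 spaces, ..."""
    return {(" " * 2**k, " " * 2**k): k for k in range(max_power)}


line = " " * 8 + "x=1"

# No merge rules: every character stays its own token.
plain = bpe_merge(list(line), {})
# Whitespace merges (toy stand-in for the Codex additions): indentation collapses.
codex_like = bpe_merge(list(line), whitespace_merges(3))

print(len(plain))       # 11
print(len(codex_like))  # 4
```

The counts here only show the effect of the extra whitespace merges. To match the numbers the Playground reports you still need the actual vocabulary and merge files, which is what the linked encoder loads.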
