My question is basically the title. The Playground has a token counter specific to Codex, which can be simulated here: OpenAI API
I managed to build something in Node.js that returns this token count, but not for Codex. Is this code available? On this website, they provide the encode/decode source code, but not for Codex.
Welcome to the OpenAI community.
Here is the official tokenizer from OpenAI
In the screenshot below you can see there’s a toggle for switching to codex.
EDIT: I see that you want the source code of the tokenizer for Codex. Let me check whether it's on their GitHub. Even better, let's get someone from OpenAI @staff who knows more about this.
Hello @sps
Thank you. Is it possible to have this feature in code? Ideally as a library that I can call?
After some searching, I found that @simonl from the community has already built a tokenizer that works for both GPT-3 and codex.
Here’s the GitHub
Here’s the original post: Codex Tokenizer Logic - #2 by simonl
Thank you so much! This was quite a deep search, no? I looked and looked, but since these tools are still quite bare-bones, they are not easy to find.
I think there were just not enough people talking about it. I have also implemented a Python version in one of our open-source projects on our GitHub. If there is a need, I can make it a package.
The Node.js version looks great, thank you! @simonl have you published a Python package? I was unable to find it.
Hey @glavin001 I haven't gotten around to making it into a standalone package, but here is one I put together in our serverless project (taken from OpenAI's GPT-2 project, with Codex token logic added) - openai-serverless/functions/encoder at main · botisan-ai/openai-serverless · GitHub
But if you want to have a more proper tokenizer for Python, you can look into HuggingFace’s tokenizers package. It will be much faster than this original pure Python implementation.
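For anyone wondering what the "Codex token logic" roughly amounts to: as far as I understand it from the Codex paper, the Codex vocabulary extends the GPT-2/GPT-3 one with extra tokens for runs of whitespace, so indented code costs fewer tokens. The toy below is only a sketch of that idea. It is not the real BPE encoder and will not reproduce the Playground counts. The names `toy_tokens` and `MAX_RUN` are made up for illustration, and the run-length cap is an assumed value.

```python
import re
from typing import List

# Assumed cap on how many spaces one whitespace token may cover
# (illustrative value, not read from the real Codex vocabulary).
MAX_RUN = 24


def toy_tokens(text: str, merge_spaces: bool) -> List[str]:
    """Split text into toy tokens: words, runs of spaces, single other chars.

    merge_spaces=False: every space is its own token (a heavily simplified
    picture of how GPT-3 handles indentation).
    merge_spaces=True: a run of spaces becomes one token per MAX_RUN spaces
    (a simplified picture of the Codex behaviour).
    """
    tokens: List[str] = []
    # Alternatives: run of spaces | word | any single non-word, non-space char
    for piece in re.findall(r" +|\w+|[^\w ]", text):
        if piece[0] == " ":
            step = MAX_RUN if merge_spaces else 1
            tokens.extend(piece[i:i + step] for i in range(0, len(piece), step))
        else:
            tokens.append(piece)
    return tokens


# Small Python snippet with an 8-space indent on the second line.
snippet = "def f(x):\n" + " " * 8 + "return x\n"
plain = toy_tokens(snippet, merge_spaces=False)
merged = toy_tokens(snippet, merge_spaces=True)
print(len(plain), len(merged))  # prints "20 13"
```

The gap between the two counts comes entirely from the indentation, which is why a GPT-3 token counter overestimates for Codex on code-heavy prompts. For real counts you still need an encoder loaded with the actual Codex vocabulary, such as the ones linked above.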