Tiktoken does not work on Mac M1 Pro

I have issues with tiktoken on a Mac with an arm64 processor.

tiktoken.get_encoding("cl100k_base")

throws python3.9/site-packages/lxml/etree.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

where lxml is the newest version (4.9.2)

Are there any ways to get tiktoken working on Mac M1 Pro or could you suggest some other tokenizer for embeddings? Which one?

Hey @deepscreener. I have a Mac M1 Pro and it works fine for me. It seems to be related to your tiktoken installation (the error states that an Intel build is installed). Maybe try reinstalling tiktoken?
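
A quick way to check whether the Python interpreter itself is an Intel (x86_64) build is to ask it directly. This is only a minimal sketch: the helper names describe_interpreter and looks_like_intel_python_on_mac are made up for illustration, and it assumes that an x86_64 interpreter (for example one running under Rosetta) is what makes pip pick Intel binaries.

```python
import platform
import sys


def describe_interpreter():
    """Return basic facts about the running Python interpreter."""
    return {
        "executable": sys.executable,
        "python": platform.python_version(),
        "system": platform.system(),
        # "arm64" for a native Apple Silicon build, "x86_64" for an Intel build
        "machine": platform.machine(),
    }


def looks_like_intel_python_on_mac(info):
    """True when the interpreter is an x86_64 build running on macOS."""
    return info["system"] == "Darwin" and info["machine"] == "x86_64"


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")

if looks_like_intel_python_on_mac(info):
    print("This Python is an x86_64 build, so pip will tend to install x86_64 packages.")
```

If "machine" and the architecture named in the error message disagree, it is worth checking whether packages were installed with a different Python than the one that runs the script.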

Tiktoken also works fine on my MacStudio M1.

HTH

:slight_smile:

I could be mistaken but I recall this being an issue with not having Rust installed.

If you go to the GitHub issues it’s probably still there.

Actually, after a quick search it looks like the issue is from having the wrong lxml library installed and can be solved with this:

pip uninstall lxml
ARCHFLAGS="-arch arm64" pip install lxml --compile --no-cache-dir
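
After reinstalling, it may help to confirm that the compiled modules actually load. The following is a small sketch, not an official check; try_import is a hypothetical helper name. It relies on the fact that an architecture mismatch in a compiled extension surfaces as an ImportError when the module is imported.

```python
import importlib


def try_import(module_name):
    """Try to import a module and return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both "module not installed" and "wrong architecture" failures.
        return False, str(exc)
    return True, ""


for name in ("lxml.etree", "tiktoken"):
    ok, error = try_import(name)
    print(f"{name}: {'ok' if ok else 'FAILED: ' + error}")
```

If lxml.etree still fails with the "incompatible architecture" message, the rebuilt package probably went into a different Python environment than the one running the script.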

Edit: Knowing that @RonaldGRuckus posts good information, I double-checked my initial reaction, which was:

Tiktoken does not require Rust, at least on my Mac (or to my knowledge).

I found that Rust may be required after all, per GitHub (see the Appendix in the next post).

Tiktoken is a Python package and requires a properly installed (standard) Python 3.

HTH

:slight_smile:

See Also:

Here is my M1 system information and a Tiktoken example, hope it helps you @deepscreener

Mac M1 Info:

MacStudio$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 13.2.1 (22D68)
      Kernel Version: Darwin 22.3.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: MacStudio
      User Name: Tim Bass (tim)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 1 day, 15 hours, 8 minutes

Hardware:

    Hardware Overview:

      Model Name: Mac Studio
      Model Identifier: Mac13,1
      Model Number: Z14J0006CTH/A
      Chip: Apple M1 Max
      Total Number of Cores: 10 (8 performance and 2 efficiency)
      Memory: 32 GB
      System Firmware Version: 8419.80.7
      OS Loader Version: 8419.80.7
      Serial Number (system): XFWHDJ4HVK
      Hardware UUID: 0F7157D3-3F8F-5BB7-A101-1A4308FC2202
      Provisioning UDID: 00006001-000241AC3C41801E
      Activation Lock Status: Enabled

Tiktoken Example:

MacStudio$ python3 tik.py "Hello World"
[9906, 4435]

tik.py

MacStudio:scripts tim$ cat tik.py
import tiktoken
import sys

def tik(words):
    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    tokens = encoding.encode(words)
    return tokens

tmp = str(sys.argv[1])
print(tik(tmp))

Note:

Devs can get the same results using:

tiktoken.get_encoding("cl100k_base")

So maybe consider changing to tiktoken.encoding_for_model("gpt-3.5-turbo") and seeing if that helps?

:slight_smile:

Appendix

Change the Python script above to use "cl100k_base"

import tiktoken
import sys

def tik(words):
    encoding = tiktoken.get_encoding("cl100k_base")
    #encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
    tokens = encoding.encode(words)
    return tokens

tmp = str(sys.argv[1])
print(tik(tmp))

Test Above Script (same results as before)

MacStudio$ python3 tik.py "Hello World"
[9906, 4435]

Checking Rust …

I just double-checked this and found a Rust dependency on GitHub that I was not aware of:

Rust Info

MacStudio:$ brew info rust
==> rust: stable 1.67.1 (bottled), HEAD
Safe, concurrent, practical language

Not sure if this helps or not.

Shout out to @RonaldGRuckus for pointing this out. I did not realize this dependency existed since I don’t use Rust these days and forgot Rust was installed on my Mac.

:slight_smile:

Final Note.

I uninstalled Rust on my Mac, as follows:

brew uninstall rust

Confirm

MacStudio:scripts tim$ brew info rust
==> rust: stable 1.67.1 (bottled), HEAD
Safe, concurrent, practical language
https://www.rust-lang.org/
Not installed

Retest

MacStudio$ python3 tik.py "Hello World"
[9906, 4435]

Not sure what is going on here: I uninstalled Rust, and tiktoken works as it did before on my M1.
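
One possible explanation, assuming tiktoken was installed from a prebuilt wheel, is that Rust is only needed to build the package from source and not to run it. The sketch below reads the WHEEL metadata that pip records for an installed package and prints its platform tags, which also show whether the installed build is arm64 or x86_64. The helper names parse_wheel_tags and installed_wheel_tags are made up for this example.

```python
from importlib import metadata


def parse_wheel_tags(wheel_text):
    """Extract the 'Tag:' values from the contents of a WHEEL metadata file."""
    tags = []
    for line in wheel_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Tag":
            tags.append(value.strip())
    return tags


def installed_wheel_tags(package_name):
    """Return the wheel tags of an installed package, or None if it is not installed."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    wheel_text = dist.read_text("WHEEL")  # None when no WHEEL file was recorded
    return parse_wheel_tags(wheel_text) if wheel_text else []


for package in ("tiktoken", "lxml", "regex"):
    print(package, installed_wheel_tags(package))
```

A tag ending in arm64 indicates a native Apple Silicon build, while one ending in x86_64 indicates an Intel build.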

:slight_smile:

This did the work. Thanks!

pip3.10 works for me on my Mac.

$ pip3.10 install tiktoken
Collecting tiktoken
  Downloading tiktoken-0.3.3-cp310-cp310-macosx_10_9_x86_64.whl (735 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 735.6/735.6 kB 8.5 MB/s eta 0:00:00
Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.10/site-packages (from tiktoken) (2.28.2)
Collecting regex>=2022.1.18
  Downloading regex-2023.3.23-cp310-cp310-macosx_10_9_x86_64.whl (294 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.4/294.4 kB 11.6 MB/s eta 0:00:00
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (2022.12.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (1.26.15)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken) (3.1.0)
Installing collected packages: regex, tiktoken
Successfully installed regex-2023.3.23 tiktoken-0.3.3

[notice] A new release of pip is available: 23.0 -> 23.0.1
[notice] To update, run: python3.10 -m pip install --upgrade pip