Languages supported by text-embedding-3-large

It will accept the language; the real question is how well it handles it.

Here is some example code: two Hebrew sentences, their English translations, and a near miss.

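```python
from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two Hebrew sentences, their English translations, and an unrelated near miss
text = ["אל תלטף את הדורבן.", "אני אוהב/ת אייפון!"]
text += ["Don't pet the porcupine.", "I love iPhone!", "Avoid the platypus"]

for model in ["text-embedding-3-small", "text-embedding-3-large"]:
    try:
        out = client.embeddings.create(input=text, model=model)
    except Exception as e:
        print(f"ERROR {e}")
        continue  # skip this model instead of reusing a stale or undefined `out`
    print("\n---", model)
    array = np.array([data.embedding for data in out.data])
    # OpenAI embeddings are unit-length, so the dot product equals cosine similarity
    for compi, comp in enumerate(text[:2]):
        print("====", compi, comp, "====")
        for other, vec in zip(text, array):
            print(f"{other}: {np.dot(array[compi], vec):.5f}")
```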

Running it gives these similarity scores, with each Hebrew sentence compared against every text:

```text
--- text-embedding-3-small
==== 0 אל תלטף את הדורבן. ====
אל תלטף את הדורבן.: 1.00000
אני אוהב/ת אייפון!: 0.26534
Don't pet the porcupine.: 0.25681
I love iPhone!: 0.08965
Avoid the platypus: 0.25611
==== 1 אני אוהב/ת אייפון! ====
אל תלטף את הדורבן.: 0.26534
אני אוהב/ת אייפון!: 1.00000
Don't pet the porcupine.: 0.07188
I love iPhone!: 0.62643
Avoid the platypus: 0.02311
--- text-embedding-3-large
==== 0 אל תלטף את הדורבן. ====
אל תלטף את הדורבן.: 1.00000
אני אוהב/ת אייפון!: 0.30880
Don't pet the porcupine.: 0.28773
I love iPhone!: 0.01331
Avoid the platypus: 0.20583
==== 1 אני אוהב/ת אייפון! ====
אל תלטף את הדורבן.: 0.30880
אני אוהב/ת אייפון!: 1.00000
Don't pet the porcupine.: 0.01012
I love iPhone!: 0.58528
Avoid the platypus: 0.04751
```
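These scores come from `np.dot`, which equals cosine similarity only for unit-length vectors. OpenAI documents its embeddings as normalized to length 1, so the two agree here. If you reuse the comparison with embeddings from another source, a minimal stdlib sketch that normalizes explicitly might look like this (the `cosine` helper is illustrative, not part of any library):

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# A vector and a scaled copy point the same way; a perpendicular vector does not.
print(round(cosine([3.0, 4.0], [6.0, 8.0]), 5))   # 1.0
print(round(cosine([3.0, 4.0], [4.0, -3.0]), 5))  # 0.0
```

For unit-length inputs the denominator is 1, and this reduces to the plain dot product used in the script.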

Analysis:

When the Hebrew porcupine sentence is compared against English, 3-small can't distinguish porcupine from platypus (0.25681 vs. 0.25611).
3-large does much better (0.28773 vs. 0.20583).
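To put a number on the porcupine/platypus comparison, this small sketch takes the two relevant scores per model from the output above and computes the margin by which the correct translation beats the near miss. The dictionary layout and the `margin` name are just for illustration.

```python
# Scores from the output above: the Hebrew porcupine sentence compared with
# its English translation and with the unrelated near miss.
scores = {
    "text-embedding-3-small": {"translation": 0.25681, "near_miss": 0.25611},
    "text-embedding-3-large": {"translation": 0.28773, "near_miss": 0.20583},
}


def margin(s: dict) -> float:
    """How far the correct translation scores above the near miss."""
    return s["translation"] - s["near_miss"]


for model, s in scores.items():
    print(f"{model}: margin {margin(s):.5f}")
```

The margin comes out at about 0.0007 for 3-small and about 0.082 for 3-large, a more than hundredfold difference.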

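Both models tend to rank an unrelated sentence in the same language above the direct translation. The Hebrew porcupine sentence scores higher against the Hebrew iPhone sentence (0.26534 small, 0.30880 large) than against "Don't pet the porcupine." (0.25681, 0.28773). The iPhone sentence does not show this, because its English translation wins clearly (0.62643, 0.58528). I don't see this effect when comparing Latin-script languages.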
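The same-language preference can be checked mechanically by asking, for each Hebrew query, which other text scores highest. This sketch re-enters the reported scores, with self-matches dropped, under shorthand labels invented here: `he_porcupine` and `he_iphone` for the Hebrew sentences, and `en_porcupine`, `en_iphone`, `en_platypus` for the English ones.

```python
# Reported scores from the output above: model -> query -> candidate -> score.
SCORES = {
    "text-embedding-3-small": {
        "he_porcupine": {"he_iphone": 0.26534, "en_porcupine": 0.25681,
                         "en_iphone": 0.08965, "en_platypus": 0.25611},
        "he_iphone": {"he_porcupine": 0.26534, "en_porcupine": 0.07188,
                      "en_iphone": 0.62643, "en_platypus": 0.02311},
    },
    "text-embedding-3-large": {
        "he_porcupine": {"he_iphone": 0.30880, "en_porcupine": 0.28773,
                         "en_iphone": 0.01331, "en_platypus": 0.20583},
        "he_iphone": {"he_porcupine": 0.30880, "en_porcupine": 0.01012,
                      "en_iphone": 0.58528, "en_platypus": 0.04751},
    },
}


def nearest(candidates: dict) -> str:
    """Label of the highest-scoring candidate."""
    return max(candidates, key=candidates.get)


for model, queries in SCORES.items():
    for query, candidates in queries.items():
        print(f"{model}: {query} -> {nearest(candidates)}")
```

For both models, `he_porcupine` lands on `he_iphone`, while `he_iphone` lands on its translation `en_iphone`.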

I haven't done an all-Hebrew analysis, since neither I nor most readers could interpret the results. If you're curious, run the quick script on your own natively written texts before embedding your application's data.
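When trying your own natively written texts, one way to summarize the result is a hit rate: the fraction of native sentences whose most similar translation is their own. Below is a minimal sketch, assuming you supply an `embed` function that maps a list of strings to a list of vectors. `translation_hit_rate`, `toy_embed`, and the toy vectors are hypothetical, for illustration only; the commented-out line shows how the OpenAI call from the script could be plugged in instead.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity (does not assume unit-length inputs)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def translation_hit_rate(
    pairs: List[Tuple[str, str]],
    embed: Callable[[List[str]], List[Vector]],
) -> float:
    """Fraction of (native, translation) pairs where the native text's closest
    translation, among all translations given, is its own."""
    natives = [native for native, _ in pairs]
    translations = [trans for _, trans in pairs]
    vectors = embed(natives + translations)  # embed everything in one call
    native_vecs, trans_vecs = vectors[: len(pairs)], vectors[len(pairs):]
    hits = 0
    for i, nv in enumerate(native_vecs):
        sims = [cosine(nv, tv) for tv in trans_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += best == i  # count a hit when the top match is the paired translation
    return hits / len(pairs)


# Toy demo: hand-made 2-D vectors standing in for real embeddings.
# With the OpenAI client from the script you could instead use:
#   embed = lambda texts: [d.embedding for d in client.embeddings.create(
#       input=texts, model="text-embedding-3-large").data]
TOY = {"a": [1.0, 0.0], "b": [0.0, 1.0], "A": [0.9, 0.1], "B": [0.2, 0.8]}


def toy_embed(texts: List[str]) -> List[Vector]:
    return [TOY[t] for t in texts]


print(translation_hit_rate([("a", "A"), ("b", "B")], toy_embed))  # 1.0
```

Comparing each native sentence only against the set of translations keeps the score focused on cross-language matching. Including same-language candidates would be a different test, and as the porcupine example shows, that can lower the result.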
