Languages supported by text-embedding-3-large

_j · March 9, 2024, 2:21am

It’s going to support it, but the question is: how well.

Some example code. Hebrew input, translations, and a near miss.

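from openai import OpenAI
import numpy as np

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two Hebrew sentences, their English translations, and an English near miss
text = ["אל תלטף את הדורבן.", "אני אוהב/ת אייפון!"]
text += ["Don't pet the porcupine.", "I love iPhone!", "Avoid the platypus"]

for model in ["text-embedding-3-small", "text-embedding-3-large"]:
    try:
        out = client.embeddings.create(input=text, model=model)
    except Exception as e:
        print(f"ERROR {e}")
        continue  # no embeddings for this model, so skip the comparison
    print("\n---", model)
    array = np.array([data.embedding for data in out.data])
    # Compare each Hebrew sentence against every input, itself included.
    # Self-scores print as 1.00000, so the dot product acts as cosine similarity.
    for compi, comp in enumerate(text[:2]):
        print("====", compi, comp, "====")
        for i, j in zip(text, array):
            print(f"{i}: {np.dot(array[compi], j):.5f}")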

Gives us some data points:

--- text-embedding-3-small
==== 0 אל תלטף את הדורבן. ====
אל תלטף את הדורבן.: 1.00000
אני אוהב/ת אייפון!: 0.26534
Don't pet the porcupine.: 0.25681
I love iPhone!: 0.08965
Avoid the platypus: 0.25611
==== 1 אני אוהב/ת אייפון! ====
אל תלטף את הדורבן.: 0.26534
אני אוהב/ת אייפון!: 1.00000
Don't pet the porcupine.: 0.07188
I love iPhone!: 0.62643
Avoid the platypus: 0.02311

--- text-embedding-3-large
==== 0 אל תלטף את הדורבן. ====
אל תלטף את הדורבן.: 1.00000
אני אוהב/ת אייפון!: 0.30880
Don't pet the porcupine.: 0.28773
I love iPhone!: 0.01331
Avoid the platypus: 0.20583
==== 1 אני אוהב/ת אייפון! ====
אל תלטף את הדורבן.: 0.30880
אני אוהב/ת אייפון!: 1.00000
Don't pet the porcupine.: 0.01012
I love iPhone!: 0.58528
Avoid the platypus: 0.04751

Analysis:

3-small can’t distinguish porcupine from platypus when the Hebrew sentence is compared to English (0.25681 vs. 0.25611).
3-large does that much better (0.28773 vs. 0.20583).

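For the porcupine sentence, both models score the other Hebrew sentence, which is about a different subject, slightly higher than the direct English translation. The iPhone sentence does match its English translation best in both models. This same-language preference is not seen when comparing Latin languages.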
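
To see where each translation lands, here is a small sketch that ranks the candidates using the text-embedding-3-large scores printed above. The short labels, the `translation_of` mapping, and the `translation_rank` helper are my own illustrative names, not part of the original script.

```python
import numpy as np

# Short stand-in labels, in the same order as the script's `text` list
labels = ["he: porcupine", "he: iPhone", "en: porcupine", "en: iPhone", "en: platypus"]

# Scores copied from the text-embedding-3-large output above,
# one row per Hebrew query sentence
scores = np.array([
    [1.00000, 0.30880, 0.28773, 0.01331, 0.20583],
    [0.30880, 1.00000, 0.01012, 0.58528, 0.04751],
])

# Hypothetical mapping: query row -> column holding its direct translation
translation_of = {0: 2, 1: 3}

def translation_rank(row, query_idx, target_idx):
    """Return the 1-based rank of target_idx among all non-self candidates."""
    order = [int(i) for i in np.argsort(-row) if int(i) != query_idx]
    return order.index(target_idx) + 1

for q, t in translation_of.items():
    rank = translation_rank(scores[q], q, t)
    print(f"{labels[q]} -> {labels[t]}: rank {rank} of {len(labels) - 1}")
```

A rank of 1 means the translation was the closest non-self match. With only five sentences this is a spot check, not a benchmark.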

All-Hebrew analysis is not done, as neither I nor most readers would understand the results. You can plug your own native-written texts into the quick script out of curiosity before building embeddings into your application.
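
One thing to keep in mind when adapting the script: it uses a plain dot product, which matches cosine similarity only because the vectors come back at unit length (the 1.00000 self-scores in the output suggest as much). A minimal sketch, assuming you might shorten vectors or mix in vectors from another source, normalizes explicitly first. `cosine_similarity_matrix` and the toy vectors are made up for illustration.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity for a list of equal-length vectors.

    Assumes no vector is all zeros (that would divide by zero).
    """
    arr = np.asarray(vectors, dtype=float)
    # Scale each row to unit length so the dot product equals cosine similarity
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    unit = arr / norms
    return unit @ unit.T

# Toy 3-dimensional vectors standing in for real embeddings
toy = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
sims = cosine_similarity_matrix(toy)
print(np.round(sims, 5))
```

With real data you would pass in the `array` built in the script in place of `toy`.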
