We built a large embedding database over French, English, German, Spanish, and Portuguese sources for an academic research paper.
The embeddings themselves worked well in multiple languages. However, we kept track of the source language for each piece of text we embedded.
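Concretely, every chunk went into the index tagged with its language, something like this (a minimal sketch; the structure and names are illustrative, not our exact code):

```python
import numpy as np

def index_chunks(chunks_with_lang, embed):
    """Build the language-tagged index.

    chunks_with_lang: iterable of (text, lang) pairs, e.g. ("...", "fr").
    embed: any function mapping text -> vector (model of your choice).
    """
    return [
        {"text": text, "lang": lang, "vec": np.asarray(embed(text))}
        for text, lang in chunks_with_lang
    ]
```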
Then, when we ran the final query, we asked the question in the same language as the embedded text. We found that if you embed in one language and query in another, the dot products are a bit skewed; if you ask in the same language, the numbers come into line with each other.
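You can see the skew for yourself with something like this (a minimal sketch; the model name is just an example using OpenAI's current embeddings endpoint, not necessarily what we used):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

chunk_fr = embed("Le climat change rapidement.")  # "The climate is changing fast."
query_fr = embed("À quelle vitesse le climat change-t-il ?")
query_en = embed("How fast is the climate changing?")

# OpenAI embeddings are unit-length, so the dot product is the cosine
# similarity. Same-language scores tend to land in a different range
# than cross-language ones, even for the same meaning.
print("fr chunk vs fr query:", np.dot(chunk_fr, query_fr))
print("fr chunk vs en query:", np.dot(chunk_fr, query_en))
```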
In our case, we had a mix of source documents. So when we asked the final question(s), we first translated the question into the five languages we knew we had. Then we ran the dot products for each translated question only over the sources in the matching language (I hope that makes sense).
We took the top matches from each pass (i.e. semantic search in each native language) and combined them into a single mixed-language result set. Then we sorted by dot product to get the final top hits, which were often a mix of languages.
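Put together, the retrieval step looks roughly like this (a minimal sketch over a language-tagged index like the one above; the translated query vectors would come from an MT system or the LLM itself):

```python
import numpy as np

def retrieve(corpus, query_vecs_by_lang, per_lang_k=5, final_k=5):
    """One same-language pass per language, then a merged re-sort.

    corpus: list of {"text": str, "lang": str, "vec": np.ndarray}
    query_vecs_by_lang: {"en": vec, "fr": vec, ...} -- the same
        question embedded once per language.
    """
    pool = []
    for lang, qvec in query_vecs_by_lang.items():
        # Score only the chunks whose source language matches the query.
        scored = [
            (float(np.dot(c["vec"], qvec)), c)
            for c in corpus if c["lang"] == lang
        ]
        scored.sort(key=lambda p: p[0], reverse=True)
        pool.extend(scored[:per_lang_k])
    # Same-language scores are comparable across passes (that was the
    # whole point), so a single sort over the pool gives the top hits.
    pool.sort(key=lambda p: p[0], reverse=True)
    return pool[:final_k]
```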
Once we had the final hits, we sent them to GPT-4 (or GPT-3) and asked the question in English. Even though the sources were in mixed languages, even GPT-3 managed to give us a combined answer from all the selected texts.
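The final call is nothing special, roughly this (a minimal sketch using OpenAI's chat API; the prompt wording is illustrative):

```python
from openai import OpenAI

client = OpenAI()

def answer(question_en: str, hits) -> str:
    # hits: the (score, chunk) pairs from retrieval, mixed languages.
    context = "\n\n".join(chunk["text"] for _, chunk in hits)
    resp = client.chat.completions.create(
        model="gpt-4",  # we also got usable answers out of GPT-3-era models
        messages=[
            {"role": "system",
             "content": "Answer using only the sources below. They may be "
                        "in several languages; answer in English.\n\n" + context},
            {"role": "user", "content": question_en},
        ],
    )
    return resp.choices[0].message.content
```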
Ask questions if that didn’t make sense or if you need clarification.