How is this Hindi + English language retrieval working?

I have a rudimentary RAG POC. I had chunked and embedded lots of essays on 21st Century skills.
I posted a question to it in Hinglish, not at all expecting a useful answer!
My question: kya critical thinking jaroori nahin hai - naye jobs ke liye? hinglish mein answer deejiye. (Translation: Isn't critical thinking necessary for new jobs? Please answer in Hinglish.)

The response: Critical thinking naye jobs ke liye bilkul jaroori hai. AI aur automation ke samay mein, analytical thinking aur problem-solving skills ka mahatva badh raha hai. Kisi bhi dynamic workplace mein critical thinking ke bina kaam karna mushkil ho sakta hai. Isliye, critical thinking ek essential skill hai jo aaj ke naye jobs ke liye zaroori hai. (Translation: Critical thinking is absolutely necessary for new jobs. In the age of AI and automation, the importance of analytical thinking and problem-solving skills is growing. Working without critical thinking in any dynamic workplace can be difficult. So critical thinking is an essential skill that today's new jobs require.)

All the essays I had embedded were in English. I had used text-embedding-3-large.
How is the document retrieval working out so well?
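
For context, the indexing side of the POC is roughly this (a simplified sketch; the file name, chunk size, and naive chunking below are placeholders, not my exact code):

```python
from openai import OpenAI

client = OpenAI()

def chunk_text(text, max_chars=1500):
    # Naive fixed-size chunking; the real POC may split on paragraphs instead.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# "essays.txt" stands in for the English essay corpus on 21st Century skills.
with open("essays.txt", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

# Embed every chunk with the same model later used for the query.
resp = client.embeddings.create(model="text-embedding-3-large", input=chunks)
chunk_vectors = [d.embedding for d in resp.data]
```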

Hi and welcome to the Developer Forum!

It’s working so well because retrieval isn’t matching on the text itself; it’s matching on the semantic meaning behind the text, which stays close across languages even when the wording is completely different. The embedding model turns both your query and the stored chunks into vectors, and the retrieval code then computes the distance between your query vector and each stored vector and returns the closest ones.
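
Here’s a minimal sketch of that distance step, assuming the chunks were embedded ahead of time with the same model (the example chunks and variable names are just illustrative):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
MODEL = "text-embedding-3-large"

# Two English chunks standing in for the embedded essay corpus.
chunks = [
    "Critical thinking and problem-solving are essential skills for modern jobs.",
    "Gardening tips: water your tomato plants early in the morning.",
]

def embed(texts):
    # Embed a list of strings with the same model used for the index.
    resp = client.embeddings.create(model=MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(chunks)
query_vector = embed(["kya critical thinking jaroori nahin hai - naye jobs ke liye?"])[0]

# Cosine similarity: the Hinglish query lands closest to the English chunk
# about critical thinking, because the vectors encode meaning, not wording.
sims = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best = int(np.argmax(sims))
print(chunks[best], float(sims[best]))
```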


I ran a few tests comparing the cosine similarity across the embedding models:

- text-embedding-ada-002: 0.9200497187734469
- text-embedding-3-large: 0.8055327006515302
- text-embedding-3-small: 0.7895661148841614
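
For anyone who wants to run a similar comparison, it looks roughly like this (a sketch, not necessarily the exact script used; the English sentence paired with the Hinglish query here is an assumed paraphrase):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

hinglish = "kya critical thinking jaroori nahin hai - naye jobs ke liye?"
english = "Isn't critical thinking necessary for new jobs?"

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for model in ("text-embedding-ada-002", "text-embedding-3-large", "text-embedding-3-small"):
    # Embed both sentences with the same model, then compare the vectors.
    resp = client.embeddings.create(model=model, input=[hinglish, english])
    vec_a, vec_b = (np.array(d.embedding) for d in resp.data)
    print(model, cosine_similarity(vec_a, vec_b))
```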