How is this Hindi + English language retrieval working?

goelvibhor76 · May 27, 2024, 11:31am

I have a rudimentary RAG POC. I had chunked and embedded lots of essays on 21st Century skills.
I posted a question to it in Hinglish, not at all expecting a useful answer!
My question: kya critical thinking jaroori nahin hai - naye jobs ke liye? hinglish mein answer deejiye. (In English: "Isn't critical thinking necessary for new jobs? Please answer in Hinglish.")

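The response: Critical thinking naye jobs ke liye bilkul jaroori hai. AI aur automation ke samay mein, analytical thinking aur problem-solving skills ka mahatva badh raha hai. Kisi bhi dynamic workplace mein critical thinking ke bina kaam karna mushkil ho sakta hai. Isliye, critical thinking ek essential skill hai jo aaj ke naye jobs ke liye zaroori hai. (In English: "Critical thinking is absolutely necessary for new jobs. In the age of AI and automation, the importance of analytical thinking and problem-solving skills is growing. In any dynamic workplace, working without critical thinking can be difficult. So critical thinking is an essential skill that today's new jobs require.")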

All the essays I had embedded were in English. I had used text-embedding-3-large.
How is the document retrieval working out so well?

Foxalabs · May 27, 2024, 11:35am

Hi and welcome to the Developer Forum!

It’s working so well because the search does not match on the literal text; it matches on the semantic meaning behind that text. The same idea expressed in different languages still ends up semantically close in embedding space. The code around the embedding model then computes the distance between the embedding of your search query and the embeddings of the stored entries, and returns the closest ones.
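To make the "compute the distance and find the closest" step concrete, here is a minimal sketch of it in plain Python. The three-dimensional vectors and the chunk names are hypothetical stand-ins chosen for illustration; real vectors would come from an embedding model such as text-embedding-3-large and have many more dimensions. `top_k` is also just an illustrative helper name, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy embeddings of English essay chunks (assumption, not real data).
chunk_vecs = {
    "critical_thinking_essay": [0.9, 0.1, 0.2],
    "teamwork_essay": [0.2, 0.9, 0.1],
    "digital_literacy_essay": [0.1, 0.2, 0.9],
}

# Hypothetical embedding of the Hinglish question. If the model maps meaning
# rather than wording, it should point in roughly the same direction as the
# English chunk on the same topic.
hinglish_query_vec = [0.8, 0.2, 0.3]

results = top_k(hinglish_query_vec, chunk_vecs, k=2)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

Nothing in the ranking step looks at the language of the text; it only sees vectors, which is why an English corpus can still be searched with a Hinglish query as long as the model places both near each other.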

goelvibhor76 · May 28, 2024, 3:50am

I ran a few tests comparing cosine similarity across the embedding models:
text-embedding-ada-002
Cosine Similarity: 0.9200497187734469

text-embedding-3-large
Cosine Similarity: 0.8055327006515302

text-embedding-3-small
Cosine Similarity: 0.7895661148841614
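For anyone who wants to run a similar comparison, here is a rough sketch of how it could be set up. The post does not say which two texts were compared, so the pairing of a Hinglish sentence with its English equivalent is an assumption, and `compare_models` and `openai_embed` are illustrative helper names. `openai_embed` relies on the `openai` Python package and an `OPENAI_API_KEY` in the environment.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def compare_models(text_a, text_b, models, embed):
    """Return {model name: cosine similarity between the two texts}.

    `embed(model, texts)` must return one vector per input text.
    """
    scores = {}
    for model in models:
        vec_a, vec_b = embed(model, [text_a, text_b])
        scores[model] = cosine_similarity(vec_a, vec_b)
    return scores

def openai_embed(model, texts):
    """Embed texts with the OpenAI Python client (makes a network call)."""
    from openai import OpenAI  # imported lazily so the rest runs offline
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=texts)
    return [item.embedding for item in response.data]

MODELS = [
    "text-embedding-ada-002",
    "text-embedding-3-large",
    "text-embedding-3-small",
]

# Example call (needs an API key, so it is left commented out):
# scores = compare_models(
#     "kya critical thinking jaroori nahin hai - naye jobs ke liye?",
#     "Isn't critical thinking necessary for new jobs?",
#     MODELS,
#     openai_embed,
# )
```

One caution when reading the numbers: raw cosine scores from different models are not necessarily on the same scale, so a higher score from one model does not by itself mean better cross-language retrieval. Checking whether each model ranks the right chunk first is likely a more useful comparison.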

shreyash.sarve04 · April 10, 2025, 12:42pm

I'd like to know this too. My scenario: if I ask the question in Hinglish, can the comparison still be done? I'd also like to know how accurate the free model is.
