Multilingual RAG suitable method suggestion

poojat · August 5, 2024, 11:22am

I am working on multilingual RAG. My data is completely in english but the query and answer needs to be in same language (english/non-english).
I have tried 2 ways,

Used multilingual embedding.
Used prompting in query and answers for language detection and translation and normal english embeddings.

The first method lacks for some language while retrieval, while second is prompt dependent and sometimes does not generate answer in detected language which is done at the query.

Which method should be preferred?

Diet · August 5, 2024, 12:32pm

Welcome to the community!

I’ll assume that your retrieval is working fine, and that you’re just trying to ensure the output language stays consistent.

I think you can get quite far by optimizing your prompt a bit. Do you think you can post what you have so far?

One thing you can also think about doing is translating your response prompt (wrapper/template), and switching it based on the detected (or chosen) language.

And to absolutely nail the behavior down, on the first reply (at the bottom of your template) you tell it to begin the response with a pre-translated greeting in the target language (may not be necessary). If you used anthropic, you could use the pre-fill method as an even more robust option (OpenAI offers this functionality only with the instruct models)

poojat · August 27, 2024, 11:10am

Hi,
The retriever is working correctly.
On query side, added prompt for detection of language and translation to english for next step and detected language parameter is passed to response prompt for response in same language. This is done so far

While experimenting with prompts one thing noticed is even if non-english query is passed without language detection, translation and passing language parameter to response prompt response is getting generated in same language with just mentioning multilingual assistant in the prompt.
I have tested this with gpt 4o-mini.

Is language detection and translation to English a necessary step in prompting?

Diet · August 27, 2024, 12:44pm

Hi! It’s not necessarily necessary to translate the embedding query to English (if that’s that you’re asking), but it might depend on the subject matter.

For example, a query in a specific language might be closer to a vector that encodes an English text that mentions the country, even if they have nothing semantically in common. If this is not an issue, then it might not matter.

Language detection as I’m suggesting would just be a crutch in case your model is mis-behaving, but it’s also not absolutely required. As I mentioned, I think you can get quite far by tweaking just the prompt.

Topic		Replies	Views
What can help in effectively translating prompts: techniques and experiences ? Prompting chatgpt	2	662	March 9, 2024
Prompt Language English or == Response Language Prompting gpt-4 , api , assistants-api	2	530	June 21, 2024
My RAG chatbot does not reply in same language as query, Prompting chatgpt , api , languages , rag	14	1928	November 11, 2024
Prompt in English, Response in non-English API	6	1058	April 28, 2024
Language of System prompt influences the output? Prompting gpt-4 , chatgpt	4	262	November 7, 2024

Multilingual RAG suitable method suggestion

Related topics