Hi,
We are seeing typos in Arabic text in our model responses.
We are using LangChain with Chroma DB to do RAG on an Arabic document.
The document is first embedded into Chroma DB using the text-embedding-ada-002 model.
The context is then fetched via a similarity search in LangChain and passed to the prompt.
The issue is that the responses we receive contain typos in the Arabic text.
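One check that may help narrow this down is to compare the words in a response against the retrieved context after Unicode normalization. This is only a sketch under an unconfirmed assumption: that some of the "typos" could come from Arabic presentation forms or diacritics in the extracted document text rather than from the model itself. The function names below are hypothetical and only the Python standard library is used.

```python
import unicodedata


def normalize_arabic(text: str) -> str:
    """Fold presentation forms to base letters, drop diacritics and tatweel."""
    # NFKC maps Arabic presentation forms (for example U+FEE1) to base letters.
    text = unicodedata.normalize("NFKC", text)
    # Combining marks (harakat) have a non-zero combining class; U+0640 is tatweel.
    return "".join(
        ch for ch in text
        if not unicodedata.combining(ch) and ch != "\u0640"
    )


def words_not_in_context(response: str, context: str) -> list[str]:
    """Return response words that do not appear in the context after normalization."""
    context_words = set(normalize_arabic(context).split())
    return [w for w in normalize_arabic(response).split() if w not in context_words]
```

Words returned by `words_not_in_context` are candidates for a closer look: if they still differ from the source after normalization, the difference is more likely introduced during generation than by the encoding of the document.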
We used the prompt below:
You are a helpful assistant who provides data based on the provided context.
Use strictly the context to search on question provided below.
If you don't know the answer, Just say that "It's out current context".
Avoid the cached data.
Avoid any data from other sources apart from the context.
Strictly convert the question to english and then query the context.
Strictly respond back in the question language.
Keep the answer concise and short and simple 250 words.
The confidence level is value of how much the question is related to the context.
The default confidence level is 0
Higher the question matching to context greater the confidence level which is in between 0 and 1.
The output should be in json format with message, confidence_level and source in any condition
Context: {context}
Question: {question}
Helpful Answer:
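For reference, here is a minimal sketch of how a template like the one above can be filled and the JSON reply checked, using only the Python standard library. The function names, the shortened template, the sample reply, and the file name `doc.pdf` are illustrative assumptions, not our exact code.

```python
import json

# Shortened stand-in for the prompt above; only the first and last lines are kept.
PROMPT_TEMPLATE = (
    "You are a helpful assistant who provides data based on the provided context.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

REQUIRED_KEYS = {"message", "confidence_level", "source"}


def build_prompt(context: str, question: str) -> str:
    """Fill the template with the retrieved context and the user question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)


def parse_reply(raw_reply: str) -> dict:
    """Parse the model reply as JSON and check that the expected keys exist."""
    data = json.loads(raw_reply)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Hypothetical reply with the Arabic text written as \u escapes.
sample_reply = (
    '{"message": "\\u0645\\u0631\\u062d\\u0628\\u0627", '
    '"confidence_level": 0.8, "source": "doc.pdf"}'
)
reply = parse_reply(sample_reply)

# ensure_ascii=False keeps Arabic letters readable instead of \uXXXX escapes,
# which makes a logged answer easier to compare with the source text.
readable = json.dumps(reply, ensure_ascii=False)
```

Serializing with `ensure_ascii=False` does not change the data, only how it is written out, so it is a safe way to inspect the Arabic answer character by character.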
We request your assistance in resolving this issue as soon as possible, as it is highly urgent.