We have a RAG assistant application where most or all of the RAG content is in English, the instructions are in English, and a question is posed in English, but the answer will come back in French. The answer is perfect and uses the appropriate RAG content beautifully. It’s just that it answers in French.
The one signal that French may be appropriate is that within the instructions, we include the location of the user – something like, “The visitor is located in Montreal, Quebec, Canada.” Note that we also have a statement such as, “Always answer in the language of the question.” Regardless, the responses come back in French.
We really can’t help unless you post some actual data.
There are many, many reasons why this could be happening. Going off what you’ve provided, we can’t offer anything beyond a guess of “it’s confused”.
What would make the most sense to me is to try to replicate this in your own test environment, or give us the data so that we can try to replicate it.
Apologies. With a RAG setup and instructions that include a lot of private IP, it’s difficult to post a complete setup. I absolutely understand your reservation about speculating. I was wondering if others had run into a similar problem.
We’ll do some more analysis by looking at all of the RAG content that was included in the context. But I’m pretty confident that this is tied to that instruction about location – based on the fact that we’ve seen this same effect multiple times for different countries. In all cases, the language returned matched the primary language of the user’s country.
I can understand not wanting to share the actual real-world data, but my point is that you can simulate the same behavior and post it here with made-up fluff (as long as it isn’t altered afterwards).
When it comes to prompting & context management, there are a million reasons why something could go wrong.
We were able to reproduce this with a simple setup. You’ll see two examples in the screenshots. We cut the instructions down to two simple statements – one about location and another about response language. In the first example, it responds in German when the location is specified as Switzerland, even though the question was in English. In the second, it responds in French to someone in Quebec.
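For anyone who wants to poke at this, the cut-down setup looks roughly like the sketch below (instructions paraphrased and the question made up – not our production prompt):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The two cut-down instructions: one about location, one about response language.
system_prompt = (
    "The visitor is located in Montreal, Quebec, Canada. "
    "Always answer in the language of the question."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        # The question is in English, yet the reply often comes back in French.
        {"role": "user", "content": "What are your opening hours on weekends?"},
    ],
)

print(response.choices[0].message.content)
```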
One more point… If I change the second instruction to explicitly ask it to respond in English, it will do so reliably. It is also very good at responding in an explicitly requested non-English language. But when the question is in English, it seems to ignore our instruction about responding in the language of the question.
Ah, yes, gpt-4o-mini isn’t the best at following instructions.
I’ve had similar issues. I resolved them by having a stronger model (like gpt-4o) start the conversation (the first message) and then having gpt-4o-mini take over.
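Roughly something like this, I mean (an untested sketch – the prompt and questions are just placeholders):

```python
from openai import OpenAI

client = OpenAI()

history = [
    {"role": "system", "content": "The visitor is located in Montreal, Quebec, Canada. "
                                   "Always answer in the language of the question."},
    {"role": "user", "content": "What are your opening hours on weekends?"},
]

# First turn: the stronger model sets the pattern (including the reply language).
first = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Later turns: the cheaper model tends to follow the pattern already in the history.
history.append({"role": "user", "content": "And on public holidays?"})
follow_up = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(follow_up.choices[0].message.content)
```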
That’s a cool idea I hadn’t considered. We’ll have to think about the cost impact, since we process a LOT of short conversations – often only 1 or 2 messages. At 10x the cost for input tokens, gpt-4o would definitely affect our cost structure. (Of course, I understand that there’s no free lunch!)
BTW, our experience is that with patience we’ve found a set of instructions that are working extremely well overall with 4o-mini. We are getting the behavior we want for most of our detailed instructions. This language thing is an edge case we’re still working on.
Nice. I think you’ve found the issue and now can tinker around until a solution is found.
Another option is to run a classifier on the language; then you can explicitly instruct the model which language to respond in instead of relying on its own inference.
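For example, a rough sketch (the helper function and language map here are mine, and langdetect is just one option for the classification step):

```python
from langdetect import detect  # any language-ID model you already run would work
from openai import OpenAI

client = OpenAI()

LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German"}  # extend as needed

def answer(question: str) -> str:
    # Classify the question's language up front, then state it explicitly
    # instead of asking the model to infer "the language of the question".
    lang = LANGUAGE_NAMES.get(detect(question), "English")
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "The visitor is located in Montreal, Quebec, Canada. "
                f"Respond in {lang}."
            )},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("What are your opening hours on weekends?"))  # explicitly told: English
```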
Takeaway: Provide an AI identity and purpose as a system message, not an overreaching snippet of user info.
The AI is perhaps put in a state of self-doubt. “I already write in useful languages anyway, so what alignment is being asked of me here?” Or “do these two pieces of information about location and language institute a new behavior?”
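Something like this is what I have in mind (the wording and company name are made up):

```python
# Identity and purpose come first, as the system message; the location snippet is
# demoted to clearly labelled background metadata instead of an open-ended hint.
system_prompt = (
    "You are the documentation assistant for ExampleCo. "
    "Answer questions using the retrieved passages. "
    "Always reply in the language the question is written in.\n"
    "Background info (not a language hint): the visitor is located in "
    "Montreal, Quebec, Canada."
)
```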
Well, this is funny! I was teaching ChatGPT o1 to solve Wordle, and even though we had been talking the whole time in English, ChatGPT o1 responded in Chinese! I have the screenshots, and after running them through Google Translate, her thought process was correct and the answer was correct – just in Chinese. As a multilingual person, this happens to me, but I didn’t expect an LLM to have the same problems humans have.
I can report that we are now having success with this cross-language problem. Indeed, the change we made was just to the instructions for gpt-4o-mini. Any human reading these two sets of instructions (before and after) would probably not behave any differently, but 4o-mini sure does. It feels like the ordering of instructions (even when they are unrelated to each other) makes a significant difference in the outcomes.
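To illustrate the kind of change I mean (paraphrased, not our actual instructions – the same two statements, just in a different order):

```python
# Ordering A: location statement before the language rule.
ordering_a = (
    "The visitor is located in Montreal, Quebec, Canada. "
    "Always answer in the language of the question."
)

# Ordering B: language rule before the location statement.
ordering_b = (
    "Always answer in the language of the question. "
    "The visitor is located in Montreal, Quebec, Canada."
)
```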
Anyway, I am learning the mantra, “Test, test, test…”