Semantic caching strategy for multilingual chatbot: how to handle language-specific cache entries?

I'm building a multilingual chatbot (Italian, English, Spanish, etc.) that acts as a travel consultant for a specific city, using semantic caching with a vector database to reduce LLM API costs and latency.

## Current Architecture

Cached responses are stored with embeddings and language metadata:

```python
# English entry
{
  "embedding": [0.23, 0.45, ...],
  "metadata": {
    "question": "what are the best restaurants?",
    "answer": "The best restaurants are: Trattoria Roma, Pizzeria Napoli...",
    "language": "en"
  }
}

# Italian entry
{
  "embedding": [0.24, 0.46, ...],
  "metadata": {
    "question": "quali sono i migliori ristoranti?",
    "answer": "I migliori ristoranti sono: Trattoria Roma, Pizzeria Napoli...",
    "language": "it"
  }
}
```
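
For concreteness, here is a sketch of how these entries might be written to a store that supports metadata filtering. Chroma is used purely as an illustration (any vector DB with metadata filters works the same way), and the ids and embedding values are placeholders:

```python
# Illustrative only: Chroma as an example vector store; the ids and the
# truncated embeddings are placeholders, not real values.
import chromadb

client = chromadb.Client()
cache = client.create_collection("semantic_cache")

cache.add(
    ids=["en-restaurants", "it-restaurants"],
    embeddings=[[0.23, 0.45], [0.24, 0.46]],  # full-length embeddings in practice
    documents=[
        "The best restaurants are: Trattoria Roma, Pizzeria Napoli...",
        "I migliori ristoranti sono: Trattoria Roma, Pizzeria Napoli...",
    ],
    metadatas=[
        {"question": "what are the best restaurants?", "language": "en"},
        {"question": "quali sono i migliori ristoranti?", "language": "it"},
    ],
)
```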

## The Problem

Since embeddings are semantic, “best restaurants” (English) and “migliori ristoranti” (Italian) have very similar vectors. Without proper filtering, an Italian user asking “ristoranti” might get the cached English response.
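
To see how close they are, here is a minimal sketch assuming sentence-transformers with a multilingual model (the model name is just an example, not necessarily what runs in production):

```python
# Minimal sketch: compare embeddings of direct translations.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
en = model.encode("what are the best restaurants?")
it = model.encode("quali sono i migliori ristoranti?")

# Direct translations land very close in a multilingual embedding space,
# so nearest-neighbour search alone cannot tell the languages apart.
print(util.cos_sim(en, it))
```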

My current approach: filter the vector search by the language metadata:

```python
# vector_db and embed() stand in for the actual vector-store client
# and embedding call.
results = vector_db.query(
    embedding=embed(user_message),
    filter={"language": user_language},  # only consider entries in this language
    top_k=1,
)
```
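
For completeness, a cache hit should also clear a similarity cutoff, since a nearest neighbour always comes back. A sketch in the same placeholder style (the result shape, the 0.9 value, and `call_llm` are assumptions):

```python
# Assumed result shape and helper names; the cutoff is illustrative and
# needs tuning per embedding model.
CACHE_HIT_THRESHOLD = 0.9

if results and results[0].score >= CACHE_HIT_THRESHOLD:
    answer = results[0].metadata["answer"]  # cache hit: reuse the stored answer
else:
    answer = call_llm(user_message)  # cache miss: fall back to the LLM
```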

This works **if** I can reliably detect the user’s language. But:

- Messages are often very short (“museums”, “metro”, “parking”)
- Language detection libraries (langdetect, fastText) are unreliable on inputs under ~20 characters (see the sketch after this list)
- The chatbot is stateless by design (for caching efficiency), so there is no conversation history to infer the language from
- The platform is WhatsApp, so there are no browser headers to fall back on
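
The detection problem is easy to reproduce. A minimal sketch, assuming langdetect is installed (outputs vary with seed and version, which is exactly the issue):

```python
# Short inputs give langdetect almost nothing to work with; fixing the
# seed only makes the wrong answers reproducible.
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # langdetect is non-deterministic without a fixed seed

for msg in ["museums", "metro", "parking", "ristoranti"]:
    try:
        print(f"{msg!r} -> {detect(msg)}")
    except Exception as exc:  # raised when the input has no usable features
        print(f"{msg!r} -> {exc}")
```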

What’s the recommended semantic caching strategy for multilingual chatbots when the user’s language cannot be reliably detected from short messages?
