Query Contextualization is the first step of an LLM application, where the user’s query is rewritten into a self-contained query that can be processed on a standalone basis. Here is OpenAI’s recommended prompt:
Re-writes the user query to be a self-contained search query.

```
SYSTEM
Given the previous conversation, re-write the last user query so it contains
all necessary context.

Example
History: [{user: “What is your return policy?”},{assistant: “…”}]
User Query: “How long does it cover?”
Response: “How long does the return policy cover?”

Conversation
[last 3 messages of conversation]

User Query
[last user query]

USER
[JSON-formatted input conversation here]
```
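For concreteness, here is roughly how I call this step (a minimal sketch using the OpenAI Python SDK; the function name, model choice, and exact JSON payload shape are my own, not part of OpenAI’s recommendation):

```python
import json
from openai import OpenAI  # assumes openai>=1.0

client = OpenAI()

SYSTEM_PROMPT = """Given the previous conversation, re-write the last user query so it contains
all necessary context.

Example
History: [{user: "What is your return policy?"},{assistant: "..."}]
User Query: "How long does it cover?"
Response: "How long does the return policy cover?"

Conversation
[last 3 messages of conversation]

User Query
[last user query]"""

def contextualize_query(history: list[dict], user_query: str) -> str:
    """Rewrite the last user query into a self-contained, standalone query."""
    # Send only the last 3 messages, per the prompt above.
    payload = json.dumps({"conversation": history[-3:], "user_query": user_query})
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative; any chat model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": payload},
        ],
    )
    return response.choices[0].message.content
```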
It’s important that this Query Contextualization tool generates output fast, so as not to bottleneck the rest of the LLM application. However, this prompt breaks down when a long reference text is included in the user query, for example:
- User query: “Translate this to French: {very, very long text}”
- User query: “Check this code for errors: {very, very long snippet of code}”
In these cases, the Query Contextualization tool just copies the entire reference text into its rewritten query, slowing down the whole process. I tried making it summarize the reference text concisely, but even GPT-4 can’t reliably handle this kind of context analysis and summarization.
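For reference, the summarization variant I tried looked roughly like this (the prompt wording below is illustrative, not my exact text; `client` and `json` are as defined in the sketch above):

```python
SUMMARIZING_SYSTEM_PROMPT = """Given the previous conversation, re-write the last user query so it
contains all necessary context. If the query embeds a long reference text
(e.g. a document to translate or a code snippet to review), replace the
reference text with a one-sentence summary instead of copying it verbatim."""

def contextualize_query_with_summary(history: list[dict], user_query: str) -> str:
    """Same as contextualize_query, but asks the model to compress long reference text."""
    payload = json.dumps({"conversation": history[-3:], "user_query": user_query})
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SUMMARIZING_SYSTEM_PROMPT},
            {"role": "user", "content": payload},
        ],
    )
    return response.choices[0].message.content
```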
Any good practices to solve this problem?