
Query Contextualization is the first step of an LLM application, where the user’s query is translated into a self-contained query that can be processed on a standalone basis. Here is OpenAI’s recommended prompt:

Re-writes user query to be a self-contained search query.

Given the previous conversation, re-write the last user query so it contains
all necessary context.


History: [{user: “What is your return policy?”},{assistant: “…”}]
User Query: “How long does it cover?”
Response: “How long does the return policy cover?”


[last 3 messages of conversation]

User Query

[last user query]
[JSON-formatted input conversation here]
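For concreteness, here is a minimal sketch of how that rewrite step might be assembled in code. The system prompt text, the helper name, and the history format are my own illustrative assumptions, not OpenAI’s exact recommendation:

```python
# Sketch of the query-contextualization step: build the message list
# for the rewrite call. Prompt wording and function names are
# illustrative assumptions, not an official API.

SYSTEM_PROMPT = (
    "Given the previous conversation, re-write the last user query "
    "so it is a self-contained search query containing all necessary context."
)

def build_contextualization_messages(history, user_query, max_history=3):
    """Assemble the rewrite request, keeping only the last
    `max_history` turns of conversation (as in the prompt above)."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history[-max_history:])
    messages.append({"role": "user", "content": f'User Query: "{user_query}"'})
    return messages

history = [
    {"role": "user", "content": "What is your return policy?"},
    {"role": "assistant", "content": "Returns are accepted within 30 days."},
]
msgs = build_contextualization_messages(history, "How long does it cover?")
```

The resulting `msgs` list would then be sent to the chat completions endpoint, and the model’s reply used as the standalone query.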

It’s important that this Query Contextualization tool generate output fast so as not to bottleneck the rest of the LLM application. However, this prompt breaks down when a long reference text is added to the user query, for example:

- User query: “Translate this to French: {very, very long text}”
- User query: “Check this code for errors: {very, very long snippet of code}”

In these cases, the Query Contextualization tool just copies the entire reference text into its output, slowing down the whole process. I tried making it summarize the reference text concisely, but even GPT-4 can’t handle this type of context analysis and summarization.
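One workaround I’ve been experimenting with (my own sketch, not an established best practice): if the query contains a long pasted payload, strip it out before the rewrite call and re-attach it afterwards, so the model never has to copy it verbatim. The colon-based heuristic and the threshold here are assumptions:

```python
# Heuristic payload splitter: separate a short instruction from a long
# pasted reference text before contextualization. The colon heuristic
# and character threshold are illustrative assumptions.

LONG_PAYLOAD_THRESHOLD = 500  # characters; tune per application

def split_payload(user_query, threshold=LONG_PAYLOAD_THRESHOLD):
    """Return (instruction, payload). If everything after the first
    colon is very long, treat it as a payload and replace it with a
    placeholder; otherwise payload is None."""
    head, sep, tail = user_query.partition(":")
    if sep and len(tail) > threshold:
        return head + ": {reference text}", tail.strip()
    return user_query, None

instruction, payload = split_payload(
    "Translate this to French: " + "lorem ipsum " * 100
)
# `instruction` is now short enough to contextualize cheaply;
# `payload` gets re-attached to the rewritten query afterwards.
```

Only `instruction` goes through the contextualization call; the payload is spliced back in before the downstream step sees the query.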

Any good practices to solve this problem?


Okay, this doesn’t make sense to me.

If you want to translate or check code for errors, why would you summarize it?

Are you just trying to speed up your API calls?

Try using a 1-shot or 2-shot with GPT-3.5-turbo… would be my advice… would increase speed and with a 1-shot or 2-shot example, it should be able to grok what you’re doing…

Right. That’s why I recommended trying a 1-shot or 2-shot… the tokens are a LOT cheaper, so you can add a bit more to get it to output what you want.

A few people recently have noticed that some of the smaller models perform better than the bigger models when given enough examples in the prompt.
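To make the 1-/2-shot idea concrete, something like this is what I mean: seed the rewrite prompt with a couple of worked examples so the smaller model picks up the pattern. The example texts and function name below are made up for illustration:

```python
# Few-shot query rewriting: prepend worked examples so a smaller model
# (e.g. gpt-3.5-turbo) can infer the rewrite pattern. Example
# conversations here are fabricated for illustration.

FEW_SHOT = [
    {"role": "user", "content": 'History: [{"user": "What is your return policy?"}]\nQuery: "How long does it cover?"'},
    {"role": "assistant", "content": '"How long does the return policy cover?"'},
    {"role": "user", "content": 'History: [{"user": "Do you ship to Canada?"}]\nQuery: "How much does it cost?"'},
    {"role": "assistant", "content": '"How much does shipping to Canada cost?"'},
]

def few_shot_messages(system_prompt, history_text, query):
    """Build a rewrite request with 2-shot examples ahead of the
    real history/query pair."""
    msgs = [{"role": "system", "content": system_prompt}]
    msgs.extend(FEW_SHOT)
    msgs.append({"role": "user", "content": f'History: {history_text}\nQuery: "{query}"'})
    return msgs
```

Two examples are usually enough for a pattern this mechanical, and the cheaper per-token price offsets the extra prompt length.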

Gotcha. Might want to step back and work on the prompt then… simplify if you can.

Can you split it into separate calls maybe?
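For instance, one way the split could look: only run the rewrite call when the query actually seems to depend on earlier turns, and pass everything else straight through. The pronoun heuristic below is deliberately naive, just to show the routing shape:

```python
# Routing sketch: skip the contextualization call entirely when the
# query looks self-contained. The pronoun heuristic is a naive
# assumption, not a production-grade classifier.

PRONOUN_HINTS = ("it", "this", "that", "they", "he", "she")

def needs_contextualization(user_query):
    """Naive check: short queries containing a bare pronoun are
    likely to depend on earlier conversation turns."""
    words = user_query.lower().rstrip("?.!").split()
    return len(words) < 12 and any(w in PRONOUN_HINTS for w in words)

needs_contextualization("How long does it cover?")          # → True ("it")
needs_contextualization("What is the capital of France?")   # → False
```

Queries that fail the check go directly to the downstream step, so long reference texts never touch the rewrite model at all.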

Isn’t the full reference text only needed at the time the question that requires it is first asked? After a short conversation, can’t you just say “Please summarize this conversation in a single paragraph” if you’d like a summary that omits the full context? If you need summaries, ask for summaries; if you need full context, just use that. I’m not sure I understand the problem.