Thank you for the detailed reply.
Allow me to elaborate on this.
1. Addressing the Root Problem: Data Preparation Before Embedding
The data is prepared with the highest quality and stored in a vector DB. It contains all of the company’s KB articles from the web - parsed, formatted, and chunked.
This is not a root problem.
The root problem is how to select these chunks for inclusion in the context.
2. Semantic Chunking: Split each ticket into smaller, closed-idea chunks.
Please explain what you mean by “Semantic Chunking: split each ticket”?
I use the term chunks/chunking for the context store; are you referring to the user inquiry?
For instance, a user asks: “Can I update UPC?” There is nothing to split. This is about using an alternative term for “Barcode”, and this is exactly where embeddings really fail.
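To make the UPC/Barcode case concrete, here is a minimal sketch of the kind of term-alias expansion that could be applied to the query before embedding. The alias table and function are made up for illustration, not my actual pipeline:

```python
# A made-up alias table: expand known synonyms before embedding so that a
# query about “UPC” can also land near chunks written in terms of “Barcode”.
TERM_ALIASES = {
    "upc": "barcode",
    "ean": "barcode",
    "gtin": "barcode",
}

def expand_query(query: str) -> str:
    """Append the canonical term for any known alias found in the query."""
    words = query.lower().replace("?", " ").split()
    extras = {TERM_ALIASES[w] for w in words if w in TERM_ALIASES}
    return query if not extras else f"{query} ({', '.join(sorted(extras))})"

print(expand_query("Can I update UPC?"))   # -> "Can I update UPC? (barcode)"
```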
4. Chunk Summaries
This won’t work in my case, since some KB articles are long technical documents with long lists of data field definitions/descriptions and usage examples, which can number in the hundreds (like API documentation). There is no point in writing a summary that names all these fields, as it would make the summary as long as the original chunk content. That’s why each document/tutorial (which might itself be chunked) has a very clear and descriptive title, e.g. “tutorials_how_to_change_existing_shopify_orders_status_to_paid”.
5. Relationships Between Chunks
This is interesting. Could you please explain more about this topic? I really don’t get how these relationships could be “described” and used.
In our case, the embedded documents are Markdown files cleanly split by headings and, if chunked, they retain the heading structure to clearly identify each chunk’s place in the hierarchy.
Sometimes, for consistency, certain documents are split only down to a defined heading level in order to ensure a specific chapter stays complete and is sufficient to grasp the totality of a given topic.
Would this be a sufficient “relationship” definition?
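To illustrate what I mean by splitting on headings while keeping the hierarchy, here is a rough sketch, assuming clean “#”-style Markdown headings; the function and field names are illustrative only:

```python
import re

# Split Markdown on headings up to a maximum level and keep the full heading
# path as metadata, so each chunk “knows” its parent chapters. Headings deeper
# than max_level stay inside the chunk body.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_by_headings(markdown: str, max_level: int = 3) -> list[dict]:
    chunks, path, body = [], [], []

    def flush():
        if body and any(line.strip() for line in body):
            chunks.append({"heading_path": " > ".join(path),
                           "text": "\n".join(body).strip()})
        body.clear()

    for line in markdown.splitlines():
        m = HEADING_RE.match(line)
        if m and len(m.group(1)) <= max_level:
            flush()
            level = len(m.group(1))
            del path[level - 1:]          # drop deeper or equal heading levels
            path.append(m.group(2).strip())
        else:
            body.append(line)             # deeper headings remain in the chunk
    flush()
    return chunks
```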
6. Ticket Summaries
Please explain what you mean by the term “ticket”.
For me, a “ticket” is a support case with its ID, actors, and conversation, consisting of user inquiries and assistant responses.
3. Identifying the User’s Intent and Problem Description
This aspect is handled via the assistant prompt, which instructs the model how to identify the problem using an iterative process.
I used to call an agent to reformulate the first question in the conversation, and I clearly noticed the LLM missing key points on the first shot. One of the reasons was losing critical keywords and replacing them with similar terms that do not translate into the closest vector distances.
Now, with the “title_selector” strategy, I provide a complete list of document titles that essentially defines a dictionary of our terms, or a sort of mind map. Here the LLM’s task is to map the terms used in the user inquiry to the terms listed in the set of available articles, and this works quite well. This is exactly the place to map user intent to articles potentially containing answers to the user’s problem.
From my experience, letting the LLM transform the user inquiry very often drifts away from the original thought.
The exception is query condensing in ongoing conversations to retain conversation context.
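For clarity, here is roughly how I think of the “title_selector” step: the LLM only maps the user’s wording onto a fixed list of article titles, it never rewrites the inquiry itself. `call_llm()` is a stand-in for whatever chat-completion call is used, and most titles below are made-up examples, so treat this as a sketch rather than my exact implementation:

```python
# Placeholder for the real chat-completion call (assumption, not a real API).
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat-completion client here")

TITLES = [
    "tutorials_how_to_change_existing_shopify_orders_status_to_paid",
    "reference_product_fields_barcode_upc_ean",
    "reference_api_order_endpoints",
]

def select_titles(user_inquiry: str, titles: list[str]) -> list[str]:
    prompt = (
        "Below is a list of knowledge-base article titles.\n"
        "Return only the titles (one per line, verbatim) that are likely to "
        "contain the answer to the user's question. Do not rephrase the question.\n\n"
        "Titles:\n" + "\n".join(f"- {t}" for t in titles)
        + f"\n\nUser question: {user_inquiry}\n"
    )
    raw = call_llm(prompt)
    chosen = [line.strip().lstrip("- ").strip() for line in raw.splitlines()]
    return [t for t in chosen if t in titles]   # keep only exact, known titles
```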
4. Pre-selecting the Most Relevant Search Results
If I got it right, you are talking about context reranking here?
This is probably something I could implement, but for now I just add all chunks collected by the different selector strategies to the LLM query. Token space is not an issue anymore, yet I hold to certain limits to avoid drift.
On average, my context lengths vary between 10K and 20K tokens.
Perhaps with reranking I could cut that in half, but why bother? The LLM is good at sorting out what is relevant and using it. I don’t have conflicting documents - they all complement each other in the “understanding” of the broader picture.
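As an illustration of “add everything, but hold to a limit”, a minimal sketch could look like the following; the 4-characters-per-token estimate and the 15K cap are assumptions, not my exact numbers:

```python
# Merge chunks coming from different selector strategies into one context
# block, de-duplicating and stopping at a soft token cap.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)          # rough heuristic, not a real tokenizer

def build_context(selector_results: list[list[str]], max_tokens: int = 15_000) -> str:
    seen, parts, used = set(), [], 0
    for chunks in selector_results:        # e.g. [title_selector chunks, vector-search chunks]
        for chunk in chunks:
            if chunk in seen:              # the same chunk may be picked by several selectors
                continue
            cost = estimate_tokens(chunk)
            if used + cost > max_tokens:   # hold to the limit to avoid drift
                return "\n\n---\n\n".join(parts)
            seen.add(chunk)
            parts.append(chunk)
            used += cost
    return "\n\n---\n\n".join(parts)
```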
5. Forming the Final Answer and Grounding It
I strongly doubt that validating the generated response against the same context proves the correctness of the answer. Just think about getting biased assessments inside the same echo chamber.
What could alternative grounding strategies be? (Perhaps this question alone is worth a separate thread.)
Yet, the initial issue is still open and I believe it is quite technical.
Let’s define it as follows:
The LLM goes into loops when processing a structured-output request and (probably) miscalculates the necessary output tokens, adding to the counter the tokens used for calculating/preparing the answer rather than the answer itself.
Can we resolve this one?
P.S. At the moment I’m handling these exceptions and resolving them with simple retry logic, rather than looking for workarounds.
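For completeness, the retry wrapper is nothing more elaborate than the following sketch; `request_structured_answer()` is a placeholder for the real structured-output call, not an actual SDK function:

```python
import json
import time

# Placeholder for the real structured-output request (assumption, not a real API).
def request_structured_answer(prompt: str) -> str:
    raise NotImplementedError("plug in the real structured-output call here")

def get_structured_answer(prompt: str, retries: int = 3, backoff: float = 2.0) -> dict:
    """Treat truncated/invalid structured output like an API error and retry."""
    last_error = None
    for attempt in range(retries):
        try:
            raw = request_structured_answer(prompt)
            return json.loads(raw)            # fails if the output was cut short
        except Exception as err:              # API error or truncated/invalid JSON
            last_error = err
            time.sleep(backoff * (attempt + 1))   # simple linear backoff
    raise RuntimeError(f"structured output failed after {retries} attempts") from last_error
```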
I highly appreciate your time and effort in answering my issue.