Pretty much how I deal with this is to build a token chunker.
The idea is that you set a size limit per chunk, so that if data comes back over your limit it gets split into manageable chunks. One important thing is to make sure each chunk is separated at a valid point, not in the middle of a message.
You can then pass each chunk, together with the user query, to be summarized against the query. You process all the chunks this way.
At the end you combine the summaries, then check the size again. If it is still too big, you repeat until it gets down to a manageable size.
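The loop above can be sketched roughly like this. The tokenizer, the size limit, and the `summarize` stand-in are all assumptions for illustration; a real system would use the model's tokenizer and an LLM call that summarizes the chunk against the query.

```python
MAX_TOKENS = 50  # assumed size limit per chunk


def tokenize(text):
    # Naive whitespace tokenizer; swap in your model's real tokenizer.
    return text.split()


def chunk_text(text, max_tokens=MAX_TOKENS):
    """Split text into chunks, breaking only at sentence boundaries
    so a chunk never stops in the middle of a message."""
    chunks, current, count = [], [], 0
    for sentence in text.replace("\n", " ").split(". "):
        n = len(tokenize(sentence))
        if current and count + n > max_tokens:
            chunks.append(". ".join(current) + ".")
            current, count = [], 0
        current.append(sentence.rstrip("."))
        count += n
    if current:
        chunks.append(". ".join(current) + ".")
    return chunks


def summarize(chunk, query):
    # Placeholder: a real system would call an LLM with the query and
    # the chunk. Here we crudely keep the chunk's first sentence.
    return chunk.split(". ")[0]


def recursive_summary(text, query, max_tokens=MAX_TOKENS):
    """Chunk -> summarize each chunk against the query -> combine,
    repeating until the combined result fits the size limit."""
    while len(tokenize(text)) > max_tokens:
        summaries = [summarize(c, query) for c in chunk_text(text, max_tokens)]
        combined = " ".join(summaries)
        if len(tokenize(combined)) >= len(tokenize(text)):
            break  # not shrinking any further; stop
        text = combined
    return text
```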
This works really well for contextual data. Do not use this method on embeddings: embeddings are predefined patterns, and altering them will cause your algorithms to calculate incorrectly. It's best to use your embeddings only to narrow the data down to the relevant information, then pull the context and run the chunker against your size limit.
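The ordering described here can be sketched as follows: the stored embeddings are used only to *select* the relevant passages, never to rewrite them, and the size-based chunking then runs on the selected original text. The vectors below are toy values; a real system would use a model's embeddings and a vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def narrow_with_embeddings(query_vec, store, top_k=2):
    """store: list of (embedding, original_text) pairs.
    Returns the original text of the top_k most similar entries,
    untouched, ready to be chunked and summarized afterwards."""
    ranked = sorted(store, key=lambda e: cosine(query_vec, e[0]), reverse=True)
    return [text for _, text in ranked[:top_k]]
```

The selected texts would then be fed to the chunk/summarize loop, so the embeddings themselves are never modified.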
If you don't mind going a step further, how do you deal with non-standard Q&A questions? From "Does my company support XYZ?" to:
1. Summarise the whole document
2. Top 5 takeaways
3. Please give me a list of all addresses listed in the document
4. How many times does the phrase "XXX ZZZ" appear in the document?
How do you know which to use? Should you run a similarity query (I use Azure Search) and get the top 5 chunks, vs.:
Questions 1 and 2: the summarised version of the document should be the RAG source.
Question 4: open the original document and do a text search.
Question 3: what would be the solution for this use case? A summarised version will not work, since some of the addresses may be omitted; and can you even do a text search, when a simple text search does not know what an address is?
What do you mean non-standard, haha. A question/query is a query; all questions are standard.
You have to build AI logic stacks to handle the various types of questions. Data input is the key, though: design your data storage with metadata, generate your embeddings at input time, and store context for each chunk, since you are working with documents. You could, for instance, build chapter summaries to use in your stack for manuals and the like, with intent logic on incoming messages to determine which logic path to take based on your understanding of what the user is asking of the data.
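A minimal sketch of such an intent-routing layer, assuming a keyword-based classifier as a cheap stand-in for an LLM intent call. The route names and keyword rules here are entirely hypothetical.

```python
# Hypothetical mapping from query keywords to logic paths.
ROUTES = {
    "summarise": "summary_rag",      # answer from stored summaries
    "takeaway": "summary_rag",
    "how many": "full_text_search",  # exact counting needs the original doc
    "list of": "full_text_search",
    "appear": "full_text_search",
}


def route_query(query):
    """Pick a logic path for the query; fall back to similarity search
    over embeddings when no intent rule matches."""
    q = query.lower()
    for keyword, path in ROUTES.items():
        if keyword in q:
            return path
    return "similarity_search"
```

In practice the keyword table would be replaced by an LLM or classifier that infers intent, but the shape of the stack is the same: classify first, then dispatch to the logic path built for that question type.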
Hope that helps. There's no simple answer without a lot of code.
PS: I highly recommend taking some online courses for Neo4j; I can't say enough about it. They now offer their own AI server stack to help get people started. Although it's a very simple design they offer to learn from, it would get you started on what you want.
Myself, I do not use any of their stuff, but I did download it to look through all the code and see how they did it. That is another way to gain valuable insights: reading open-source code and talking it over with AIs.
It helps that I have a broad background in IT, automation, electrical/mechanical (sensors etc.), generative AI, databases, and so on, so I can play full-stack dev and take it all the way into robotics when that day comes. Being a one-man team with AI as my only work partner is a lot of work, though. I put in 8 hours a day programming at my day job and 9 hours a night learning and building with AI, so this is a life project for me. I have the focus and drive, and money is not a limiter for development; it's like a game for me, haha.