I reviewed the Assistant API docs and challenged it with a small Reminder use case to see if it's ready for real-world applications (even small ones).
I understand and support the intention to lower the barrier for real-world applications.
But currently (prove me wrong) I think it's NOT yet there.
Here are some thoughts on things that I think could help on its way to a product-ready framework.
To my understanding the Thread concept intends to implement the chat memory.
BUT since it's limited to the model's max context, it seems to be a ConversationBufferMemory (sliding window) that can cover short- to mid-term memory but will definitely start to forget in the long term in real-world scenarios. (Unless some magic was implemented that I didn't find in the docs.)
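To make the forgetting concrete, here is a minimal sketch of what such a sliding-window memory does (plain Python; the word-count tokenizer is a crude stand-in for a real one such as tiktoken, and the message dicts are simplified):

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (e.g. tiktoken);
    # good enough to illustrate the mechanism.
    return len(text.split())

def trim_to_window(messages: list, max_tokens: int) -> list:
    """Keep only the most recent messages that still fit into max_tokens.
    Everything older is silently dropped -- the long-term forgetting
    described above."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "my favourite colour is green"},
    {"role": "assistant", "content": "noted"},
    {"role": "user", "content": "remind me to call the dentist tomorrow at nine"},
]
window = trim_to_window(history, max_tokens=10)
# the oldest message (the colour preference) no longer fits and is gone
```

Once the window budget is exhausted, the user's earliest statements simply vanish from the context.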
Maybe that's OK because we could use the RAG API for long-term memory… BUT:
- we'd have to automate the extraction of the thread memory before it reaches the context limit (which needs regular token counting) and translate it into a file that we upload to the RAG system. This seems as tedious as implementing the memory myself. <= Maybe a first feature request for the Assistant API: the capability of real long-term memory
- with the limitation of 100 GB per organisation you won't survive long in a real-world scenario. <= I need to know that I can scale. So either remove the limitation for production workloads or allow integration with existing RAG storage systems. (An increase upon request is only feasible for prototyping, not for production.)
- for any kind of RAG storage in real-world applications you'll end up needing to:
  - extend and change the indexed information (while the world and user preferences constantly change)
  - have the possibility to create structured information storage and retrieval (as we know, transformers and similarity search are not well suited for counting things)
  - optimize the retrieval (& indexing) strategy per use case if you want to reach actionable accuracy levels.
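The thread-memory roll-over mentioned above (extract before the context limit is reached, push into long-term storage) could look roughly like this. A minimal sketch only: the word-count estimate stands in for a real tokenizer, and the `archive` list stands in for the file we would have to build and upload to the RAG system:

```python
def estimate_tokens(message: dict) -> int:
    # Crude token estimate; a real implementation would use tiktoken.
    return len(message["content"].split())

def archive_overflow(thread: list, max_tokens: int, archive: list):
    """Before the thread reaches the context limit, move the oldest
    messages into `archive` (a stand-in for the uploaded RAG file)
    instead of silently losing them."""
    total = sum(estimate_tokens(m) for m in thread)
    while thread and total > max_tokens:
        oldest = thread.pop(0)
        archive.append(oldest)
        total -= estimate_tokens(oldest)
    return thread, archive

thread = [
    {"role": "user", "content": "my favourite colour is green"},               # 5 tokens
    {"role": "assistant", "content": "noted, green it is"},                    # 4 tokens
    {"role": "user", "content": "remind me to water the plants every friday"}, # 8 tokens
]
long_term = []
thread, long_term = archive_overflow(thread, max_tokens=12, archive=long_term)
```

Even this toy version already needs regular token counting and a scheduling decision about when to run it, which illustrates why doing it by hand feels as tedious as building the memory yourself.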
<= The feature request would be a RAG API that offers:
- record-based (not file-based) data integration (ideally allowing external data source integration)
- scalability (no storage limit + volume-independent response times)
- structured information storage attached to the indexed unstructured input (maybe as a knowledge graph + vector storage)
- indexing and retrieval that allow callbacks or configuration of the chunking and similarity strategy per use case. => This means it might be necessary to set up different RAG storages and maybe use the Tools/Functions to automatically decide which message/data goes to which RAG storage.
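That last routing idea could be sketched as follows. Everything here is hypothetical: the store names are made up, and the keyword router is a hard-coded placeholder for what would in practice be a model-driven tool/function call deciding per record:

```python
# Hypothetical stores, standing in for separate RAG storages with
# different indexing/retrieval strategies per use case.
STORES = {"preferences": [], "reminders": [], "general": []}

def route_record(text: str) -> str:
    """Toy keyword router. In a real system this decision would come
    from a tool/function call answered by the model, not fixed rules."""
    lowered = text.lower()
    if "remind" in lowered or "every" in lowered:
        return "reminders"
    if "favourite" in lowered or "prefer" in lowered:
        return "preferences"
    return "general"

def store_record(text: str) -> str:
    target = route_record(text)
    STORES[target].append(text)
    return target

first = store_record("remind me to call mum on sunday")
second = store_record("my favourite colour is green")
```

With such a router in front, each store could keep its own chunking and similarity configuration, which is exactly the per-use-case optimization asked for above.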
…stopping here and hoping for a vivid discussion.
Thanks & BR Adrian