Assistant API - product readiness

Hi,

I reviewed the Assistant API docs and challenged the API with a small reminder use case to see if it's ready for real-world applications (even small ones).
I understand and support the intention to lower the barrier for real world applications.
But currently (prove me wrong) I think it's NOT there yet.

Here are some thoughts on what I think could help it on its way to a product-ready framework.
To my understanding, the Thread concept is meant to implement the chat memory.
BUT since it's limited to the model's maximum context, it seems to be a ConversationBufferMemory (sliding window) that can cover short- to mid-term memory but will definitely start to forget in the long term in real-world scenarios. (Unless some magic was implemented that I didn't find in the docs.)
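To illustrate the concern, here is a minimal sketch of a sliding-window buffer. It assumes a thread behaves roughly like this once it hits the context limit; the docs do not spell out the actual mechanism. `count_tokens` is a crude whitespace stand-in for a real tokenizer, and `SlidingWindowMemory` and the budget of 8 tokens are made up for the example.

```python
from collections import deque


def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


class SlidingWindowMemory:
    """Hypothetical buffer that keeps only the newest messages within a token budget."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.messages = deque()

    def total_tokens(self):
        return sum(count_tokens(m) for m in self.messages)

    def add(self, message):
        """Append a message and return whatever had to be dropped to stay in budget."""
        self.messages.append(message)
        dropped = []
        # Always keep at least the newest message, even if it alone exceeds the budget.
        while self.total_tokens() > self.max_tokens and len(self.messages) > 1:
            dropped.append(self.messages.popleft())
        return dropped


memory = SlidingWindowMemory(max_tokens=8)
memory.add("remind me to call Anna on Friday")       # 7 tokens, fits
forgotten = memory.add("what is the weather today")  # 12 tokens in total, oldest is dropped
print(forgotten)
```

In this toy run the reminder is the first thing to fall out of the window, which is exactly the kind of long-term information a reminder use case needs to keep.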
Maybe that's OK because we could use the RAG API for long-term memory… BUT:

  • we’d have to automate the extraction of the thread memory before it reaches the context limit (which needs regular token counting) and translate it into a file that we upload to the RAG system. This seems as tedious as implementing the memory myself. <= Maybe a first feature request for the Assistant API: the capability of real long-term memory.
  • with the limitation to 100GB per organisation you won’t survive long in a real-world scenario. <= I need to know that I can scale. So either remove the limitation for production workloads or allow integration with existing RAG storage systems. (An increase upon request is only feasible for prototyping, not for production.)
  • For any kind of RAG storage in real-world applications you’ll end up needing to:
    • extend and change the indexed information (since the world and user preferences constantly change)
    • also have the possibility to create structured information storage and retrieval (as we know, transformers and similarity search are not well suited for counting things)
    • optimize the retrieval (& indexing) strategy per use case if you want to reach actionable accuracy levels.
      <= feature request would be a RAG API that offers:
      • record-based (not file-based) data integration (ideally allowing external data source integration)
      • scalable (no storage limit + volume-independent response times)
      • structured information storage attached to the indexed unstructured input (maybe as knowledge graph + vector storage)
      • the indexing and the retrieval should allow callbacks or configuration of the strategy used for chunking and similarity per use case. => This means it might be necessary to set up different RAG storages and maybe use the Tools Functions to automatically decide which message/data goes to which RAG storage.
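The workaround from the first bullet (count tokens, move old messages out before the limit) combined with the routing idea from the last one can be sketched roughly as follows. Everything here is an assumption for illustration: `count_tokens` is a whitespace stand-in for a real tokenizer, `choose_store` is a toy keyword rule where a tool/function call would sit, and the `stores` dict stands in for separate RAG storages. None of these names come from the Assistant API.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def choose_store(message):
    """Toy routing rule; in the idea above a tool/function call would decide this."""
    return "reminders" if "remind" in message.lower() else "general"


def archive_overflow(thread, stores, max_tokens, headroom=0.75):
    """Move the oldest messages out of `thread` once it exceeds headroom * max_tokens.

    `thread` is a list of message strings, oldest first. `stores` maps a store
    name to a list that stands in for an external RAG index.
    Returns the number of messages moved.
    """
    limit = int(max_tokens * headroom)
    moved = 0
    # Keep at least the newest message in the thread.
    while len(thread) > 1 and sum(count_tokens(m) for m in thread) > limit:
        oldest = thread.pop(0)
        stores.setdefault(choose_store(oldest), []).append(oldest)
        moved += 1
    return moved


thread = [
    "remind me to call Anna on Friday",  # 7 tokens
    "I prefer short answers",            # 4 tokens
    "what is the weather today",         # 5 tokens
]
stores = {}
moved = archive_overflow(thread, stores, max_tokens=8)
print(moved, stores, thread)
```

Even in this toy form, the bookkeeping (token counting, eviction, routing, and later retrieval) sits entirely on the application side, which is the effort the feature requests above ask the API to take over.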

…stopping here and hoping for a lively discussion.

Thanks & BR Adrian
