I built an app using the Assistants API with a vector store attached to the assistant. If I ask a question via the Playground, the assistant is able to retrieve data from the vector store (via embeddings). But if I use the API call, it does not use the vector store to retrieve relevant data.
Remember, the same assistant works in the Playground, but not via the API. Any idea why this might be happening?
Seeing that this is a known issue without any solution at the moment, I’ve decided to mix manual embeddings with the Assistants API, and the result is excellent. Token usage is reduced from an average of 10k (on the Playground with the vector store) to just 1k-2k with manual embeddings.
Here’s my workflow, if anyone is interested:
**From the Admin Dashboard**

1. Admin selects file(s) to upload.
2. Files are uploaded to web server storage and also to the OpenAI Files API.
3. The OpenAI file ID is attached to the vector store and also saved to the local DB.
4. A queue worker is fired to extract file content from my local file storage on a page-by-page basis.
5. An embedding of each page is created via the Embeddings API (ensuring that there is an overlap across pages).
6. The embedding is saved to a local database table along with the text (a sketch of this pipeline follows the list).
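Here is a minimal sketch of the upload and embedding steps above, assuming the openai Python SDK (v1.x) and pypdf for page extraction; `save_embedding()` and the vector store ID are placeholders I made up for illustration, and on older SDK versions the vector store attach call lives under `client.beta.vector_stores` instead.

```python
# A minimal sketch of the ingestion pipeline, assuming the openai Python SDK (v1.x)
# and pypdf for page extraction. save_embedding() and the IDs are placeholders.
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

VECTOR_STORE_ID = "vs_..."  # assumption: an existing vector store ID
OVERLAP_CHARS = 200         # assumption: how much of the previous page to carry over

def ingest(local_path: str) -> None:
    # Upload the file to the OpenAI Files API and attach it to the vector store.
    # (On older SDK versions this is client.beta.vector_stores.files.create.)
    with open(local_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")
    client.vector_stores.files.create(
        vector_store_id=VECTOR_STORE_ID, file_id=uploaded.id
    )
    # ...save uploaded.id to the local DB here...

    # Extract text page by page, prefixing each page with the tail of the
    # previous one so chunks overlap across page boundaries.
    reader = PdfReader(local_path)
    previous_tail = ""
    for page_number, page in enumerate(reader.pages, start=1):
        text = previous_tail + (page.extract_text() or "")
        previous_tail = text[-OVERLAP_CHARS:]

        # Embed the page and store the vector alongside the raw text.
        response = client.embeddings.create(
            model="text-embedding-3-small", input=text
        )
        save_embedding(  # hypothetical local-DB helper
            file_id=uploaded.id,
            page=page_number,
            text=text,
            vector=response.data[0].embedding,
        )
```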
**On the Chat Interface**

1. User asks a question.
2. An embedding is created for the question.
3. A vector search is carried out on my local embedding store, and relevant context is retrieved (with distance < 0.6).
4. The context (if found) and the question are added to the thread.
5. The thread is executed and the result is retrieved (see the sketch after this list).
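And a minimal sketch of the chat side under the same SDK assumptions; `nearest_chunks()` is a hypothetical helper that runs the cosine-distance search (< 0.6) over the local embedding table, and the thread/assistant IDs come from earlier setup.

```python
# A minimal sketch of the chat flow, same SDK assumptions. nearest_chunks() is a
# hypothetical helper that runs the cosine-distance search over the local table.
from openai import OpenAI

client = OpenAI()

MAX_DISTANCE = 0.6  # cosine distance threshold from the workflow above

def answer(thread_id: str, assistant_id: str, question: str) -> str:
    # Embed the question.
    q_vec = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # Vector search against the local embedding table (hypothetical helper
    # returning rows with a .text attribute, ordered by distance).
    chunks = nearest_chunks(q_vec, max_distance=MAX_DISTANCE)

    # Add the context (if found) and the question to the thread. Earlier turns
    # are already on the thread, so nothing else needs to be resent.
    content = question
    if chunks:
        context = "\n\n".join(chunk.text for chunk in chunks)
        content = f"Context:\n{context}\n\nQuestion:\n{question}"
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=content
    )

    # Execute the thread and read back the assistant's reply.
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread_id, assistant_id=assistant_id
    )
    messages = client.beta.threads.messages.list(thread_id=thread_id, run_id=run.id)
    return messages.data[0].content[0].text.value
```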
**Not much difference from manual embedding**
I know this process is not much different from the old approach of using a vector DB to manually create embeddings; however, it serves the following purposes:
- For now, it is a temporary solution to the vector store issue; once the issue is fixed, I can disable the manual embedding module.
- Unlike with the Completions API, I don’t have to worry about sending the conversation history, since that is already part of the Assistants API thread. I only need to include the new question and any relevant context retrieved via embeddings.
- Token usage is much better and more controllable for now. Again, once the vector store issue is sorted out, I can rely on OpenAI to retrieve the relevant context.