I currently store all Assistants, Threads, and Messages in our database.
When using an Assistant, the response time goes from about 1 second to 10+ seconds, even when streaming, once the instructions are over 8k and the thread has a large number of messages. I believe this is because the whole conversation and the instructions are sent again for every response.
I’d like the AI to have the history of previous messages, specific instructions, and files attached to it, while still allowing the user to add files.
What is the best way to get fast responses with these features?
Could I store the previous messages and attach them as a file to the Assistant, maybe?
Why does OpenAI return so much data every time, and can we turn off everything but the message?
First, retrieve only the last 2 messages. Do you retrieve messages with the default limit of 20? params = {'order': 'desc', 'limit': '20'}
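A minimal sketch of that suggestion, assuming the list-messages endpoint takes `order` and `limit` as query parameters as in the params above. The helper name `messages_list_url`, the thread ID `thread_abc123`, and the 1 to 100 range check are illustrative assumptions, not something taken from this thread. It only builds the request URL, so you can plug it into whatever HTTP client you already use:

```python
from urllib.parse import urlencode

API_BASE = "https://api.openai.com/v1"


def messages_list_url(thread_id: str, limit: int = 2, order: str = "desc") -> str:
    """Build a list-messages URL that asks for only the newest `limit` messages.

    `messages_list_url` is a hypothetical helper; the range check below is an
    assumption about the allowed values of `limit`.
    """
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    # 'desc' puts the newest messages first, so limit=2 yields the last two.
    query = urlencode({"order": order, "limit": limit})
    return f"{API_BASE}/threads/{thread_id}/messages?{query}"


# Placeholder thread ID, for illustration only.
print(messages_list_url("thread_abc123"))
```

Note this only reduces what you read back from the thread. It does not change what gets sent to the model during a run.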
How I retrieve messages:
Thank you for the suggestion, but the delay is the time from the request to the first delta when streaming, since the Assistant sends all of the history with every request.
For example, a new thread returns a message in 0.5 seconds, but a thread with 10 messages and 8k of text takes 20+ seconds, even if the current question is only a sentence.
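To make the numbers above reproducible, here is a rough sketch of how the time to first delta could be measured. It works with any iterable stream, and `time_to_first_item` and `fake_stream` are hypothetical names used only for illustration. The stream is opened inside the timed section so that the request itself is included in the measurement:

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_item(open_stream: Callable[[], Iterable[T]]) -> Tuple[float, T, Iterator[T]]:
    """Return (seconds until the first item, the first item, iterator over the rest).

    `open_stream` should start the request and return the stream, so the
    timer covers the request as well as the wait for the first delta.
    Raises StopIteration if the stream yields nothing.
    """
    start = time.perf_counter()
    iterator = iter(open_stream())
    first = next(iterator)  # blocks until the first delta arrives
    elapsed = time.perf_counter() - start
    return elapsed, first, iterator


def fake_stream(delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming response: waits, then yields two deltas."""
    time.sleep(delay_s)
    yield "Hel"
    yield "lo"


elapsed, first, rest = time_to_first_item(lambda: fake_stream(0.05))
print(f"first delta after {elapsed:.3f}s: {first!r}")
print("remaining deltas:", list(rest))
```

Running this against a fresh thread and then a long one with the same short question should show whether the gap really grows with the history and instruction size.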