I have tried the "Retrieval" tool from the OpenAI Assistants API, but it is too slow. It takes 4-8 seconds for a short prompt and response, and 7-16 seconds for a long prompt and response.
- Model: gpt-3.5-turbo-1106
- No. of files: 1 (.docx)
- File size: 23.3 KB
- No. of pages in file: 10 (2993 words)
Is there something fundamental (like reading documents) that makes assistants slower? Or is this just because the feature is new? Or is there any way to speed it up?
It is because the Assistants API is still under development.
Yes, you are right, it is still in beta. Have you tried anything to enhance its speed?
No, I have not tried anything to enhance the speed. I was looking to use this assistant to generate quizzes based on an uploaded PDF and release it as an API, but it did not work properly.
It retrieves information from the file you uploaded. That’s why it’s slow.
Fast way: extract the document to plain text yourself and include it as a RAG assistant message after the "system" message or before the user question. You will see a stream of chat completion responses within a second.
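The "extract the document to plain text yourself" step can be sketched with only the standard library, since a .docx file is just a ZIP archive containing XML (a library like python-docx would also work). The namespace URI below is the standard WordprocessingML one:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(path_or_file) -> str:
    """Extract plain text from a .docx file (a ZIP archive of XML parts)."""
    with zipfile.ZipFile(path_or_file) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(f"{W_NS}p"):                        # each <w:p> is a paragraph
        runs = [t.text or "" for t in p.iter(f"{W_NS}t")]  # <w:t> elements hold the text
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The resulting string can then be dropped straight into a chat-completions message.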
Slow way: use another service that puts the decoding and access to information behind embeddings or a function call. You see nothing until the run's status is "completed", and only then can you retrieve the response.
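The slow path boils down to polling: with the (beta) Assistants API you create a run and then repeatedly fetch its status until it leaves a pending state. A minimal, hedged sketch of that loop, written so the polling logic is separate from the SDK (the `retrieve` argument would be a lambda wrapping `client.beta.threads.runs.retrieve(...)`):

```python
import time

# statuses a run reports while it is still working (per the Assistants API docs)
PENDING = {"queued", "in_progress", "cancelling"}

def poll_until_done(retrieve, interval: float = 0.5, max_polls: int = 120):
    """Call retrieve() until the returned run leaves a pending status.

    `retrieve` is any zero-argument callable returning an object with a
    .status attribute, e.g.:
        lambda: client.beta.threads.runs.retrieve(thread_id=tid, run_id=rid)
    """
    for _ in range(max_polls):
        run = retrieve()
        if run.status not in PENDING:
            return run          # "completed", "failed", "expired", ...
        time.sleep(interval)    # nothing to show the user in the meantime
    raise TimeoutError("run did not finish in time")
```

All the perceived latency hides inside that loop: the user sees nothing until the final status arrives.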
One other thing to keep in mind is that streaming makes the Chat Completions API feel faster, so streaming being absent from the Assistants API is likely one contributing factor to it feeling slower.
Given your use case, you might be better off using the regular Chat Completions API and passing your document along with each request. Your Word document can fit into the context window for chat completion.
You will also have finer control over what is sent into the context window, as well as an instant streaming response.
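The suggestion above can be sketched roughly as follows. The message-building part is the key idea (the extracted document text travels with every request, placed before the user's question); the system prompt wording and model name are assumptions, and `stream_answer` assumes the `openai` v1 SDK and an `OPENAI_API_KEY` in the environment:

```python
def build_messages(document_text: str, question: str) -> list[dict]:
    """Place the extracted document right after the system message,
    so every request carries its own context."""
    return [
        {"role": "system", "content": "You write quizzes from the supplied document."},
        {"role": "assistant", "content": f"Document:\n{document_text}"},
        {"role": "user", "content": question},
    ]

def stream_answer(document_text: str, question: str) -> None:
    """Stream the reply token by token instead of waiting for a finished run."""
    # imported here so build_messages stays usable without the SDK installed
    from openai import OpenAI
    client = OpenAI()
    stream = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=build_messages(document_text, question),
        stream=True,  # tokens arrive as they are generated
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```

Because the first tokens arrive as soon as generation starts, the response feels near-instant even when the full answer takes several seconds to finish.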