My API responses are very slow (a single Assistants run takes more than 1 minute) whenever the assistant uses Retrieval. When Retrieval is not involved, response times are completely normal.
Has anyone encountered the same situation?
That’s about what I see. Subsequent requests perform better than the first one: my first request always takes about 45s, and later requests fall in the range of 15-20s. My guess is that OpenAI processes the file lazily (chunking + embedding) and only does so the first time that file is needed for retrieval.
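If you want to check the first-run vs. later-run difference on your own setup, one way is to time each run by polling its status until it finishes. This is a minimal sketch, not an official tool: `time_run` is a hypothetical helper, and the set of terminal status names is an assumption based on the run statuses the Assistants API documented at the time.

```python
import time
from typing import Callable, Tuple

# Run statuses treated as "finished" (assumed; check the current API reference)
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def time_run(get_status: Callable[[], str], interval: float = 0.5,
             timeout: float = 300.0) -> Tuple[str, float]:
    """Poll get_status() until it returns a terminal status.

    Returns (final_status, elapsed_seconds). Raises TimeoutError if the
    run is still not finished after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        status = get_status()
        elapsed = time.monotonic() - start
        if status in TERMINAL_STATUSES:
            return status, elapsed
        if elapsed > timeout:
            raise TimeoutError(f"run still '{status}' after {elapsed:.1f}s")
        time.sleep(interval)  # wait before polling again


if __name__ == "__main__":
    # Fake status sequence standing in for real API polling
    fake = iter(["queued", "in_progress", "in_progress", "completed"])
    final, secs = time_run(lambda: next(fake), interval=0.01)
    print(final, f"{secs:.2f}s")
```

Assuming the openai Python SDK v1 with the beta Assistants endpoints, `get_status` could be something like `lambda: client.beta.threads.runs.retrieve(thread_id=..., run_id=...).status`. Timing the first run after attaching a file and then a few later runs on the same thread would show whether only the first one is consistently slower.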
Mine was a consistent 30-40s for every request on the same thread ID, with no speedup after the first one.