We are using the embeddings API to answer a question and the response is taking upwards of 20 seconds on average. Is there a way to speed this up?
Which part of your Q&A pipeline is taking the time: embedding the user query, searching the stored vector data for an answer, or the final chat completions API call?
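If you are not sure, timing each stage separately should show where the 20 seconds go. Here is a minimal sketch; `embed_query`, `search_vectors`, and `complete_chat` are hypothetical placeholders for whatever your own code does at each step, not real library calls, so swap in your actual functions:

```python
import time
from contextlib import contextmanager

# Collected wall-clock time per stage, in seconds.
timings = {}

@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical stand-ins: replace each body with your real call.
def embed_query(question):
    return [0.0, 1.0]  # placeholder for the embeddings API call

def search_vectors(query_vector):
    return ["doc"]  # placeholder for the vector store lookup

def complete_chat(question, context):
    return "answer"  # placeholder for the chat completions API call

def answer(question):
    with timed("embed"):
        query_vector = embed_query(question)
    with timed("search"):
        context = search_vectors(query_vector)
    with timed("completion"):
        reply = complete_chat(question, context)
    return reply

reply = answer("How do I reset my password?")

# Print the slowest stage first.
for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage}: {seconds:.3f}s")
```

Once you know which stage dominates, it is much easier to suggest a fix that targets it.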
We have the same issue with voice response. Averaging 15-17 seconds for a response. No user will tolerate that. Would love to hear from others what they have figured out to speed this up.