What is considered as normal latency?

In my experience, it is very varied in terms of response latency. Depending on the time of the day and the traffic, my query returns usually vary between 24 - 87 sec (I have a pipeline with multiple calls to GPT involved, usually with tenacity used to prevent RateLimitError).

While it is definitely not fit for high speed production as it is, maybe with foundry being available in the near future, that might solve the problem.

1 Like