Speeding up Python API calls?

Is there any way to speed up a sequence of completion API calls to the GPT-3 models? I have a use case where the response of the completion API should be returned very quickly, but I need the full response, so streaming would not help here. Is the Python API maybe reusing the HTTP connection or auth tokens across multiple API calls? Is there maybe a workaround? I am also fine with changing the Python package files locally for a while. :slight_smile: Thanks!
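One connection-level option is to keep a single HTTPS connection open and reuse it across calls, so only the first request pays the TCP + TLS handshake cost. A minimal stdlib sketch (untested against the live API; the key, model name, and parameters are placeholders):

```python
import http.client
import json

API_KEY = "sk-..."  # placeholder, not a real key

# One persistent connection to the API host; http.client keeps the
# underlying socket open between requests (HTTP/1.1 keep-alive).
conn = http.client.HTTPSConnection("api.openai.com", timeout=30)

def complete(prompt, model="text-davinci-003"):
    """Send a completion request over the shared connection.
    Model name and parameters are illustrative."""
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": 64})
    conn.request("POST", "/v1/completions", body=body, headers={
        "Authorization": "Bearer " + API_KEY,
        "Content-Type": "application/json",
    })
    resp = conn.getresponse()
    return json.loads(resp.read())
```

Subsequent calls to `complete()` reuse the open socket until the server closes it, which shaves the per-request connection setup time off everything after the first call.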

The best way to speed up responses is to use a smaller model (e.g. Ada or Babbage).

The API takes time to come up with its responses, and the bigger, more capable models have higher latency.
Some more complex responses have been known to take 10 or more seconds.

However, the smaller models may not have the knowledge or be as capable/accurate at your task

Ada is really good for classification prompts - but not so good at factual writing.
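Switching models is just a matter of changing the model field in the request body; everything else stays the same. A hedged sketch (model names and parameters are illustrative):

```python
# Same request shape, different model field; the smaller model
# returns faster but is less capable. Names are illustrative.
fast_payload = {"model": "ada", "prompt": "Label the sentiment: ...", "max_tokens": 5}
slow_payload = {"model": "davinci", "prompt": "Label the sentiment: ...", "max_tokens": 5}
```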

Thanks, that does make sense. Unfortunately, I am building a chatbot experience in my native language (Hungarian), and the smaller models are not that capable there… so I guess I need to mitigate the latency some other way.

I was thinking more in technical terms, e.g. keeping a websocket open that is not closed each time I send a request to the API. Maybe this would speed up the responses at least to some extent.
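The completions endpoint is plain HTTPS rather than websockets, but HTTP/1.1 keep-alive gives a similar effect: the connection stays open between requests. A self-contained local demo (the server here is a stand-in, not the OpenAI API) showing that two requests on one kept-alive connection reuse the same TCP socket:

```python
import http.client
import http.server
import threading

# Local stand-in server, not the OpenAI API: answers every GET with "ok".
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 enables keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One client connection, two requests: record the local port of the
# socket in use after each request to see whether it was reused.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
ports = []
for _ in range(2):
    conn.request("GET", "/")
    conn.getresponse().read()  # must drain the body before reusing the socket
    ports.append(conn.sock.getsockname()[1])

print(ports[0] == ports[1])  # same local port: the socket was reused
server.shutdown()
```

If the connection were torn down between requests, the second request would come from a different ephemeral port; with keep-alive, only the first request pays the connection setup cost, which is the bulk of what a persistent websocket would save anyway.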