How to get a quick response from chat via API?

Koto · December 3, 2024, 4:39pm

Asking a question using the API I usually have to wait a long time for an answer, like 25-30 seconds, which is much longer than through www services. When I turn on streaming responses, it seems very inefficient, because the stream comes very often, every few characters (one token?), but I found that by limiting max_tokens to a small number, e.g. 150, I get a quick answer, but unfortunately the answer is cut off. Setting other values like 1000 or 2000 can also be cut off and you have to wait longer. Would there be something in between streaming and regular: e.g. every 100 tokens I would receive a partial answer, which would be continued?

Topic		Replies	Views
Discrepancy in Response Speed between GPT-3.5-turbo API and ChatGPT UI API gpt-35-turbo , chatgpt , api	3	3092	May 24, 2023
ChatGPT API Very Slow at generating Responses API gpt-4 , api	7	5966	June 14, 2023
Assistant API - Speed and Token Limit API gpt-4 , assistants-api	2	509	May 1, 2024
HTTP Calls Excessive Delay Waiting for Server Response API gpt-4	8	1375	January 9, 2024
Chatgpt-3.5 turbo model takes long time to respond. Is there any way to speed this up? API gpt-35-turbo , api-speed	6	6717	May 21, 2023

How to get a quick response from chat via API?

Related topics