When I ask a question through the API, I usually have to wait a long time for an answer, around 25-30 seconds, which is much longer than through the web-based services. When I turn on streaming, it seems very inefficient, because chunks arrive very frequently, every few characters (one token each?). I found that limiting max_tokens to a small number, e.g. 150, gives me a quick answer, but unfortunately the answer is cut off. With larger values like 1000 or 2000 the answer can still be cut off, and I have to wait longer. Is there something in between streaming and a regular request, e.g. receiving a partial answer every 100 tokens, which would then be continued?
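To make the "every 100 tokens" idea concrete, here is a minimal sketch of doing the grouping on the client side: streaming stays on, and the small pieces are buffered until a batch is full. It assumes each streamed piece is roughly one token, which may not hold exactly. `batch_chunks` and `fake_stream` are hypothetical names made up for this illustration, not part of any API; `fake_stream` only stands in for the text pieces a real streaming response would deliver.

```python
from typing import Iterable, Iterator, List


def batch_chunks(pieces: Iterable[str], batch_size: int = 100) -> Iterator[str]:
    """Group small streamed text pieces into larger partial answers.

    Assumption: one piece is roughly one token, so batch_size=100
    approximates "a partial answer every 100 tokens".
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer: List[str] = []
    for piece in pieces:
        if not piece:
            continue  # skip empty pieces so they do not count toward the batch
        buffer.append(piece)
        if len(buffer) >= batch_size:
            yield "".join(buffer)
            buffer.clear()
    if buffer:
        # Flush whatever is left when the stream ends.
        yield "".join(buffer)


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: 250 tiny pieces."""
    for i in range(250):
        yield f"t{i} "


if __name__ == "__main__":
    for partial in batch_chunks(fake_stream(), batch_size=100):
        print(len(partial.split()), "pieces in this partial answer")
```

With a real streaming response, the text pieces taken from each event would be passed in place of `fake_stream()`. This only changes how often the application handles a partial answer; it does not make the model generate faster.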