When I ask a question through the API, I usually have to wait a long time for the answer, around 25-30 seconds, which is much longer than through the web interface. Turning on streaming seems very inefficient, because chunks arrive very frequently, every few characters (one token?). I did find that limiting max_tokens to a small number, e.g. 150, gets me a quick answer, but unfortunately the answer is cut off; with larger values like 1000 or 2000 the answer can still be cut off, and I have to wait longer. Is there something in between streaming and a regular response: e.g. receiving a partial answer every 100 tokens, which would then be continued? Something like the client-side buffering sketched below is what I have in mind.
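This is only a rough sketch of the idea, assuming the openai Python SDK v1.x with `stream=True`; the model name, prompt, and the 100-piece chunk size are placeholders, and counting streamed deltas is only an approximation of counting tokens:

```python
# Sketch: buffer a token-by-token stream and emit partial answers
# roughly every 100 pieces, instead of printing each token as it arrives.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHUNK_SIZE = 100  # flush the buffer about every 100 streamed pieces

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,
)

buffer = []
for event in stream:
    delta = event.choices[0].delta.content  # may be None on some chunks
    if delta:
        buffer.append(delta)
    # Each streamed delta is usually about one token, so counting
    # deltas approximates counting tokens.
    if len(buffer) >= CHUNK_SIZE:
        print("".join(buffer), flush=True)  # hand off a partial answer
        buffer.clear()

if buffer:  # flush whatever remains at the end of the stream
    print("".join(buffer), flush=True)
```

Note that this only changes how often the client surfaces text; the HTTP stream underneath still delivers token-sized chunks.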