I am currently developing a desktop application for authors, editors, and translators. The prototype is complete and ready for launch, but we have run into significant issues with the API when working with large bodies of text.
We split each document into chunks sized to the model's maximum token limit and then call CreateCompletion with the max-tokens parameter also set to that limit. This approach works when testing with smaller text samples.

However, once the chunks approach the maximum token limit, we no longer receive a response; the request fails with an HTTP timeout instead. Reducing the chunk size to a much smaller number rectifies the problem and we do get a response, but that workaround is not feasible given the length of the documents our users work with. We are eager to understand and resolve this problem and would greatly appreciate your guidance.
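For reference, here is a minimal sketch of the request loop we use, written against the community go-openai client for illustration. The model constant, timeout value, and chunking are placeholders standing in for our production code, not an exact copy of it:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

// Placeholder for the model's context window size; our real value
// comes from the model we target.
const modelMaxTokens = 4097

func main() {
	client := openai.NewClient(os.Getenv("OPENAI_API_KEY"))

	// In the real application these chunks are produced by splitting
	// the document at the model's maximum token limit.
	chunks := []string{ /* ... */ }

	for _, chunk := range chunks {
		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
		resp, err := client.CreateCompletion(ctx, openai.CompletionRequest{
			Model:     openai.GPT3TextDavinci003,
			Prompt:    chunk,
			MaxTokens: modelMaxTokens, // set to the model's maximum, as described above
		})
		cancel()
		if err != nil {
			// With maximum-sized chunks, this is where we see the HTTP timeout.
			log.Fatalf("completion failed: %v", err)
		}
		fmt.Println(resp.Choices[0].Text)
	}
}
```

With small chunks this loop completes normally; with chunks near the token limit, the call never returns before the HTTP timeout fires.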