Currently, we can interact with users through the widget and our App’s response, and we can read the user’s input through the input schema. However, we cannot see the final response that ChatGPT generates.
Our understanding is that ChatGPT generates its reply based on the user’s question and the response returned by our App. Based on this, we have two questions:
1. How can we reduce ChatGPT’s response latency? Can the widget render first?
When ChatGPT generates a response, it usually spends some time thinking, but the waiting time is inconsistent. Sometimes it takes only 1–2 seconds, while other times it takes 20–30 seconds. A 20–30 second delay creates a poor user experience.
We have already limited the length of our App’s response, on the assumption that the more content ChatGPT has to read, the longer it takes to generate a reply. However, even with the same user question and the same App response, ChatGPT’s response time still varies: sometimes it is short, and sometimes it is much longer.
Is there any way to reduce this waiting time? Are there any recommended best practices?
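For context, the length limit mentioned above can be sketched roughly as follows. This is only a minimal illustration, not our production code: the function name `trim_response_text` and the 600-character default budget are hypothetical, and the sketch assumes the App’s response contains a plain-text field that can be shortened before it is returned.

```python
def trim_response_text(text: str, max_chars: int = 600) -> str:
    """Trim text to at most max_chars characters.

    Hypothetical helper: the name and the default budget are assumptions
    made for illustration only.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text
    # Reserve one character for the trailing ellipsis marker.
    cut = text[: max_chars - 1]
    # Prefer cutting at the last space so a word is not split in half.
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip() + "…"
```

A cap like this keeps the text that ChatGPT has to read within a fixed budget, but as described above, it has not made the response time consistent for us.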
In addition, the widget currently has to wait until ChatGPT starts responding before it can render, which makes the entire App experience feel slow. Is it possible to render the widget first and then let ChatGPT continue generating and streaming its response?
2. Is there a way to access the response generated by ChatGPT?
We would like to understand how users react to ChatGPT’s response. Is there a way for us to access the final response generated by ChatGPT? Is there an API for this?
At the moment, because we do not know exactly what ChatGPT generated, we have no complete feedback loop. For example, if a user is dissatisfied with an answer, we do not know the exact response they received, so we cannot interpret their reaction accurately or use it to improve the experience.