How Can We Reduce ChatGPT Response Latency and Access the Generated Response?

Currently, we can interact with users through the widget and our app’s response, and we can read the user’s input through the input schema. However, we cannot see the final response that ChatGPT generates.

Our understanding is that ChatGPT generates its reply based on the user’s question and the response returned by our App. Based on this, we have two questions:

1. How can we reduce ChatGPT’s response latency? Can the widget render first?

When ChatGPT generates a response, it usually spends some time thinking, but the waiting time is inconsistent. Sometimes it takes only 1–2 seconds, while other times it takes 20–30 seconds. A 20–30 second delay creates a poor user experience.

We have already limited the length of our response because we assume that the more content ChatGPT needs to read, the longer it may take to generate a reply. However, even with the same user question and the same app response, the waiting time still varies widely.

Is there any way to reduce this waiting time? Are there any recommended best practices?

In addition, the widget currently has to wait until ChatGPT starts responding before it can render, which makes the entire App experience feel slow. Is it possible to render the widget first and then let ChatGPT continue generating and streaming its response?

2. Is there a way to access the response generated by ChatGPT?

We would like to understand how users react to ChatGPT’s response. Is there a way for us to access the final response generated by ChatGPT? Is there an API for this?

At the moment, because we do not know exactly what ChatGPT generated, there is no complete feedback loop. We also cannot accurately understand the user’s reaction to the response. For example, if a user is dissatisfied with an answer, we do not know the exact response they received, so we cannot use their reaction to improve the experience.

Are you saying that you want to monitor your user’s ChatGPT responses?

Not exactly. We are not trying to monitor all of a user’s ChatGPT conversations. We want to understand the specific response ChatGPT generates based on the user’s question and our app’s response, because that response directly affects how the user interacts with our widget.

For example, we have a property-search app. ChatGPT may mention the address of a property in its response. The user may think, “This property looks great,” and then try to find that property in our widget using the address. However, our widget may not display the address, so the user cannot identify the property or understand how to continue.

Many users do not realize that they can ask ChatGPT another question to resolve the issue. Instead, they get stuck and leave.

The core problem is the information gap between our app and the user: the user knows exactly what ChatGPT said, but we do not. Because we cannot see the exact response generated by ChatGPT, it is difficult for us to understand why the user became stuck and to optimize the widget experience accordingly.
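One partial check that does not require seeing ChatGPT’s reply is to compare the fields our app returns to the model against the fields the widget actually renders. The sketch below is only an illustration under assumptions: the field names and both lists are hypothetical, and nothing here is part of the Apps SDK. It flags fields, such as the address in the example above, that ChatGPT could quote but the user cannot find in the widget.

```python
from typing import Iterable


def fields_missing_from_widget(
    model_visible: Iterable[str], widget_rendered: Iterable[str]
) -> list[str]:
    """Return fields the model can quote that the widget never shows, sorted."""
    return sorted(set(model_visible) - set(widget_rendered))


# Hypothetical field lists for the property-search example (assumptions).
MODEL_VISIBLE_FIELDS = ["id", "address", "price", "bedrooms"]
WIDGET_RENDERED_FIELDS = ["id", "price", "bedrooms", "photo_url"]

missing = fields_missing_from_widget(MODEL_VISIBLE_FIELDS, WIDGET_RENDERED_FIELDS)
print(missing)  # ['address']
```

This does not close the feedback loop, but it can catch one class of mismatch before users run into it.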

Have you considered providing a catalogue of properties in some way? If I understand correctly, this is through Apps for ChatGPT, right? Try looking into connectors to provide your domain data instead of intercepting the chats.

Unfortunately, you will never know unless your users specifically tell you about the issues. Anything else would be a huge privacy and security violation.

Thanks for raising this. You have pointed out a definite limitation.

Do you also happen to use the same tool with a local chatbot that you control? That might help you tune things on your own side.