I am trying to integrate ChatGPT as a chatbot into my webpage, so that it can reply to content-based questions.
I am facing the following problem:
Users send multiple messages for one question.
Example:
User: Hello
User: i have a question
User: about
…
What happens is that ChatGPT replies to each message.
Does anyone have an idea how to solve this problem, so that ChatGPT somehow receives all messages sent within a period of time, for example 2 minutes, and replies to all of them in a single message?
One non-OpenAI solution to this might be to use instruction-tuned embedding models.
Ask a modern embedding model (such as NV-Embed-v2, for example) whether it’s a complete request, whether a response is or could be warranted, or whether it looks like the user might still be typing.
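For illustration, here is a minimal sketch of that idea as a similarity check against labeled prototype phrases. I’m using OpenAI’s text-embedding-3-small as a stand-in for whatever embedding model you pick, and the example phrases and decision rule are made up and would need tuning:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical prototype phrases; in practice you would curate and tune these.
COMPLETE_EXAMPLES = [
    "Can you tell me your opening hours on weekends?",
    "I want to change the color of my order to blue.",
]
FRAGMENT_EXAMPLES = [
    "Hello",
    "i have a question",
    "about",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

COMPLETE_VECS = embed(COMPLETE_EXAMPLES)
FRAGMENT_VECS = embed(FRAGMENT_EXAMPLES)

def looks_complete(message: str) -> bool:
    """Rough heuristic: is the message closer to complete requests than to fragments?"""
    v = embed([message])[0]
    best_complete = max(cosine(v, c) for c in COMPLETE_VECS)
    best_fragment = max(cosine(v, f) for f in FRAGMENT_VECS)
    return best_complete > best_fragment
```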
You could also consider asking 4o-mini to output just one token, which you then parse programmatically to decide whether to generate a reply or not.
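Something like this, assuming gpt-4o-mini through the Chat Completions API (the prompt wording is just an example):

```python
from openai import OpenAI

client = OpenAI()

def should_reply(pending_messages: list[str]) -> bool:
    """Ask a small model for a one-token verdict on whether the user is done typing."""
    joined = "\n".join(pending_messages)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        max_tokens=1,
        messages=[
            {
                "role": "system",
                "content": (
                    "You decide whether a chat user has finished asking their question. "
                    "Answer with exactly one word: YES or NO."
                ),
            },
            {"role": "user", "content": joined},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")
```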
Finally, you could wait 1.5 to 2 seconds after a message is sent - if the user starts typing again in that window, simply don’t send the request.
Change the interface. That’s a terrible setup. Relying on timing is going to be very flaky and introduce delay! Maybe add a “Send” button?
btw, the API is not ChatGPT (it’s a related but separate product) and it would be against the terms of service to integrate “ChatGPT” into another website.
Do you think this approach could also help me solve the same problem I am facing when automating my Facebook Page messages?
It is essentially the same problem. I was able to add a delay before accumulating all the messages and then respond to all of them in one single reply, but the problem is that this same reply gets sent as many times as the user sent messages at the beginning.
OK, so if this is exactly what you want, you should have some logic in place before creating a run for each message, assuming you are using the Assistants API. The logic can check whether the message falls within a specific time frame: if yes, append it to the list of user messages on the thread; if not, list all the messages on the thread, create a run for the accumulated batch, and append the outputs in user/assistant format for display. I don’t know what your use case is, but it would be good if you could disable the send button until you receive a response from the LLM, to prevent users from spamming messages. Also design for multiple requests from different users, so you don’t hit race conditions and end up stuck watching weird responses lol. Hope this helps. Cheers
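A rough sketch of that flow with the (beta) Assistants API, assuming you already created an assistant and keep one thread per conversation; `ASSISTANT_ID` is a placeholder and the polling loop is simplified:

```python
import time
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder: your pre-created assistant

def handle_batch(thread_id: str, pending_messages: list[str]) -> str:
    """Append all buffered user messages to the thread, then create a single run."""
    for text in pending_messages:
        client.beta.threads.messages.create(
            thread_id=thread_id, role="user", content=text
        )

    run = client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=ASSISTANT_ID
    )

    # Poll until the run finishes (simplified; production code should also
    # handle failures, required tool calls and timeouts).
    while run.status in ("queued", "in_progress"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)

    # The newest message on the thread is the assistant's combined reply.
    messages = client.beta.threads.messages.list(thread_id=thread_id, limit=1)
    return messages.data[0].content[0].text.value
```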
Do you have an idea what to do if I want to do the integration on my Facebook Page, so that users get one accumulated reply to multiple messages? Adding a limit like that should be much more difficult in this case, if it’s possible at all.
Do you have any idea whether this would also work if I do it with my Facebook Page messages instead of a chat on my webpage? I am facing the same problems with users and multiple messages on my Facebook Page.
I have tried to automate it with Zapier, but every time a user sends a message it triggers a Zap, and each Zap gets its own reply. I was able to accumulate the messages and have the API generate a correct reply to all messages in one single message, but the problem is that this same reply gets sent multiple times (as many times as the user sent messages at the beginning).
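One way around the duplicate sends is to let every triggered run wait out the accumulation window and then reply only if the message that triggered it is still the newest one in the conversation. Here is a sketch under that assumption; the helpers are hypothetical stubs you would implement against the Graph API and your model call:

```python
import time

WAIT_SECONDS = 120  # accumulation window, e.g. 2 minutes

# --- Hypothetical helpers: implement these against the Graph API / your model ---
def fetch_latest_user_message_id(conversation_id: str) -> str: ...
def fetch_unanswered_user_messages(conversation_id: str) -> list[str]: ...
def generate_accumulated_reply(messages: list[str]) -> str: ...
def send_reply(conversation_id: str, reply: str) -> None: ...


def handle_trigger(conversation_id: str, triggering_message_id: str) -> None:
    """Runs once per incoming message; only the run for the newest message replies."""
    time.sleep(WAIT_SECONDS)  # let the user finish their burst of messages

    if fetch_latest_user_message_id(conversation_id) != triggering_message_id:
        # A newer message arrived after this one; a later run will reply instead.
        return

    reply = generate_accumulated_reply(fetch_unanswered_user_messages(conversation_id))
    send_reply(conversation_id, reply)
```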
From what I remember, Facebook allows either posting the new messages to a webhook or notifying a webhook on new messages.
The logic would be (rough code sketch at the end of this post):
1. Listen to the webhook.
2. On a new message, identify the conversation; if the conversation has a running timeout (see step 7), clear the timeout.
3. Load all messages from it.
4. Check the last received message(s) (all user messages since the one saved in the pointer from step 6).
5. Use AI to understand whether the messages are complete and require an answer.
6. If they are complete, create the answer and post it back, after saving the last received message ID into the conversation’s local pointer (last seen message ID).
7. If the messages are not complete, set a “reply to incomplete messages” timeout on the conversation.
8. End.
If the “reply to incomplete messages” timeout fires, load the messages (they are incomplete) and use the “incomplete messages reply” model (ask the user whether they are still there, or similar behavior), then either set up another “no user answer” timeout or simply wait until they come back (in that case it will trigger the webhook again). Then end.
If you use the “no user answer” timeout and it fires, send something like “hey, not sure if you’re here, let me know the rest of your message”… Then end. If they come back, they’ll go through the webhook logic again.
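For what it’s worth, here is a minimal single-process sketch of the timeout bookkeeping, assuming a Flask webhook endpoint, in-memory state, and stand-in helpers for the AI completeness check and the Messenger send; the payload field names are assumptions and the last-seen-message pointer is omitted for brevity:

```python
import threading
from flask import Flask, request

app = Flask(__name__)

INCOMPLETE_TIMEOUT = 120  # seconds before nudging an incomplete conversation
conversations = {}        # conversation_id -> {"messages": [...], "timer": Timer}
lock = threading.Lock()

# --- Stand-ins for the AI completeness check and the Messenger API calls ---
def is_complete(messages: list[str]) -> bool: ...
def generate_reply(messages: list[str]) -> str: ...
def send_reply(conversation_id: str, text: str) -> None: ...


def on_incomplete_timeout(conversation_id: str) -> None:
    """Step 7 fired: nudge the user about their unfinished message."""
    send_reply(conversation_id,
               "Not sure if you're still there - let me know the rest of your message.")


@app.post("/webhook")
def webhook():
    event = request.get_json()
    conv_id = event["conversation_id"]   # field names depend on your webhook payload
    text = event["message"]

    with lock:
        state = conversations.setdefault(conv_id, {"messages": [], "timer": None})
        if state["timer"]:               # step 2: clear any running timeout
            state["timer"].cancel()
            state["timer"] = None
        state["messages"].append(text)
        pending = list(state["messages"])

    if is_complete(pending):             # step 5
        send_reply(conv_id, generate_reply(pending))   # step 6
        with lock:
            conversations[conv_id]["messages"] = []
    else:                                # step 7
        timer = threading.Timer(INCOMPLETE_TIMEOUT, on_incomplete_timeout, args=[conv_id])
        with lock:
            conversations[conv_id]["timer"] = timer
        timer.start()

    return "ok"
```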
I would argue: KISS. Hold a buffer for a couple of seconds after each message before churning. I run a WhatsApp bot and this has worked great for me. I am a notorious “small multi-text sender”, so it was one of my first challenges.
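For anyone curious, a bare-bones version of that buffer with asyncio; call on_message from inside a running event loop (e.g. your async webhook handler), and treat the 2-second window and process_messages as placeholders:

```python
import asyncio

BUFFER_SECONDS = 2.0  # how long to wait after the last message before replying

buffers: dict[str, list[str]] = {}            # user_id -> pending messages
pending_tasks: dict[str, asyncio.Task] = {}   # user_id -> scheduled flush


async def process_messages(user_id: str, messages: list[str]) -> None:
    """Placeholder: send the accumulated batch to the model and reply once."""
    print(f"Replying to {user_id} about: {messages}")


async def flush_later(user_id: str) -> None:
    await asyncio.sleep(BUFFER_SECONDS)
    messages = list(buffers.get(user_id, []))
    if not messages:
        return
    await process_messages(user_id, messages)
    buffers[user_id] = []  # clear only after a reply actually went out


def on_message(user_id: str, text: str) -> None:
    """Call for every incoming message; only the last one in a burst triggers a reply."""
    buffers.setdefault(user_id, []).append(text)
    task = pending_tasks.get(user_id)
    if task is not None and not task.done():
        task.cancel()  # new message arrived mid-wait (or mid-processing): discard and restart
    pending_tasks[user_id] = asyncio.create_task(flush_later(user_id))
```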
A part of me wants to follow the typical “locking” mechanisms of Assistants, but another part of me is full send into the asynchronous harmonization of RealTime API. Me no likey blocking threads.
Nothing more really than what I’ve said. WhatsApp is a difficult one, especially with a serverless architecture, as it doesn’t maintain any sort of state.
I’d rather wait out a buffer than have lightning-fast responses. Arguably, humans don’t usually respond that fast either. If for whatever reason the user decides to respond yet again during processing, the work is discarded and the process restarts.
I’d bet that using embeddings could help save some of those jobs (for example, if a user says something like “Uhhh, are you there?” vs. “Oh, wait, actually I want it in blue!”), but this is not something I have implemented.