Best architecture for a engaging chat bot? (with dataset)

TLDR: I have an idea for a selling chat bot, I’m sharing the architecture I have on my mind, and my doubts. Any thoughts on any part are super welcome!

So I have a dataset of over 5 years of conversations. These conversations are from sellers reaching out possible clients, via email/chat. I am able to filter successful vs unsuccessful conversations.

I’ve been reading a lot, and also asking GPT-4 technical questions, but I am not able to decide which would be the best approach on the technical side.

Two most important things I want the model to learn from the dataset:

  • The way our sellers answer back, depending on the client’s answer, trying to keep a good conversation to get a good sale technique
  • The way our sellers chat, their style

What I have on mind:

1- Fine tuning a model with only the successfully conversations (ignore the unsuccessful).

Doubt 1: should I use the conversations that doesn’t get to a sale in any way?
Doubt 2: what base model to pick?


2- When preprocessing the conversation, use {{variables}} for avoiding feeding into model things like: names, private info, contextual stuff, etc.

Example:

Hello {{clientName}}, I’m reaching to you to talk about {{productName}} that will be released on {{releaseDate}}, thought you would be interested since you have experience in {{productFiels}} and successfully used {{productName}} before.

Doubt How should I programatically detect these variables in order to filter them out of the model?
Doubt 2: regaerding a variable like {{ productDetails }}, which is specific of the product, should I actually send this informatoin together in the prompt? so the bot can elaborate more on its own?


3- When feeding the model with prompt/completions from the dataset, each prompt will have the whole conversation except the last one (which will be provided on the completion). This would mean that for each new BOT answer, I’ll have all the above conversations again.

So depending on the length of messages that the conversation has, the more I will need to feed the model with repetitive text.

Example of prompts/completions to feed the model:

Prompt 1:

BOT: Hello {{clientName}}, are you interested in more info about {{productName}}?
CUSTOMER: Hi there, sure!
BOT:

Completion 1:

Glad to hear! so the best thing about this product is {{productFeatures}}, would you like to test it?

Prompt 2: (the prompt would repeat the above)

BOT: Hello {{clientName}}, are you interested in more info about {{productName}}?
CUSTOMER: Hi there, sure!
BOT: Glad to hear! so the best about this product is {{productFeatures}}, would you like to test it?
CUSTOMER: Not sure if really interested
BOT:

Completion 2:

Maybe you could be interested in a simpler one called {{productName}}, let me give you more information…

Doubt: I understand it is okay and needed to repeat the whole conversation each time. But, in this case this helps the fine tuned model to understand how the seller sells? or I need more product specific information somewhere?


4- I will use the final fine tuned model with prompts like:

clientName: “Mike”
produtName: “High speed Internet”
productDescription: “This internet is fast”
BOT: Hello {{clientName}}, are you interested in more info about {{productName}}?
CUSTOMER: Hi there, sure!
BOT:

Doubt: Is this the correct way to replace variables? or should I keep them out of the prompt and replace them manually when the bot answers?


5- Applying system based ruels to improve the end result, The idea is to detect patters that are known by the sellers, and handle the conversation in a different way. These would be manually implemented rules

Example: The client aswers back “Not interested”, we detect that and answer with a specific sentence.

Doubt: Should I be ok without this and just using the fine tune model?


5- Add synthetic conversations. Generate my own conversations to feed the model.

Doubt: may it help? to add ideal made up conversations following specific rules/paths?

1 Like

Hi @Ernnet - It’s an interesting idea & architecture. I just had a few thoughts that I wanted to share, and I am very much a beginner in this space myself so take this with a pinch of salt.

I think this can be done via Prompt Engineering and that is likely to be more effective. You need to probably add a layer before calling chatGPT API, that layer would essentially be used to construct the prompt. The prompt construction itself can be cleverly done and you may have to figure out how to do that.

I have been thinking of doing additional training as well (for my usecase) but the more I am reading, the sense I get is that prompts can achieve a lot. Not sure if anyone in the community has more experience in this area to shed some light.