I’m currently working on a strategy for building a customer support chatbot using OpenAI API (GPT 3.5 Turbo), and I would love to get your feedback on whether I’m heading in the right direction.
I think that integrating OpenAI API with a database containing question/answer pairs about our business and semantically searching it using tools like Langchain - then inserting the answer into the API prompt context is the best way to achieve our goal. Specifically, I plan to feed around 200-500 question/answer pairs into our knowledge base and then use semantic search to find the perfect answer every time a customer asks a question.
To make this happen, I think we need to build:
Integration of OpenAI API (GPT 3.5 Turbo) with our knowledge base using a semantically searched database and a prompt that’s engineered to be effective but low in token usage.
A visual interface for the database that will allow us to edit the knowledge base as and when needed. I’m hoping to use open source pre-built code to make it easier for everyone involved.
A user-facing chatbot website widget that will connect to the API. Once again, I’d like to utilize open source code to make this process as simple and streamlined as possible.
I would be so grateful to hear your thoughts on my approach. Do you think that this is the right way to build a customer support chatbot? If you have any alternative approaches or suggestions, I’m all ears!
I think with the current specifications your project will be difficult. There is (still) no possibility to train GPT 3.5 Turbo. Accordingly, you have to transfer all the questions you want to match with your answer to the model with every request. I have made the mistake in the past to not include this in my conceptual planning, so I hope I can help you by sharing my prior mistakes
If you find a low-token solution I would suggest a backend iterative procedure (added advantage that you do not have to build seperate solutions outside of the API):
you train a low token model to perform the matching between user question and stored answer. As a result of this process step you get your answer. This answer is NOT displayed to the user.
You transmit the user’s query and the determined answer to a model with higher costs. You submit the information with a request to answer the user’s question with the information provided from 1. You address any unanswered components of the question with a reference to your service desk.
I hope this helps you in your journey and all the luck in your undertaking,
Thank you for sharing your insights and for the suggestions on how to make the project work. I appreciate your advice on the low-token solution and backend iterative procedure.
I was also wondering if we could potentially use embedding with ada to search the database for the most matching question/answer pairing and make the answer part of the prompt that is sent to GPT 3.5 API as context. This way, we could avoid transferring all the questions we want to match with our answer to the model with every request.
Do you think this could be a viable solution? I would love to hear your thoughts on this. Also, I am not sure if having a huge database of 500+ pairings to search through using embedding will increase token usage - in theory not because only inputs and outputs use tokens right?
It’s definitely the way to go. If you’re doing semantic search via embeddings, you will not send a lot of tokens to the “Chat” endpoint because you’ll only send those QAs pairs that are relevant to answer each question. There are a couple of refinements that you might want to consider:
Include classifiers in the pipeline. They help you detect off-topic questions, modify your “system” prompt depending on the user’s intention (Creative, Factual, etc.) and many other things.
I don’t know if you guys give customer support via specific communication channels (Slack, Discord, etc.) If so, it feels more natural to have the chatbot already integrated in these tools rather than a website for the users to interact with it. But that’s just personal preference.
Thank you for your valuable input, @AgusPG ! I’m interested in your suggestion to use semantic search via embeddings to minimize the number of tokens sent to the “Chat” endpoint. I’m wondering if you would recommend using the OpenAI API (Ada model) for this embedding process or if you have other solutions in mind to save tokens. I’m also curious about the token billing process for embedding as my understanding (quoted below) is that it could become quite expensive. I appreciate any additional insight you can provide on this topic!
Assuming each item in the database has an average length of 500 words, generating embeddings for each item would consume approximately 500 tokens per item. Therefore, generating embeddings for a database with 500 entries would consume approximately 250,000 tokens (500 tokens per item x 500 items).
The token consumption required for comparing the embeddings would depend on the number of embeddings being compared and the size of the embeddings. Assuming each embedding is 512-dimensional and we are comparing all 500 embeddings with a query, the token consumption for comparing the embeddings would be approximately:
512 (dimensionality of the embeddings) x 500 (number of embeddings being compared) = 256,000 tokens.
Therefore, the total token consumption required to generate embeddings and compare them for a database with 500 entries would be approximately 506,000 tokens (250,000 tokens for generating embeddings + 256,000 tokens for comparing embeddings).
You do not use tokens for the “comparison”. Your token consumption is one-shot: you just need to embed all your QAs pairs (usually called knowledge base) at the beginning of the process (and any new QA pair that you might want to add in the future). Once that’s done, your only additional token consumption comes when you need to embed every new question that your chatbot receives. Also: the dimension of the embeddings does not play any role here. You pay per tokens embedded, not per the resulting dimension. The cost is ridiculously low . Ballpark estimation, using your numbers:
250,000 tokens at the beginning (embedding the whole knowledge base).
I’m messing around with the idea that I can use a transformer, and cosine similarity to match a prompt with a sentence in a text file. I created a narrative.txt file that is parsed and a high quality match is sent along with the current message as context. It works pretty well. If I ask it what my name is it responds correctly. What I have is hackish, I’m not a great dev, most of my comments are apologies, but it might help. Bitbucket
Still, if I run the model you used and say we have 1500 chat starts per month, using Ada’s pricing of $0.0004 per 1k tokens, the 250k token starting prompt still costs $0.1. Which means our bot would cost a minimum of $150 per month which is not insignificant.
How are companies like quickchat.ai able to charge only $99 per 1000 messages when they would also need to import the knowledge base at the beginning of each prompt (custom brand specific)?
I’m just trying to understand this, thanks for bearing with me.
Np man, don’t worry! Very happy to help.
You don’t need to re-embed the knowledge base every time a new chat starts. The knowledge base is already embedded and stored in your DB. Every new chat uses the same knowledge base: the information source is the same one. So, let’s say that you get 1500 chats per month, with 10 questions each. These equals 15,000 questions per customer. Your cost would be:
250,000 tokens in embedding the knowledge base: one-shot. Once this is done, you never need to go through this process again. Let’s assume that you re-embed your knowledge base every month, for simplicity.
1,500 x 10 x 500 = 7,500,000 in embedding all the questions that you get monthly from this particular customer (10 x 1,500).
Overall month cost per customer = 7,750,000 tokens. That equals 3$ per month in embeddings, to match 15,000 questions. Definitely affordable if you’re charging them 100$/month . Obviously you need to add the cost of the “Chat” endpoint. That’s the expensive part. But the embedding cost is almost zero.
One last question- you mentioned the biggest cost is in the chat endpoint. Are you saying there is another cost in addition to the $3 you illustrated, or do you mean that most of the $3 is made up of the chat endpoint since the questions cost tokens? Thank you in advance! You’ve really helped me get a grasp of this topic.
Of course, any time! Happy to help
Oh yeah, there is another cost. That’s our big guy here, as the “Chat” tokens cost 0.002/1k tokens. Once the semantic search part is completed (retrieving the QA pairs that are relevant for the given query), you’ll pass these pairs to the “Chat” endpoint, together with the question and some instructions, to get an answer out of it. That’s the costly part. Again, some quick maths using the example of the 15,000 questions per month and assuming that each one of the calls uses 2k “Chat” tokens:
15,000 x 2,000 / 1,000 x 0.002 = 60$/month.
So this is your highest cost. Still profitable given some prices that I’ve seen here and there from some of these “chatbot-as-a-service” companies
Okay that makes sense. Still, the cost for that is much cheaper than using a chatbot service that just has a pretty interface layered on top, as you said - let alone hiring a team of 4 to provide 24/7 live chat support.
One other interesting thing I’ve been thinking about as to how solve it is what to do with answers that require multiple steps. For example a customer asks “how do I request a refund for X” and the bot then needs to collect their email, then their order number and finally a reason for refund request. Do you have any thoughts on how you can get the AI to ask those series of questions with multiple inputs and how to make it stay on track if the user replies something unexpected?
I’ve been thinking a lot about that but haven’t come to a reasonable solution yet…
That’s definitely the tricky part hahaha. There is a lot of research going on in this area. As a starting point, I’d definitely recommend exploring langchain. They offer a wrapper to a lot of these “decision-making” components such as searching the web, using a calculator, etc., by abstracting this decision-making process in an “agent” that needs to decide which set of “tools” should it use (and in which order) to answer in a proper way. You can integrate your own tools into the set of decisions that the agent might want to consider within this sequence.
In my view, it is probably not sufficient when you need to incorporate complex decision-making pipelines, but I know people that love it even in that scenario. In any case, I do believe that it’s the best way to quickstart in just a couple of lines of code.
I believe the most difficult part of making anything “ultimate” with GPT is that the technology is changing so much that any sort of process could be lost or made redundant within a week.
Information retrieval and its synergy with GPT is going to be a huge player. You may find that semantic searches by themselves may not return the best results or even be necessary for a lot of knowledge base lookups. For things such as: keyword queries, negations. Even sarcasm, the most dreaded literary device online, can ruin a semantic search.
A simple example for a product search using only semantics would be:
“Hi, I’m looking for a toy for my child. He doesn’t like Batman” Shows Batman
“No, not batman!!!” Shows even more Batman
The visual aspect of a knowledge base is nice, I considered the same myself in the concept of “cartridges” (Product is mentioned, apply a cartridge with its name to the chat window to indicate that the chatbot is now loaded with its information). However ultimately after some testing it was determined to be too confusing and was scrapped. I’m sure there’s a way to make it all work together nicely though.
Abhi, to get help it is important to ask the question directly. You have redirected me to a topic with hundreds of replies.
Sorry for asking, but do you expect me to read and search for your question and then answer it?
A better approach is simply to ask the question giving full context. Thanks