Sales Chatbot that reads all your pages. How does it work?

Hello everyone, how are you?

I’m here to ask for your help. I’m curious about how a certain app that I use was developed. Could you help me understand the steps that may have been taken? I think it would be interesting to know a little more about how these things work.

Basically, this app creates a chatbot that helps customers on an e-commerce site.

It reads all the pages on a site and then generates a chat widget that can be embedded with a script.

The app is https://zipchat.ai

You can see it working here:

Embeddings were certainly used, am I right?

Thank you in advance for your help!

The immediate answer would be: a hybrid vector database index

Embeddings are a great start, but they fail when it comes to connections and relationships. In my experience, they seem to work perfectly for single-shot messages such as “What is the biggest product?” or “What is product X?”, but once you delve deeper into the chaotic mess of a conversation, foundational cracks appear. For this reason, I would say that embeddings work wonders for search-engine results, but are not sufficient by themselves for conversational-agent information retrieval.
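To make that concrete, here is a minimal sketch of what a hybrid retrieval step could look like: blend an embedding similarity with a plain keyword score and rank on the combination. The `embed` helper is a placeholder for whatever embedding API you use, and the scoring is deliberately crude — this is an illustration of the idea, not how Zipchat actually does it.

```python
# Hybrid retrieval sketch (illustrative only).
# Assumptions: an `embed` function backed by some embedding API, and a small
# in-memory list of product/page texts with precomputed vectors.
import math
from collections import Counter

def embed(text: str) -> list[float]:
    # Placeholder: call your embedding model here (OpenAI, Cohere, local, ...).
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, doc: str) -> float:
    # Crude term-overlap score standing in for BM25 / an inverted index.
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    overlap = sum(min(q[t], d[t]) for t in q)
    return overlap / (1 + math.log(1 + len(doc.split())))

def hybrid_search(query: str, docs: list[str], doc_vectors: list[list[float]], alpha: float = 0.5) -> list[str]:
    """Blend semantic and keyword scores; alpha weights the semantic side."""
    q_vec = embed(query)
    scored = []
    for doc, vec in zip(docs, doc_vectors):
        score = alpha * cosine(q_vec, vec) + (1 - alpha) * keyword_score(query, doc)
        scored.append((score, doc))
    return [doc for _, doc in sorted(scored, reverse=True)]
```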

When considering a chatbot for e-commerce, you should really take re-ranking into consideration (see “Re-ranking | Machine Learning | Google Developers”). Otherwise, you are just an over-complicated and possibly incorrect knowledge-base Q&A bot. In most cases, if one had to pick between the two, a keyword-based search can return better results than semantics for product information retrieval.

So, a huge question is: “What makes a chat service better for e-commerce customer support?” Well, in my experience, speaking with a human is amazing because they already have these intricate relations and connections. I can say something like “I have a bad knee and want this product, but it’s pretty heavy, what’s something similar?”, and a human can answer it (hopefully). A chat service using embeddings alone cannot. In my opinion, a successful chat service must be able to match this level of recommendation. Answering simple questions such as “What is product X?” won’t be as common as one would think, as that information is already retrievable by the customer with a couple of clicks.
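If it helps, this is roughly what a re-ranking pass looks like in practice: retrieve a broad candidate set however you like (keyword, embeddings, hybrid), then let a cross-encoder re-score each (query, candidate) pair and keep the top few. This sketch assumes the sentence-transformers package; the model name is just one commonly used open re-ranker, not anything Zipchat-specific.

```python
# Re-ranking sketch: re-score retrieved candidates with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    pairs = [(query, c) for c in candidates]
    scores = reranker.predict(pairs)  # one relevance score per (query, candidate) pair
    ranked = sorted(zip(scores, candidates), key=lambda x: x[0], reverse=True)
    return [c for _, c in ranked[:top_k]]
```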

One part of the difficulty is managing token size when injecting context. Either you send every little piece of product information and hope that (a) it fits and (b) it doesn’t truncate the conversation so much that earlier context is lost, or you try to make your embedding results more specific and hope that overlapping/mixing doesn’t occur, and that the results can even be used properly. I tried to accomplish this using a fine-tuned classifier model to determine which properties of the product the user was requesting. It worked, but it wasn’t worth it, as there are still much larger, more important issues.
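For what it’s worth, the token-budget side can be as simple as packing the highest-ranked snippets until a limit is hit, rather than dumping the whole catalog into the prompt. A sketch, assuming tiktoken for counting; the budget number is arbitrary:

```python
# Context-budget sketch: pack relevance-sorted snippets until the budget runs out.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def pack_context(snippets: list[str], budget: int = 1500) -> str:
    packed, used = [], 0
    for snippet in snippets:  # snippets assumed already sorted by relevance
        cost = len(enc.encode(snippet))
        if used + cost > budget:
            break
        packed.append(snippet)
        used += cost
    return "\n\n".join(packed)
```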

For example, a user might start their inquiry with a product question and then, 20 messages later, ask for a comparison between the initial product and a new one, or even ask for a product that’s “more colorful”. Boom: hallucination, and false product information is produced. Clearly some sort of “contextual stack” is needed.
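What I mean by a “contextual stack” is something like the following: keep track of which products the conversation has touched, so a later “compare it to X” or “something more colorful” resolves against the right items instead of a fresh, blind similarity search. Purely illustrative; the data model here is made up.

```python
# "Contextual stack" sketch: track products the conversation has referenced.
from dataclasses import dataclass, field

@dataclass
class ConversationContext:
    product_stack: list[dict] = field(default_factory=list)  # most recent last

    def push(self, product: dict) -> None:
        # Re-pushing moves a product back to the top of the stack.
        self.product_stack = [p for p in self.product_stack if p["id"] != product["id"]]
        self.product_stack.append(product)

    def current(self) -> dict | None:
        return self.product_stack[-1] if self.product_stack else None

    def comparison_pair(self) -> tuple[dict, dict] | None:
        # "Compare it with the earlier one" needs at least two tracked products.
        if len(self.product_stack) >= 2:
            return self.product_stack[-2], self.product_stack[-1]
        return None
```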

And this still doesn’t address the question: “How does my program, or GPT, know when to use embeddings?” Or are you simply going to request embeddings for every single user message?
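One way to handle this is a cheap routing step before any similarity search: ask a small model (or a classifier) whether the message actually needs product retrieval at all. A sketch using the openai Python SDK; the model name and prompt wording are just examples, not a recommendation of a specific setup.

```python
# Routing sketch: decide whether a message needs retrieval before embedding anything.
from openai import OpenAI

client = OpenAI()

ROUTER_PROMPT = (
    "You classify e-commerce chat messages. Reply with exactly one word:\n"
    "RETRIEVE if answering needs product/catalog information,\n"
    "CHAT if it can be answered without looking anything up."
)

def needs_retrieval(user_message: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().upper().startswith("RETRIEVE")
```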

Since you are trying to sell this as a service, it’s your responsibility to almost COMPLETELY eliminate domain-specific hallucinations ( :skull: ) that produce inaccurate product information. For example, what if your chatbot tells a user that a product does something it doesn’t do, and the user complains and demands a refund?
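One mitigation I would sketch (certainly not the only one): force the model to answer only from the injected product facts and to reference products by ID, then reject or regenerate the reply if it cites an ID that wasn’t in the retrieved context. The tagging convention and regex here are illustrative.

```python
# Guardrail sketch against domain-specific hallucination.
import re

GROUNDING_RULES = (
    "Answer ONLY from the product facts provided. If the facts do not cover "
    "the question, say you are not sure and offer to connect a human. "
    "Reference products by their [id] tags."
)

def cited_ids(answer: str) -> set[str]:
    return set(re.findall(r"\[([A-Za-z0-9_-]+)\]", answer))

def is_grounded(answer: str, retrieved_ids: set[str]) -> bool:
    unknown = cited_ids(answer) - retrieved_ids
    return not unknown  # any unknown id means regenerate or escalate to a human
```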

So, you’ll need to refine your question. What do you want to accomplish? What limitations are you willing to accept? Are embeddings the solution, or are they simply a component? How are you going to prepare your query for embeddings? How can you match every way of asking basically the same thing against your embedded results? How will you capture nuances that change the product? How are you going to know that you even need to perform a similarity search? How will you follow the conversation, and the web-like contextual structure that it forms?

I’m sorry, I know the worst answer is one that over-complicates the question, but these are my findings after lots and lots of trial and error. The hybrid vector index (imo) would be a great place to start, and then these other concerns can be addressed afterwards.


Yep - even an inverted index like LUNR is blisteringly fast. I have a future task to blend LUNR with an LLM. I think the combined search architecture could expose some useful experiences.
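Roughly what I have in mind (LUNR itself is JavaScript, so in this sketch rank_bm25 stands in as the fast keyword index; the sample docs, model name, and prompt are only illustrative):

```python
# Inverted-index + LLM blend sketch: keyword hits become grounding context.
from rank_bm25 import BM25Okapi
from openai import OpenAI

docs = [
    "Trail backpack: 900 g, 20 L, water resistant.",   # toy example data
    "City backpack: 1.4 kg, 28 L, laptop sleeve.",
]
bm25 = BM25Okapi([d.lower().split() for d in docs])
client = OpenAI()

def answer(question: str) -> str:
    hits = bm25.get_top_n(question.lower().split(), docs, n=5)
    context = "\n\n".join(hits)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the context below.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```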


I am on a similar path. Fingers crossed!
