Chatgpt3 + langchain + custom data + personality Chatbot

Hello everyone, I’m opening this topic because I couldn’t find exactly what I was looking for. I looked at similar topics, but I couldn’t find information that was exactly what I wanted. Some of the topics were very old. If I need to give a few examples:


this is i want actually basic of this:

Regarding what I want to do; I want to do a two-step process.

  1. In simple terms, I want to upload the data I have and ask questions from this data.

1- I want to create a chatbot. To use in seller-customer conversations. 2- I want to use custom data. For example, I want to upload customer and seller conversations I have as question-answer pairs and also create and upload custom questions myself. I know the required format for fine-tuning, but how should I do embedding? Understanding which question corresponds to which answer, etc. Example: Customer: is x company selling cars? Bot: Yes, x company is selling sports cars. Etc. But when an unrelated question is asked, such as “What’s the weather like today?” or if it’s not in the data, I want it to respond with the answer I have determined.

3- I also want to fine-tune additionally. How can I do this? To focus on a specific area?

4- I want to add a feedback system as well. For example: Seller: Do you want to buy x product? Customer: Yes, I’m buying this product.

How can I achieve a successful sale or something similar to this? Can I include it as a parameter while fine-tuning?

In my research, I have seen many different options, but I couldn’t determine a definite roadmap.

Coming to what I know for sure;

1- A vector database is required for custom data. 2- I need to split my PDFs, texts, etc., into small pieces and convert them into text. 3- Fine-tuning doesn’t add data; I need to do embedding to add data. 4- I couldn’t find much about the feedback system and personality.

One thing that puzzles me is whether I should upload files first with embedding and then do fine-tuning?

For splitting, it seems like using Langchaing is the most logical option as far as I can see, but I’m not sure.

It’s a bit long, but thank you in advance for your answers.