Fine Tuning a Chatbot to provide answers from a specific dataset

ayangupta · July 18, 2021, 2:34am

Hi there! I just got access to the beta API and am excited to start tinkering with this technology! One thing I was wondering is whether it is possible to train a Q&A chatbot to answer questions from a specific set of 200+ articles.

Currently, there are details for training a customer support chatbot here, but that approach seems to take in data from users' past chat history. For someone who does not have thousands of lines of past conversations, this doesn't seem very practical, so I'm hoping to give GPT-3 more details specific to a certain area (e.g. economics) in the form of plain text/articles.

Guidance on this is greatly appreciated! I'm just getting started, so I might have missed this functionality if it already exists.

ayangupta · July 18, 2021, 3:36am

In this case, @m-a.schenk, I believe the student would be the only player. They would ask the chatbot questions in a specific domain. For example, the economics chatbot would answer questions students might ask, such as “what are the limitations of the X model vs the Y model?”

ayangupta · July 18, 2021, 4:19am

Got it, makes sense! Thanks for the clarification.

What is your take on this? Is it better to fine-tune using the Q&A format they have provided, or to fine-tune on a large set of articles and other text related to the subject matter? Is there a resource I can refer to that discusses fine-tuning on just large amounts of text?
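For the Q&A route, the 2021-era GPT-3 fine-tuning endpoint took a JSONL file of prompt/completion pairs. The sketch below converts hand-written question/answer pairs into that shape. The separator `\n\n###\n\n`, the stop sequence ` END`, and the helper name `qa_pairs_to_jsonl` are assumptions modelled on conventions from OpenAI's legacy fine-tuning guide, not requirements.

```python
import json

# Assumed conventions: a fixed separator marks the end of every prompt and a
# fixed stop sequence marks the end of every completion.
PROMPT_SEPARATOR = "\n\n###\n\n"
STOP_SEQUENCE = " END"


def qa_pairs_to_jsonl(pairs):
    """Convert (question, answer) tuples into prompt/completion JSONL text."""
    lines = []
    for question, answer in pairs:
        record = {
            "prompt": question.strip() + PROMPT_SEPARATOR,
            # Completions conventionally start with a single leading space.
            "completion": " " + answer.strip() + STOP_SEQUENCE,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        ("What are the limitations of the X model vs the Y model?",
         "Placeholder answer written by a subject-matter expert."),
    ]
    print(qa_pairs_to_jsonl(example), end="")
```

Fine-tuning on raw articles instead would mean prompts with little or no question content, which is why the thread keeps coming back to Q&A-shaped data.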

daveshapautomator · July 18, 2021, 11:27am

I’m using raw material (like KB articles) and GPT-3 to synthesize data. There’s also a fantastic dataset from Stack Exchange, and searching Kaggle Datasets turned up tons of data. The short version: find lots of data that is “close enough” to bootstrap your project, then use GPT-3 to synthesize the rest of the fine-tuning dataset.
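One way to read the synthesis approach above: split each KB article into paragraph-sized chunks, then wrap each chunk in a prompt asking GPT-3 to write Q&A pairs grounded in that chunk. The sketch covers only the text preparation; the API call that would send each prompt is omitted, and the chunk size, prompt wording, and function names are hypothetical.

```python
def chunk_article(text, max_chars=1500):
    """Group whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_synthesis_prompt(chunk, n_questions=3):
    """Hypothetical prompt asking the model for Q&A pairs grounded in one chunk."""
    return (
        "Read the following article excerpt and write "
        f"{n_questions} question-and-answer pairs a student might ask, "
        "answering only from the excerpt.\n\n"
        f"Excerpt:\n{chunk}\n\nQ&A pairs:\n"
    )


if __name__ == "__main__":
    article = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
    for chunk in chunk_article(article, max_chars=40):
        print(build_synthesis_prompt(chunk, n_questions=2))
```

The generated pairs would still need a human pass before going into a fine-tuning file, since the model can invent answers the excerpt does not support.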

stevet · July 22, 2021, 7:48pm

Hi @ayangupta, I just published some code and a tutorial for doing what you’re asking about. I used the answers endpoint and a documents file as the source for the answers. It’s a complete app that you can get up and running in a few clicks, and all of the code is there for you to look at, use, or modify. Here is the tutorial video. I hope this is helpful.
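The answers endpoint (since deprecated) read its source material from an uploaded JSONL documents file. Assuming the legacy format of one JSON object per line with a `text` field and an optional `metadata` field, a file built from a set of articles might be generated as below. Storing the article title in `metadata` is an illustrative choice, and `documents_to_jsonl` is a hypothetical name.

```python
import json


def documents_to_jsonl(articles):
    """Turn (title, text) tuples into one JSON object per line.

    Assumed legacy layout: "text" holds the searchable content and the
    optional "metadata" field carries the article title here.
    """
    lines = []
    for title, text in articles:
        lines.append(json.dumps({"text": text.strip(), "metadata": title}))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    articles = [
        ("Model X", "Model X assumes perfect competition."),
        ("Model Y", "Model Y relaxes that assumption."),
    ]
    with open("documents.jsonl", "w", encoding="utf-8") as f:
        f.write(documents_to_jsonl(articles))
```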

Tetramatrix · August 14, 2023, 7:58am

I wonder whether this still works. How are the results? How does it compare to embeddings?
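On the embeddings comparison: the usual embeddings-based alternative retrieves the articles whose vectors are closest to the question's vector and puts them in the prompt. A minimal sketch follows, using cosine similarity over toy 3-dimensional vectors that stand in for real embeddings (which would come from an embeddings model call not shown here). The prompt template and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [i for _, i in scored[:k]]


def build_context_prompt(question, documents, indices):
    """Place the retrieved documents ahead of the question (hypothetical template)."""
    context = "\n\n".join(documents[i] for i in indices)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = ["Article about model X.", "Article about trade.", "Article comparing X and Y."]
    # Toy vectors standing in for real embeddings of each document.
    doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    query_vec = [1.0, 0.0, 0.0]
    best = top_k_documents(query_vec, doc_vecs, k=2)
    print(best)  # [0, 2]
    print(build_context_prompt("How does X compare to Y?", docs, best))
```

Unlike fine-tuning, this keeps the articles outside the model, so updating the knowledge base means re-embedding documents rather than retraining.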
