Fine-tuning a model for customer service for our specific app

Hi, I’m brand new to OpenAI, so bear with my basic question!

I’m curious to know if I can train a custom model that can answer the questions our customers repeatedly ask, and what that JSONL file would look like.

Would something like this work?

{"prompt": "Customer: Hey, how can I do XYZ using the ABC feature?\n\n###\n\nAgent: Hey there!\n\nJust click the Widgets link in the left nav, then do BBB followed by CCC, then DDD.\n\n###\n\nCustomer: Great, thanks! xoxo\n\n###\n\nAgent: ", "completion": "No problem, just give us a shout if you need anything else! END"}
{"prompt": "Customer: ARGH, I got an email that my card was declined. WHY? There's money in the account.\n\n###\n\nAgent: Hey, sorry about that - I know it can be frustrating. The card declined message comes directly from your bank, so you'll have to contact them to find out why your card isn't going through.\n\n###\n\nCustomer: Ok, fine. Thanks..bye.\n\n###\n\nAgent: ", "completion": "Let us know what they say, and we can manually rerun your card once it's all sorted out. Cheers! END"}

In other words, if I dump a thousand conversations like this into a fine-tuned davinci model, and I cover pretty much every question about every feature, would we be able to feed live customer input into OpenAI’s API and get actually helpful responses?

4 Likes

So, I asked ChatGPT how to go about doing this. :wink: I asked:

I want to use the fine tuning feature to create my own model. The goal is to teach OpenAI’s davinci model how to help answer live customer chat questions about our application. How should I format the customer support conversations to populate the prompt and completion fields in the JSONL file?

And it said:

To fine-tune a language model like OpenAI’s GPT-3 for the task of answering customer support questions, you will need to create a dataset that contains examples of customer support conversations. Each example should consist of a prompt and a completion. The prompt should contain the customer’s question or issue, and the completion should contain the response that the model should generate.

Here is an example of what a conversation might look like in the JSONL format:

{
  "prompt": "Customer: I'm having trouble logging into my account. Can you help me troubleshoot the issue?",
  "completion": "Agent: Of course! First, can you try resetting your password? If that doesn't work, it's possible there may be an issue with your account. Can you provide me with your account details so I can take a closer look?"
}

If this is correct, then it seems we just need to feed pairs of these questions/answers into the fine-tuning process?

1 Like

This is almost correct. You need to move "Agent:" to the end of the prompt (with no space after it), and then the completion needs to start with a space, followed by the answer.

3 Likes

Gotcha, so:

{"prompt": "Customer: (question here)? Agent:", "completion": " (answer here)"}

Let’s say I create a fine-tuned model with thousands of lines like this, and I’m ready to use it in the real world as a chatbot. I assume it would work like this:

  1. Customer types “How do I do XYZ?”
  2. We make an API completion call with the prompt "Customer: How do I do XYZ? Agent:"
  3. We take the result of that call and pass it to the customer: "You do XYZ by doing ABC."
  4. Customer asks a follow-up question: “When I do ABC, I get an error.”
  5. We make another API call with the prompt of the entire conversation so far? E.g. prompt = "Customer: How do I do XYZ? Agent: You do XYZ by doing ABC. Customer: When I do ABC, I get an error. Agent:"
  6. (rinse and repeat)

Do I have this right? Is it necessary to include the entire chat history in the prompt?

This is correct. You could include the last 3 or 4 interactions (just don’t run out of tokens).

For clarity, put a double linefeed between each pair of interactions.
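
In code, the loop might look something like this — a rough sketch only, assuming the pre-1.0 openai Python library; the fine-tuned model name and the helper are placeholders, following the prompt format discussed above:

import openai

openai.api_key = "YOUR_API_KEY"            # or set the OPENAI_API_KEY environment variable
MODEL = "davinci:ft-your-org-2023-01-01"   # placeholder fine-tuned model id

history = []  # list of (customer_message, agent_reply) tuples

def ask_agent(customer_message, max_turns=4):
    # Keep only the last few interactions so the prompt stays within the token limit.
    parts = [f"Customer: {c} Agent: {a}" for c, a in history[-max_turns:]]
    parts.append(f"Customer: {customer_message} Agent:")
    prompt = "\n\n".join(parts)  # double linefeed between interactions, as suggested above

    resp = openai.Completion.create(
        model=MODEL,
        prompt=prompt,
        max_tokens=150,
        temperature=0,
        stop=["END"],  # matches the END marker used at the end of the training completions
    )
    answer = resp["choices"][0]["text"].strip()
    history.append((customer_message, answer))
    return answer

print(ask_agent("How do I do XYZ?"))
print(ask_agent("When I do ABC, I get an error."))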

1 Like

Great, thanks for all your help!

1 Like

Hi,

I have a follow-up question to this conversation. In @ffxsam’s original question, the data was formatted like this, following the OpenAI docs section on the customer support chatbot:

{"prompt": "Customer: Hey, how can I do XYZ using the ABC feature?\n\n###\n\nAgent: Hey there!\n\nJust click the Widgets link in the left nav, then do BBB followed by CCC, then DDD.\n\n###\n\nCustomer: Great, thanks! xoxo\n\n###\n\nAgent: ", "completion": "No problem, just give us a shout if you need anything else! END"}

I am in a similar situation: I asked ChatGPT how to properly format the prompt, and was given something similar to the second example:

{
  "prompt": "Customer: I'm having trouble logging into my account. Can you help me troubleshoot the issue?",
  "completion": "Agent: Of course! First, can you try resetting your password? If that doesn't work, it's possible there may be an issue with your account. Can you provide me with your account details so I can take a closer look?"
}

What is the difference between the two formats? What is the reasoning for going with the second example rather than the way the docs formatted it, with the stop sequences, etc.?

Hello. I had a very similar question, and of everything on the internet, this post has been the closest I’ve found to addressing the issue. So I got an idea: why not ask ChatGPT? You may or may not wish to read the following discussion I had with the AI on this subject, but I highly suggest you try getting the answers you need FROM the AI itself!

how do i create a custom model for support questions on our website?

There are several steps involved in creating a custom model for support questions on a website:

  1. Collect and label a dataset of support questions and answers. This will be used to train the model.
  2. Preprocess the dataset by cleaning and normalizing the text, and splitting it into training and testing sets.
  3. Choose a model architecture and train it on the dataset using a machine learning library such as TensorFlow or PyTorch.
  4. Fine-tune the model by adjusting its hyperparameters and continuing to train it on the dataset until the desired level of accuracy is achieved.
  5. Test the model on a separate test set to evaluate its performance.
  6. Integrate the trained model into the website’s code, and use it to provide answers to support questions from users.

It is also important to monitor the performance of the model and to continuously improve it by using customer feedback and updating the dataset.

how should I label a dataset of support questions and answers?

Labeling a dataset of support questions and answers can be done in a few different ways, depending on the specific task and the desired level of granularity. Here are a few common techniques:

  1. Classification: Assign one or more predefined categories to each question, such as “billing,” “technical,” or “account management.” This approach is useful when the goal is to route incoming questions to the appropriate department or agent.
  2. Intent identification: Assign one or more predefined intents to each question, such as “cancel subscription,” “change password,” or “get refund.” This approach is useful when the goal is to understand the user’s overall goal or objective, rather than the specific topic of the question.
  3. Keyword tagging: Assign one or more predefined keywords or phrases to each question, such as “billing,” “password,” or “refund.” This approach is useful when the goal is to identify specific topics or entities mentioned in the question.
  4. Multi-labeling: Assign multiple labels to each question, using a combination of the above techniques. This approach can provide a more detailed understanding of the question and its intent.

The most important thing is to label your dataset in a way that makes sense for your specific use case and is useful for the model you plan to train. It’s also important to have a consistent and clear definition of the labels. If you’re working with a large dataset or a team, consider using tools like Labelbox to manage the labeling process.

how do i classify questions? can you give me an example?

Classifying questions involves assigning one or more predefined categories or labels to each question, based on its content and meaning. Here is an example:

  • Suppose you are building a model to classify customer support questions for a software company. You might define the following categories: “Billing,” “Technical,” “Account Management,” and “Sales.”
  • You have the question “How do I change my billing address?” This question can be classified into the “Billing” category.
  • Another question, “Why is my software not working properly?”, can be classified into the “Technical” category.

There are several techniques that can be used to classify questions, including:

  1. Supervised learning: Train a model on a labeled dataset of questions and their corresponding categories. The model can then be used to classify new questions based on their similarity to the examples in the training dataset.
  2. Rule-based systems: Define a set of rules or heuristics that map specific words, phrases, or patterns in the question to a particular category. For example, a rule that maps any question containing the word “billing” to the “Billing” category.
  3. Hybrid systems: Combine the above techniques, for example, a rule-based system that is fine-tuned using a supervised learning algorithm.

It’s important to note that the accuracy of the classification model depends on the quality of the labeled dataset and the chosen algorithm. So, it’s important to have a large and diverse labeled dataset and to evaluate the model performance on a separate test set.
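
As an illustration of the supervised-learning option above, here is a minimal, hypothetical scikit-learn sketch (the questions and labels below are toy placeholders, not from this thread):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "How do I change my billing address?",
    "Why is my software not working properly?",
    "How do I reset my password?",
    "Can I get a refund for last month?",
]
labels = ["Billing", "Technical", "Account Management", "Billing"]

# TF-IDF features plus a simple linear classifier
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(questions, labels)

print(clf.predict(["My invoice looks wrong"]))  # e.g. ['Billing']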

Can you give me an example of how to add a category to a dataset?

Certainly, here’s an example of how you might add a classification label to a dataset of support questions and answers:

Let’s say you have a dataset of customer support questions and answers in a CSV file, with the following columns: “question,” “answer,” and “category”.

question,answer,category
How do I change my billing address?,You can change your billing address by logging into your account and going to the settings page,Billing
Why is my software not working properly?,It seems that there might be a compatibility issue with your current operating system. Please check the system requirements and update accordingly,Technical

In this example, each row represents one support question and answer, and the “category” column contains the classification label. The label is one of the predefined categories that you have chosen, such as “Billing”, “Technical”, “Account Management”, and “Sales”.

You can add more classification labels to the dataset by adding more rows to the CSV file and assigning a label to each question. It’s important to make sure that the labels are assigned consistently and accurately, in order to ensure the quality of the dataset.

It’s also a good practice to split the dataset into a training set and a test set, so you can use the training set to train the model and evaluate its performance on the test set.

You can use a library like Pandas to read and manipulate the CSV file and other libraries like scikit-learn to train and evaluate the model.
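
For example, a minimal sketch of the read-and-split step (assuming pandas and scikit-learn are installed; the filename is a placeholder):

import pandas as pd
from sklearn.model_selection import train_test_split

# Expecting columns: question, answer, category
df = pd.read_csv("support_questions.csv")

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
print(len(train_df), "training rows;", len(test_df), "test rows")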

can you show me how to format the above example as a jsonl prompt dataset?

Sure! Here’s an example of how you might format the above dataset as a JSONL (JSON Lines) file for use with a language model like GPT-3:


{"prompt":"How do I change my billing address?", "label": "Billing"}

{"prompt":"Why is my software not working properly?", "label":"Technical"}

In this example, each line is a JSON object representing one support question: the “prompt” key contains the question text, and the “label” key contains the classification label. The label is one of the predefined categories that you have chosen, such as “Billing”, “Technical”, “Account Management”, and “Sales”.

You can add more classification labels to the dataset by adding more lines to the JSONL file and assigning a label to each question. It’s important to make sure that the labels are assigned consistently and accurately, in order to ensure the quality of the dataset.

It’s also a good practice to split the dataset into a training set and a test set, so you can use the training set to train the model and evaluate its performance on the test set.

You can use a library like json or jsonlines to read and manipulate the JSONL file and other libraries like HuggingFace’s transformers to fine-tune the language model using your dataset.

can you include the completion in your above example?

Sure! Here’s an example of how you might format the above dataset as a JSONL (JSON Lines) file for use with a language model like GPT-3 that includes the completion as well:


{"prompt":"How do I change my billing address? ", "completion": "You can change your billing address by logging into your account and going to the settings page", "label": "Billing"}

{"prompt":"Why is my software not working properly? ", "completion":"It seems that there might be a compatibility issue with your current operating system. Please check the system requirements and update accordingly", "label":"Technical"}

In this example, each line is a JSON object representing one support question and answer, and the “prompt” key contains the question text, the “completion” key contains the answer, and the “label” key contains the classification label. The label is one of the predefined categories that you have chosen, such as “Billing”, “Technical”, “Account Management”, and “Sales”.

You can add more classification labels to the dataset by adding more lines to the JSONL file and assigning a label to each question and completion. It’s important to make sure that the labels are assigned consistently and accurately, in order to ensure the quality of the dataset.

It’s also a good practice to split the dataset into a training set and a test set, so you can use the training set to fine-tune the model and evaluate its performance on the test set.

You can use a library like json or jsonlines to read and manipulate the JSONL file and other libraries like HuggingFace’s transformers to fine-tune the language model using your dataset.

Me:

After days of trying to hunt down this information, this answered my immediate burning questions. And never before had I seen the “categorization” option! I know some of you are reading this and saying “Duh!” But it just occurred to me today to ask ChatGPT how to build a mini-ChatGPT subset. Duh!

The answer or completion starts with the text “Of course” with a space before it.

When you send the question, you are going to include the Agent: label at the end of your prompt (with no space after it) so the AI knows it needs to say something.

Otherwise, it will try to complete the question instead of giving an answer.

Does that help?

1 Like

Yes, the format for fine-tuning is a JSONL file with one tuning entry per line, in the format:

{"prompt": "This is the question", "completion": "This is the reply you desire"}

You create a JSONL file with your required fine-tuning entries, then process that file and use it to fine-tune the model of your choice.
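
For example, a minimal end-to-end sketch of that workflow, assuming the pre-1.0 openai Python library (the file name, base model, and example pair are placeholders):

import json
import openai

openai.api_key = "YOUR_API_KEY"

pairs = [
    ("This is the question", " This is the reply you desire"),
    # ... more (prompt, completion) pairs; remember the separator and
    # leading-space conventions discussed earlier in the thread
]

# 1. Write the JSONL file: one {"prompt": ..., "completion": ...} object per line.
with open("training.jsonl", "w") as f:
    for prompt, completion in pairs:
        f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")

# 2. Upload the file, then start a fine-tune against a base model.
upload = openai.File.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=upload["id"], model="davinci")
print(job["id"])  # poll this job until the fine-tuned model name is available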

There seems to be a different way to do this depending on who you talk to. I, too, am looking to fine-tune for customer service/support. When I chatted with ChatGPT on the subject, this is what it came back with:

Sure! Here’s an example of how you might format the above dataset as a JSONL (JSON Lines) file for use with a language model like GPT-3 that includes the completion as well:

`{"prompt":"How do I change my billing address? ", "completion": "You can change your billing address by logging into your account and going to the settings page", "label": "Billing"}`

`{"prompt":"Why is my software not working properly? ", "completion":"It seems that there might be a compatibility issue with your current operating system. Please check the system requirements and update accordingly", "label":"Technical"}`

In this example, each line is a JSON object representing one support question and answer, and the “prompt” key contains the question text, the “completion” key contains the answer, and the “label” key contains the classification label. The label is one of the predefined categories that you have chosen, such as “Billing”, “Technical”, “Account Management”, and “Sales”.

So, I guess my question is, is the above also correct if I wish to include a classification with each prompt?

1 Like

In the same boat!
Is there a schema to get the complete picture?

I don’t know a lot, but I can tell you that following the standard fine-tuning instructions here (OpenAI API), the model rejected the 3 keys (prompt, completion, label) above. So I submitted the dataset with just the prompt and completion. Results were varied. Everyone keeps telling me that to get the best results (answers that match your dataset more closely) you have to use “embeddings”. Trying to learn what that is and how to do it now. See: Google Spreadsheet + GPT3

ChatGPT made up the idea of a label. That might be valid in some other API that uses JSONL.

In the case of GPT, you only use prompt and completion.

Yes, this is correct. I only use “prompt” and “completion”…

… be careful about ChatGPT, though: it hallucinates solutions. You must confirm anything ChatGPT offers as a technical solution via the API docs (and by testing, of course):

See also:

OpenAI: Fine-Tuning

@ffxsam did you get this working (credibly) using fine-tuning?
I’m experimenting with a similar thing, to see if we can automate responses in our support chat.
The results so far using fine-tuning are not great. I’m wondering if using embeddings to provide the context might be a better way to go.

1 Like

Absolutely. That’s the power of these models: you can translate all text, images, video (24 images per second), and sound (spectrograms) into an embedding, a.k.a. a vector, a.k.a. a table of numbers.

With that, you can now compare and relate all your data. GPT-3 is a huge collection of embeddings and their relationships. Fine-tuning it with your embeddings creates new relationships between these vectors, so it now knows how to reply using your data.

I am curious about this reply. My understanding of embeddings is that you can vectorize the content residing in your database and run computations against it with search vectors. Select OpenAI models can provide the embeddings via API calls on your content.

I do not understand how you can fine-tune a model with your embeddings. Does this mean you can upload your content vectors to an OpenAI model?

2 Likes

What you are trying to do is not efficient through fine-tuning. Fine-tuning won’t get you there. You won’t get the results you want, and it can end up being very costly as a CS chatbot. Your best bet is using embeddings; check out the OpenAI documentation. I know embeddings work really well with Aleph Alpha’s text models. Good luck.
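
To make the embeddings suggestion concrete, here is a hypothetical retrieve-then-answer sketch (pre-1.0 openai Python library assumed; the FAQ snippets and model names are placeholders, not an official recipe):

import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"
EMBED_MODEL = "text-embedding-ada-002"

faq = [
    "To change your billing address, log in and open the settings page.",
    "Card-declined messages come from your bank; contact them to find out why.",
]

def embed(texts):
    resp = openai.Embedding.create(model=EMBED_MODEL, input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

faq_vectors = embed(faq)  # in practice, compute once and store the vectors

def answer(question):
    q_vec = embed([question])[0]
    # ada-002 vectors are unit length, so a dot product gives the cosine similarity.
    best = faq[int(np.argmax(faq_vectors @ q_vec))]
    prompt = (f"Answer the customer using only this context:\n{best}\n\n"
              f"Customer: {question}\nAgent:")
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt,
                                    max_tokens=150, temperature=0)
    return resp["choices"][0]["text"].strip()

print(answer("How do I update my billing address?"))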

1 Like

Do I need to use tags inside the prompts in the JSONL file so that OpenAI takes into account the last message?

{"prompt":"Summary: <**summary of the interaction so far>**\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: **<response1>**\nCustomer: **<message2>**\nAgent: **<response2>**\nCustomer: **<message3>**\nAgent:", "completion":" <response3>\n"}

Or could I just put actual text like the one @ffxsam made? I just want to make sure the chatbot will remember the last message.

{"prompt": "Customer: Hey, how can I do XYZ using the ABC feature?\n\n###\n\nAgent: Hey there!\n\nJust click the Widgets link in the left nav, then do BBB followed by CCC, then DDD.\n\n###\n\nCustomer: Great, thanks! xoxo\n\n###\n\nAgent: ", "completion": "No problem, just give us a shout if you need anything else! END"}