Data format to train assistant based on user-support dialogs

I want to train my assistant on data from previous dialogs between my clients and our support team. In which format should I provide those dialogs so that the assistant can analyse them clearly and learn from them?

It depends on the model, but if by training you mean fine-tuning, and by assistant you mean a GPT model, the format is defined in the OpenAI docs:
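For reference, the chat fine-tuning format is JSONL: one JSON object per line, where each line holds one complete training conversation as a `messages` array. The content below is a made-up illustration, not your actual data:

```json
{"messages": [{"role": "system", "content": "You are a support assistant for Acme Inc."}, {"role": "user", "content": "Can you help me reset my password?"}, {"role": "assistant", "content": "Sure thing! Go to Settings > Account > Reset password and follow the steps."}]}
```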

Happy building!

Is fine-tuning a GPT model something different from creating and fine-tuning an assistant with the OpenAI Assistants API (which is what I was originally referring to)?

And which is more appropriate in my case? My end goal is to connect the assistant to something like a live chat, with my own mediator service that takes a message from the user, sends it to OpenAI (whether via the Assistants API, completions over a fine-tuned model, or whatever is more suitable), and returns the response back to the user.

Yes. Fine-tuning a model is very different from using an assistant.

In the Assistants GUI you don’t really have a way to “train” the assistant. You can fit only a limited number of characters into the instructions, maybe ten example conversations at most on top of your instructions. The format is not as strict as in fine-tuning. E.g.
Customer: “can you help me”
Assistant: “sure thing”
works fine.

There is an option to enable file retrieval, but if you want a very specific type of answer from your assistant, I don’t think file retrieval is going to help much. At least I haven’t been able to feed example conversations in via files and have the assistant actually use them.

This is why fine-tuning might be the way to go if you want to teach the model something specific to your use case and have it respond in a predictable manner.
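If you do go the fine-tuning route, a minimal sketch of converting past support dialogs into the JSONL training format could look like this. The dialog data and system prompt here are made-up placeholders:

```python
import json

# Made-up example dialogs: each dialog is a list of (customer_message, agent_reply) turns.
dialogs = [
    [("Can you help me?", "Sure thing! What do you need?")],
    [("My order hasn't arrived.", "Sorry to hear that. Could you share your order number?")],
]

SYSTEM_PROMPT = "You are a friendly support assistant for Acme Inc."  # placeholder

def dialog_to_jsonl_line(dialog):
    """Turn one dialog into a single JSONL training example (one line of the file)."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for customer, agent in dialog:
        messages.append({"role": "user", "content": customer})
        messages.append({"role": "assistant", "content": agent})
    return json.dumps({"messages": messages})

# Each element of `lines` becomes one line of the training .jsonl file.
lines = [dialog_to_jsonl_line(d) for d in dialogs]
print(lines[0])
```

You would then write `lines` out one per line to a `.jsonl` file and upload it as the training file.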


As far as I understand, there is a third approach based on embeddings, which was suggested for a case very similar to mine: Customer support assistant with automated response - #6 by matcha72

Do you think embeddings will work better for my case than the Assistants API?
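For context, the embeddings approach usually means: embed your FAQ entries once, embed each incoming question, and answer from the closest match (or feed that match into the prompt). A toy sketch of the retrieval step, using a bag-of-words vector as a stand-in for a real embedding model (the FAQ content is made up):

```python
import math
from collections import Counter

# Made-up FAQ entries: question -> canned answer.
faq = {
    "How much does product X cost?": "Product X costs 249 USD.",
    "How do I reset my password?": "Go to Settings > Account > Reset password.",
}

def embed(text):
    """Stand-in for a real embedding API call: a simple bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Embed the FAQ once, up front.
faq_vectors = {q: embed(q) for q in faq}

def best_answer(question):
    """Return the FAQ answer whose question is most similar to the input."""
    qv = embed(question)
    best = max(faq_vectors, key=lambda q: cosine(qv, faq_vectors[q]))
    return faq[best]

print(best_answer("what does product X cost"))
```

With real embeddings you would replace `embed` with an API call and typically pass the retrieved FAQ entry into the model's prompt rather than returning the canned answer directly.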

Someone more experienced can correct me if I’m wrong, but from what I understand:

  • Fine-tuning works well if you want to teach the model how to respond, including style, tone, and format.
  • Fine-tuning doesn’t work well if you need the model to know specific FAQ-type details about your business, like “product X costs 249 USD”. The knowledge file retrieval feature is meant for that type of detail.

What type of responses do you need the model to give?

Very good point, and it is correct that my question is mostly about FAQ answers.


@matcha72 we still need an answer over here — please don’t go posting in other parts of the forum before you’ve explained yourself.
