Finetuning gpt3.5 with a conversation

Hi Im trying to fine-tune the gpt3.5 model and my dataset is a transcript from a call, so there is a lot of back and fourth how would I structure this in a way so my fine-tuned model should be able to generate a script based on these transcripts.

1 Like

Hi and welcome to the developer forum!

Can you explain a little more regarding what you want the fine tuned model to do?

When you say script, do you mean like a movie script? or a code script?

Hi thank you! Yeah I want it to generate a sales script based on all the success full transcripts I would pass into it. The script is a conversation no code all words.

At first thought, this is a super complex task, you’re wanting the AI to basically generate an idealised script optimising for the best interaction at any given point in the conversation.

The thing I would liken this to is a Sales “Chess” computer, where it is trying to find the best move at any stage of the game, but based on your internal conversation training data… This does not immediately jump out at me as being a large language model optimised task.

I’m not saying that this is not not totally solvable with an LLM, but it feels like a large project, 1000+ hours if you were to ask me for an initial guestimate of time for a minimum viable R&D tool.

I think you would have to combine some human feedback system in here at some point, so you’d need to score the transcripts of conversations to assess the success of it so far, perhaps starting with conversations that achieved an ideal closure and then using those, and perhaps putting them in a vector database for a similarity search… not super simple for sure.

Lets see if anyone else perhaps has a use case more similar or may have worked on systems like this.

Okay no worries. At the moment Im just doing some discovery my other option was to turn these scripts into embeddings with llama index make a suitable prompt for a LLM and ask to generate a script for a sales call. I feel like I wasn’t clear enough with what I mean, I dont necessarily want it to generate a script for a person to follow word for word more so a general guide on certain things to say that has worked for pervious people etc let the LLM be able to spot patterns and take note on the type of language the sales people have used and to take that into account when generating a script/general guide. Does that make more sense or do you want me to elaborate more on certain parts

It would help if you could describe a typical end users interaction flow with the application, i.e. name a goal and then the steps the user would do and the ideal responses an app would make to the users requests.

A goal would be to generate a script based on the style and format of scripts that fine-tuned the model. The user would input. Im trying to sell X to client Y can you generate me a script for this. The LLM would then generate a script which follows the general pattern of the transcripts which hopefully should have patterns in them which have lead to sales. Ideally the script would contain info relevant to the product and the language should be in a professional way.

I feel this may be a use case more directed towards embeddings then fine-tuning. Agree?

Yes, as the open ended nature of selling X to Y is so vastly wide, I doubt you can train on it, i.e. selling centrifuges to a laboratory head is wildly different to selling ice cream to a retail store owner.

No worries. Thank you for your help and advice! Enjoy the rest of your day.

1 Like

Hi Hoggarth. I skimmed the conversation and it’s a very interesting (and worthwhile) challenge that you present. I’d encourage you not to give up, as I think it’s solvable. What you likely need to do is to add an intermediate step: take all the transcripts that you have, and use the model to generate high level approaches for each of them (playbook strategies, or scripts/templates that serve as shorthand descriptions of the approaches being used), and perhaps also supplement it with contextual information about when and why these particular approaches tend to work. Then, you can take your sales situation and context and match it (maybe with vectors) to the most relevant “playbook” (or script), which in turn will have an association with relevant transcripts. Then you can fine tune a model (or multiple) model based on whatever parts of the system you want to make economical (for example a fine tuned model that asks the user for the context/situation of the sale, and then provides them with a playbook/script). Or, maybe you want to focus on a fine tuned model that can generate fictional scripts relevant to your playbook/scripts to help illustrate how to do it (but trained on your actual transcripts).

Sounds like an interesting project in any case. Let me know if you need any further help with it as I’m keen to explore fine tuning with the new 3.5 model. Previously I fine tuning helpful, but it only with a HUGE number of examples (40,000+ Q&A sets is where I started to see big improvements). But I’m guessing that the threshold for 3.5 will be far far lower since it’s essentially also a pre-trained model rather than a base one.

I’m also interested in this problem. Generally, I’m interested in analyzing “directed” conversations such as sales and customer support transcripts to understand the arc of successful and unsuccessful conversations. I’m wondering if there are identifiable moments within conversations that can be used as queues for guiding the conversation in the right direction. @Alan any thoughts on an approach and/or toolset for this?

@adobelis Do you have a taxonomy or playbook you already use for sales conversations already? If so, you can probably use that as your starting point and classify your transcripts according to them. Then you get good visibility into seeing how well your taxonomy/playbook is working already as a baseline.

Then you can go ahead and look at other ways of classifying your transcripts, and see what kinds of conversation arcs lead to success. Probably you already have good intuitions on ways they might be classified, but now that you can automate the process you can try them all out (or even some that the model suggests) and see what lends to success and what doesn’t.

For doing this, I’d suggest (1) embedding your transcripts into vectors, and that lets you do all kind of quantitative analysis on them, as well as put numbers to all kinds of qualitative aspects of the conversation: tone, subject matter, balances of questions vs answers, or even nearest adjectives that describe a conversation. Or for that matter, even taking an average of successful vs unsuccessful conversations and see what factors most set them apart. (2) Using the language models to comb through the conversations and create additional material you can analyze about them… tagging, ratings, classifications of sales strategies, whatever data you’d like. And then running your analytics on that too.

The great part is that once you have a handle on some of the key differences you can analyze the conversations in real time as they occur and provide real time metrics, or even advice and suggestions live during the course of a call.

Good luck, happy to chat if you ever want to brainstorm on the topic.

I’ve developed (with the help of a programmer) a prototype for a application leveraging gpt 3,5 and elevenlabs websocket API to 1. Train newly hired in sales, and 2. use the data from nr 1 to innovate in the field of voice to voice UI with ai. The best sales script is, as i see it, contextual. That being said i find it an interesting issue.

What we are looking into is the possibility of placing the customer into a archetype using a streaming stt model and a fine tuned model trained to decipher mbti-personality types, and using the given type in a prompt together with a corresponding “angle of attack” and current convo transcript in order to eventually be able to provide a real time situational approach as first a tool then a stand alone sales ai cold caller. We believe the nuance of sales calls is affected to a large degree by the cultural context, with values and customs varying to an extent that sales scripts doesn’t really work across cultures.

For your idea, you could use the sales scripts and asses them according to a framework and finetune gpt on that dataset, with conversation and assessment as well as format for assesing in the systemprompt. When gpt reliably is able to asses sales conversations using a framework sufficiently deterministic to find flaws in convos, i would like to think crm data and something akin to a business plan with sales offer explained would be sufficient to create a really ell put together script.

I am by no means sure. It’s a great topic of discussion.