How to build a conversation classifier with the new Fine Tuning Job API and GPT-3.5?

Now that the /classification endpoint is deprecated, my understanding is that the recommended way to build a classifier is using gpt-3.5-turbo and the Fine Tuning Job API. I also understand that the recommendation is to do as much as possible with prompt engineering, before submitting the ground truth examples for fine tuning. Please correct me if I’ve misunderstood!

If that’s the case, I assume the prompt should contain:

  • the data to be classified
  • instructions on how to determine the label and what labels can be applied
  • instructions on the output format

If I’ve understood all that correctly, it would be very useful to see some examples!

Finally, in my use case, I’m actually doing analysis on a conversation between two humans (something like sentiment analysis). For this purpose, should I include the conversation as text within the prompt, or as part of the structured API submission (that starts with “system: “)?

Thanks!

Simon


I am also wondering the same thing. I’ve been using a fine-tuned Ada model for binary classification and can find little in the way of guides or explanations on how to do this with babbage-002 or gpt-3.5-turbo.

Am looking to functionally replace this with the new models:

# Legacy fine-tune call; the classification_* options and
# compute_classification_metrics have no equivalent in the new
# fine-tuning jobs API:
fine_tuned_model = openai.FineTuningJob.create(
    training_file=training_file['id'],
    validation_file=validation_file['id'],
    model="ada",
    classification_n_classes=2,
    compute_classification_metrics=True,
    classification_positive_class=' 1 \n'
)

No reply from OpenAI - it feels like classification has effectively been removed from the API.
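Since the new fine-tuning jobs API doesn’t compute classification metrics for you, one workaround is to score a held-out set yourself after collecting the model’s predicted labels. A minimal sketch, assuming the labels are simple strings (the `"1"`/`"0"` labels here are illustrative):

```python
# Client-side binary classification metrics, replacing the old
# compute_classification_metrics behaviour. y_true holds the ground-truth
# labels for a held-out set; y_pred holds the labels the fine-tuned model
# returned for the same examples.

def binary_metrics(y_true, y_pred, positive="1"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

This also makes it easy to add any metric the old endpoint didn’t report.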

There is usually no need to fine-tune GPT-3.5 for classification; you can simply prompt it with: "Given these classes, ###{comma_delimited_list_of_classes}###, which of them is most suitable for classification of this text ###{text_to_be_classified}###"
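As a sketch, that zero-shot prompt can be built and sent with the chat completions API. The model name and `###` delimiters are taken from the prompt above; the `classify` helper assumes an `openai.OpenAI()` client from the v1+ Python library:

```python
def build_classification_prompt(classes, text):
    """Build the single-shot classification prompt described above."""
    class_list = ", ".join(classes)
    return (
        f"Given these classes, ###{class_list}###, "
        f"which of them is most suitable for classification "
        f"of this text ###{text}###"
    )

def classify(client, classes, text, model="gpt-3.5-turbo"):
    # client is an openai.OpenAI() instance (openai python library >= 1.0)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": build_classification_prompt(classes, text)}],
        temperature=0,  # deterministic label output
    )
    return response.choices[0].message.content.strip()
```

Setting `temperature=0` keeps the label output as deterministic as possible.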

If you were to have text that is not well known by the model, industrial terminology say, then you could build up a fine-tuning dataset with examples of each classification and fine-tune on that, or embed your examples into a vector DB and retrieve semantically similar context for a given text you wish to classify.
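If you go the fine-tuning route, gpt-3.5-turbo expects training data in the chat format: one JSON object per line, each with a `messages` list where the assistant turn holds the correct label. A minimal sketch (the system prompt and label names are hypothetical):

```python
import json

# Illustrative system prompt; the framing should mirror how the model
# will be prompted at inference time.
SYSTEM = "Classify the text into one of: defect, enhancement, question."

def to_chat_example(text, label):
    """One fine-tuning record in the chat format used by gpt-3.5-turbo
    fine-tuning: system instruction, user text, assistant label."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
            {"role": "assistant", "content": label},
        ]
    }

def write_training_file(examples, path="train.jsonl"):
    # examples is an iterable of (text, label) pairs; output is JSONL,
    # one serialized chat example per line.
    with open(path, "w") as f:
        for text, label in examples:
            f.write(json.dumps(to_chat_example(text, label)) + "\n")
```

The resulting file is what you upload and pass as `training_file` when creating the fine-tuning job.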

Although it has not been specifically announced, I expect various other models will also be going away with the removal of the GPT-3 completion models and their endpoint in January 2024.

A good format for using gpt-3.5-turbo is to program the AI in a system message, and then send the data to be operated on in the user role. Some instruction should also be in the user role, so the AI doesn’t interpret the text as a question it should answer.

messages = [
    {
        "role": "system",
        "content": """Classify web forum conversational exchanges between users.
Goal: Identify final replies that are confrontational, rude, or antagonistic.
Site: The Sewing Bee crafts forum.
Output: JSON enum, 1 line. key='classification': values='acceptable', 'confrontational'""",
    },
    {
        "role": "user",
        "content": "Forum posts: [[[Simon: I like macrame\nPhil: That's stupid 1970s garbage.]]]",
    },
]
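Because the system message asks for a one-line JSON enum, the reply can be parsed and validated mechanically. A sketch of that parsing step, assuming the model call itself has already returned `reply` as a string:

```python
import json

# The two labels come from the system message above.
VALID_LABELS = {"acceptable", "confrontational"}

def parse_classification(reply):
    """Parse a reply like '{"classification": "confrontational"}'
    and validate the label against the allowed enum values."""
    data = json.loads(reply)
    label = data["classification"]
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label}")
    return label
```

Validating the label catches the occasional case where the model invents a value outside the enum, which is worth retrying or logging rather than storing.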

That’s excellent advice, thank you!

Hey Simon,

What exactly are you trying to analyze within the conversation itself?

Subjects?
Human attitude to certain subjects?
Human attitude to opponent?
Human state?

The answer to this question will define the approach to take on how to analyze/classify the conversation.