Multi-class text classification with GPT-3.5

Hi all,
I have a small annotated data set and I want to do multi-class text classification with GPT-4. Please point me toward a guide or give me an overall direction so I can do it.
I have searched this forum but could not find anything relevant to multi-class classification.
I also tried Curie to build a model with my data set, but I think something is missing, because my accuracy was 0 percent.
Here is the code:

import openai

# Initialize the OpenAI API with your key
openai.api_key = "API key"

# Define the fine-tuned model ID
fine_tuned_model_id = "curie:ft-my model"

# Take input for unseen data
unseen_data = input("Enter the unseen data: ")

# Classify using the fine-tuned model
response = openai.Completion.create(
    model=fine_tuned_model_id,
    prompt=unseen_data + "\n\n###\n\n",
    max_tokens=1
)

# Print the classification result
print("Classification Result:", response.choices[0].text.strip())

# Evaluation using a predefined validation dataset
validation_data = [
    {"input": "I've struggled for a long while. Second grade I had anger and drug issues that made me an outsider. It just got worse from there. Now I'm 20.", "expected_output": "Drug and Alcohol"},
    {"input": "i remember being a small child left alone in a giant house and feeling insane fear of monsters, ghosts and zombies with no one to comfort me. i would cry and scream yet no one still would come for me.", "expected_output": "Early Life"},
    # ... add more validation data as needed
]

correct_predictions = 0

for data in validation_data:
    response = openai.Completion.create(
        model=fine_tuned_model_id,
        prompt=data["input"] + "\n\n###\n\n",
        max_tokens=1
    )
    predicted_output = response.choices[0].text.strip()
    if predicted_output == data["expected_output"]:
        correct_predictions += 1

accuracy = correct_predictions / len(validation_data) * 100
print(f"Model Accuracy: {accuracy:.2f}%")
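One detail I am unsure about: with max_tokens=1 the completion can only ever return a single token, so a multi-word label like "Drug and Alcohol" may never match expected_output exactly, which could explain the 0% accuracy. A sketch of the single-token mapping I could use instead (the token ids here are just placeholders, not from my actual training file):

```python
# Sketch (assumption): map each multi-word class to a short single-token id
# for the completion, then map the model's one-token answer back to the
# human-readable label for evaluation.
LABEL_TO_ID = {
    "Drug and Alcohol": " drug",
    "Early Life": " early",
    # ... one short completion token per class
}
ID_TO_LABEL = {v.strip(): k for k, v in LABEL_TO_ID.items()}

def decode_prediction(raw_completion: str) -> str:
    """Map the model's one-token completion back to the readable class name."""
    return ID_TO_LABEL.get(raw_completion.strip(), "UNKNOWN")
```

With this, the evaluation loop would compare `decode_prediction(...)` against the readable label instead of the raw token.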

  1. Please describe the details of your fine-tuned model.
  2. You might be interested in fine-tuning the new gpt-3.5-turbo base model.

Thank you for your response.
Actually I am more interested in GPT-4, but I thought I should start with GPT-3. Here is how I did the fine-tuning:

import openai

# Initialize the OpenAI API with your key
openai.api_key = "API key"

print("Starting the fine-tuning process...")

# Fine-tune the model with adjusted hyperparameters
response = openai.FineTune.create(
    model="curie",
    training_file="file-egyT7MiegJtOxUWumpRdZ1VU",  # Your training file ID
    n_epochs=6,
    batch_size=2,
    learning_rate_multiplier=0.05,
    prompt_loss_weight=0.01
)

# Print the fine-tuning details
print("\nFine-tuning ID:", response["id"])
print("Model being used:", response["model"])
print("Status of fine-tuning:", response["status"])

# Check if there are any events/messages from the fine-tuning process
if "events" in response:
    print("\nEvents during fine-tuning:")
    for event in response["events"]:
        print(f" - {event['message']} (Timestamp: {event['created_at']})")
else:
    print("\nNo events reported during fine-tuning.")

  • How many data points did you have in your training set? Test set? Validation set?
  • How many classes do you have in your data?
  • What is the distribution of the classes? Unimodal? Multimodal? More-or-less uniform?

Also, it's not yet possible to fine-tune gpt-4, but fine-tuning gpt-3.5-turbo will be better and cheaper than other fine-tunings.
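If you go that route, note that gpt-3.5-turbo fine-tuning takes a chat-formatted JSONL file, where each line carries a "messages" list instead of the legacy prompt/completion pair. A rough sketch of converting one labeled example (the system prompt and helper name are illustrative, not from your code):

```python
import json

# Sketch (assumption): build one chat-format fine-tuning example per
# labeled row, with the class name as the assistant's reply.
def to_chat_example(text: str, label: str) -> dict:
    return {
        "messages": [
            {"role": "system",
             "content": "Classify the text into one of the four classes."},
            {"role": "user", "content": text},
            {"role": "assistant", "content": label},
        ]
    }

line = json.dumps(to_chat_example("i remember being a small child...", "Early Life"))
```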

  • How many data points did you have in your training set? Test set? Validation set?

I didn't set this up in the code yet, but I am looking at an 80/10/10 split, so 640, 80, and 80 for 800 rows.

  • How many classes do you have in your data?
    4 classes

  • What is the distribution of the classes? Unimodal? Multimodal? More-or-less uniform?
    200 each