GPT-3.5 classifies outside the given data labels

hey guys :slight_smile:

I’m working on a small project where I ask gpt-3.5-turbo to classify texts (1000-4000 words each) into ~100 topics. Each text can have more than one topic.

My code looks like this:

import openai  # legacy (pre-1.0) openai package interface

def gpt_classification(file_content):
    # str_topics, exp_file, exp_answer, exp_file2 and exp_answer2 are defined elsewhere.
    my_messages = [
        {"role": "assistant", "content": (
            "you are a classification analysis assistant. I have a list of topics that "
            "interests me. given a text, I want you to check if the text is talking about "
            "any of those topics Whether directly or indirectly, in the spirit of things "
            "(but classif by the only topics i will give you) and output the topics the "
            "texttalked about.")},
        {"role": "assistant", "content": (
            "here are the topics that I want you to classify, seperated by ','.  "
            f"here the topics:{str_topics}. it's important that all the classifications "
            "will be from the list i gave you exactlly.")},
        {"role": "user", "content": exp_file},  # example text 1
        {"role": "assistant", "content": exp_answer},  # example answer 1
        {"role": "user", "content": exp_file2},  # example text 2
        {"role": "assistant", "content": exp_answer2},  # example answer 2
        {"role": "user", "content": (
            "remember to classify by the list of topics i gave you and not another "
            f"topics. here the text: {file_content}")},
    ]

    res = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=my_messages)
    return res["choices"][0]["message"]["content"]

I then apply that function to a DataFrame (df) containing all my texts.

The problem is that GPT classifies not only by those topics but by many others, even though I directly ask it not to.

Is there something in the prompt that I need to change so that it classifies only by the topic list?

Is this due to the length of the texts? Do I need to use GPT-4 instead?

thx very much (:

Welcome to the forum Yonatanz,

Try changing the first role to system instead of assistant.
The assistant role is generally for the replies you get back from the API, while the system role holds the instructions you give it up front.
The user role is for your questions or text.
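
As a rough sketch of what I mean (the helper name `build_messages`, the `examples` parameter and the instruction wording are only illustrative assumptions, not taken from your code), the instructions move into a single system message, and the assistant role is kept only for the example answers:

```python
def build_messages(str_topics, examples, file_content):
    """Build a chat message list with the instructions in the system role.

    `examples` is a list of (example_text, example_answer) pairs (hypothetical
    parameter standing in for exp_file/exp_answer from the original post).
    """
    messages = [
        {
            "role": "system",  # instructions go here, not in an assistant message
            "content": (
                "You are a classification assistant. Classify the text using only "
                f"topics from this comma-separated list: {str_topics}. "
                "Output only topics that appear in the list."
            ),
        }
    ]
    # Few-shot examples: the user supplies a text, the assistant replies with topics.
    for example_text, example_answer in examples:
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": example_answer})
    # The text to classify always comes last, in the user role.
    messages.append({"role": "user", "content": file_content})
    return messages
```

The returned list can be passed as `messages` to the same `openai.ChatCompletion.create` call you already have.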

Paul D

{"role": "assistant", "content": f"here are the topics that I want you to classify, seperated by ','. here the topics:{str_topics}. it's important that all the classifications will be from the list i gave you exactlly." },

First, look at the role. With the assistant role, the AI appears to be saying to the user "I want you to classify". Instructions to the AI belong in the user role; or, if you are programming its behavior, put it all in a system role.

Apart from fixing the few spelling mistakes that reduce comprehension, I would use programming-style notation, which the AI will understand.

system: from article, find topic. output format: "Topic: best_topic_from_list"
user: choose from the most applicable article topic. topic_list = ["aliens", "pyramids", "magic crystals", …]
user: extract topic: {your_article}
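
A minimal Python sketch of that layout, under a few assumptions: the function names `build_topic_messages` and `parse_topic` are hypothetical, and the topic list is rendered with `json.dumps` so it looks like a list literal. The `parse_topic` check is an extra safeguard I'm adding, not part of the prompt itself: it rejects any reply that names a topic outside the list, so stray labels never reach your DataFrame.

```python
import json

TOPIC_PREFIX = "Topic:"

def build_topic_messages(topic_list, article):
    """Build the system/user/user messages in the layout shown above."""
    return [
        {"role": "system",
         "content": 'from article, find topic. output format: "Topic: best_topic_from_list"'},
        {"role": "user",
         "content": "choose from the most applicable article topic. "
                    f"topic_list = {json.dumps(topic_list)}"},
        {"role": "user", "content": f"extract topic: {article}"},
    ]

def parse_topic(reply, topic_list):
    """Return the topic named in the reply, or None if it is not in topic_list."""
    text = reply.strip()
    # Drop the "Topic:" prefix if the model followed the requested format.
    if text.lower().startswith(TOPIC_PREFIX.lower()):
        text = text[len(TOPIC_PREFIX):]
    candidate = text.strip().strip('"\'').lower()
    # Case-insensitive lookup that returns the topic as spelled in the list.
    by_lower = {topic.lower(): topic for topic in topic_list}
    return by_lower.get(candidate)
```

Since each text can have more than one topic, the same idea extends to a comma-separated reply: split it and keep only the entries that `parse_topic` accepts.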

You can now drop this assistant message completely, as it adds no value, and we have replaced the others:

{"role": "assistant", "content": f"you are a classification analysis assistant. I have a list of topics that interests me. given a text, I want you to check if the text is talking about any of those topics Whether directly or indirectly, in the spirit of things (but classif by the only topics i will give you) and output the topics the texttalked about."}

If the latest gpt-3.5-turbo fails you badly, try gpt-3.5-turbo-0301.