Hey guys,
I'm working on a small project where I ask gpt-3.5-turbo to classify texts (1000-4000 words each) into ~100 topics. Each text can have more than one topic.
My code looks like this:
    import openai

    # str_topics, exp_file, exp_answer, exp_file2 and exp_answer2 are defined elsewhere
    def gpt_classification(file_content):
        my_messages = [
            {"role": "assistant", "content": (
                "you are a classification analysis assistant. I have a list of topics "
                "that interest me. given a text, I want you to check whether the text "
                "talks about any of those topics, directly or indirectly, in the "
                "spirit of things (but classify only by the topics I will give you), "
                "and output the topics the text talked about.")},
            {"role": "assistant", "content": (
                "here are the topics that I want you to classify by, separated by ','. "
                f"here are the topics: {str_topics}. it's important that all the "
                "classifications come exactly from the list I gave you.")},
            {"role": "user", "content": exp_file},  # example text 1
            {"role": "assistant", "content": exp_answer},  # example answer 1
            {"role": "user", "content": exp_file2},  # example text 2
            {"role": "assistant", "content": exp_answer2},  # example answer 2
            {"role": "user", "content": (
                "remember to classify by the list of topics I gave you and not by any "
                f"other topics. here is the text: {file_content}")},
        ]
        # legacy (pre-1.0) openai package interface
        res = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=my_messages)
        return res["choices"][0]["message"]["content"]
That function is then run with apply on a df holding all my texts.
The problem is that GPT classifies not only by those topics but by many others as well, even though I directly ask it not to.
Is there something in the prompt that I need to change so it classifies only by the topic list?
Is this due to the length of the texts? Do I need to use GPT-4 instead?
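For context, one possible stopgap would be to filter the reply against the topic list after the fact. This is only a rough sketch and it doesn't fix the prompt itself. It assumes the reply comes back as a comma-separated string. The helper name keep_allowed_topics and the sample topics are made up for illustration and are not from my real code.

```python
def keep_allowed_topics(reply, allowed_topics):
    """Keep only the topics in a comma-separated reply that are on the allowed list."""
    # Map a normalized form (lowercase, trimmed) back to the canonical spelling.
    canonical = {t.strip().lower(): t for t in allowed_topics}
    kept = []
    for part in reply.split(","):
        key = part.strip().lower()
        # Drop anything not on the list, and skip duplicates.
        if key in canonical and canonical[key] not in kept:
            kept.append(canonical[key])
    return kept


# Made-up topics, for illustration only.
topics = ["economy", "climate change", "education"]
str_topics = ",".join(topics)  # the string that goes into the prompt

reply = "Economy, sports, climate change, economy"  # pretend model output
print(keep_allowed_topics(reply, topics))
```

Matching here is exact apart from case and surrounding whitespace, so a reworded topic would be dropped rather than mapped to the closest entry on the list.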
thx very much (: