Use OpenAPI for supervised classification task

obkirchermirko · March 26, 2023, 12:04pm

Hi OpenAI Community,

I'm currently exploring what OpenAI can do for a supervised classification task I'm trying to tackle.
In a nutshell, I have:

1) a training set of 13k+ examples of keyword:category mappings (JSON Lines in the {"prompt":"kw","completion":"category"} format)
2) a list of categories
3) a list of keywords

I'd like the model to map each keyword in 3) to the correct (most likely) category in 2).

I fine-tuned a model with 1) as the training set and queried it through the /completions endpoint, but I'm clearly missing something: the results don't reflect the supervision at all, since the model maps my keywords to seemingly random categories.

Could you advise which endpoint or approach would be best for this?

Code used after creating the fine-tuned model:

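import openai  # legacy openai<1.0 SDK; reads the OPENAI_API_KEY environment variable
from tqdm import tqdm

# kw: list of keywords to classify (item 3 above)
api_results = []

for i in tqdm(kw):
    # one request per keyword to the fine-tuned curie model
    r = openai.Completion.create(
        prompt=f"take the following keyword: {i} and perform a supervised classification by mapping it to the correct category.",
        model="curie:ft-personal-2023-03-26-11-36-10",
        max_tokens=7,
        temperature=0,  # favour the most likely tokens
        top_p=1.0,
        frequency_penalty=0.0,
    )
    # keep (keyword, raw completion text) pairs
    api_results.append([i, r["choices"][0]["text"]])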
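One plausible reason for the "random" categories: the training data in 1) uses the bare keyword as the prompt, while the query above wraps it in an instruction sentence, so the fine-tuned model sees inputs that look nothing like its training examples. Below is a minimal sketch, assuming the legacy prompt/completion fine-tuning conventions (a fixed separator at the end of each prompt, a completion that starts with a space and ends with a stop sequence), where training records and inference prompts come from one shared helper so the two formats cannot drift apart. SEPARATOR, STOP and the helper names are illustrative choices, not from the original post.

```python
import json

# Illustrative conventions (assumptions): a fixed separator marks where the
# prompt ends, and a stop sequence marks where the completion ends.
SEPARATOR = "\n\n###\n\n"
STOP = "\n"


def build_prompt(keyword: str) -> str:
    """Format a keyword identically for training and for inference."""
    return f"{keyword.strip()}{SEPARATOR}"


def to_training_line(keyword: str, category: str) -> str:
    """One JSONL line; the completion starts with a space and ends with STOP."""
    record = {"prompt": build_prompt(keyword), "completion": f" {category}{STOP}"}
    return json.dumps(record)


def parse_completion(text: str) -> str:
    """Drop anything after the stop sequence and surrounding whitespace."""
    return text.split(STOP, 1)[0].strip()


# At inference time (legacy openai<1.0 SDK, as in the post), send the same
# prompt format and pass the stop sequence, e.g.:
#   r = openai.Completion.create(model=..., prompt=build_prompt(keyword),
#                                max_tokens=7, temperature=0, stop=[STOP])
#   category = parse_completion(r["choices"][0]["text"])
```

The point of the design is that the model only ever sees prompts shaped like its training prompts; the training file would need to be regenerated with to_training_line and the model re-fine-tuned for this to apply.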

Many thanks,
M

curt.kennedy · March 26, 2023, 3:12pm

Try a single-token output such as ' 0', ' 1', etc., and set max_tokens = 1. Then clean up the output and map each predicted token back to its category.
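A minimal sketch of that idea, assuming each category is replaced by a numeric label ' 0', ' 1', … in the training completions. Whether every such label really is a single token depends on the model's tokenizer, so it is worth checking (for example with the tiktoken library) before relying on max_tokens = 1, especially with 500+ categories. The function names are hypothetical.

```python
def build_label_maps(categories):
    """Assign each distinct category a numeric label like ' 0', ' 1', ..."""
    unique = sorted(set(categories))  # deterministic order across runs
    cat_to_label = {cat: f" {i}" for i, cat in enumerate(unique)}
    label_to_cat = {label: cat for cat, label in cat_to_label.items()}
    return cat_to_label, label_to_cat


def decode_label(raw_text, label_to_cat):
    """Clean up the model output and map it back to a category (None if unknown)."""
    label = " " + raw_text.strip()  # normalise to the ' N' form used in training
    return label_to_cat.get(label)


cat_to_label, label_to_cat = build_label_maps(["Shoes", "Bags", "Shoes"])
# cat_to_label["Bags"] is " 0", cat_to_label["Shoes"] is " 1"
print(decode_label(" 1", label_to_cat))  # Shoes
```

The training completions would use cat_to_label[category] instead of the category name, and decode_label turns the predicted token back into a name.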

dmirandaalves · March 26, 2023, 3:59pm

Hi @obkirchermirko and welcome to the community!!

I'd work on the prompt a bit first. Have you tried listing the categories in it? Also, explaining a bit more about the task would give the model more context for its output.

I don't know your goals with this exercise yet, but I'd try tweaking the prompt to something like:

Take the following keyword: {kw} and classify it into {classification1}, {classification2} or {classification3} regarding {task}. Present a string with only the classification.
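A small sketch that fills this template from a Python list of categories and maps the free-text answer back onto the allowed labels. The function names and the example task string are placeholders, not from the thread.

```python
def build_classification_prompt(keyword, categories, task):
    """Fill the template: list every candidate category, ask for the label only."""
    if not categories:
        raise ValueError("need at least one category")
    if len(categories) == 1:
        options = categories[0]
    else:
        options = ", ".join(categories[:-1]) + " or " + categories[-1]
    return (
        f"Take the following keyword: {keyword} and classify it into "
        f"{options} regarding {task}. "
        "Present a string with only the classification."
    )


def match_category(answer, categories):
    """Map the model's answer to an allowed category (case-insensitive), else None."""
    lookup = {c.lower(): c for c in categories}
    return lookup.get(answer.strip().strip(".").lower())


print(build_classification_prompt("sneakers", ["Shoes", "Bags", "Hats"], "product type"))
```

Returning None for answers outside the list makes it easy to count how often the model invents a category, which feeds directly into evaluating the results.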

It’d answer something like:

I'd run it and evaluate the results, checking the strengths and weaknesses of the classifications it returns before considering other steps. I strongly suspect the next steps will also be about prompt design.
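One way to run it and evaluate the results is to hold out part of the labelled keyword:category pairs and compare the model's answers against them. A minimal sketch; the predictions would come from whichever prompt or model is being tested, and the function name is illustrative.

```python
from collections import Counter


def evaluate(gold, predicted, top_k=5):
    """Return accuracy and the most frequent (expected, got) confusions."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("nothing to evaluate")
    correct = sum(g == p for g, p in zip(gold, predicted))
    mistakes = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return correct / len(gold), mistakes.most_common(top_k)


accuracy, confusions = evaluate(["A", "B", "B", "C"], ["A", "B", "C", "C"])
print(accuracy, confusions)  # 0.75 [(('B', 'C'), 1)]
```

The confusion pairs show which categories the prompt keeps mixing up, which is usually more actionable than the accuracy number alone.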

About fine-tuning: I strongly recommend watching this video.

I hope it helps somehow.

obkirchermirko · March 26, 2023, 6:08pm

Thanks very much @dmirandaalves! The resource you shared makes it pretty clear that my use case is not a good fit for fine-tuning. That in itself is a useful lesson.

As for using the prompt more wisely: the issue is that I have a rather large number of categories (500+). I'm not sure the model can digest all of them packed into one prompt, but I'll give it a try.
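If 500+ categories don't fit comfortably in one prompt, one option (an assumption, not something tested in this thread) is to estimate the prompt size first and, if needed, split the category list into batches that each fit a token budget. The sketch uses OpenAI's rough rule of thumb of about 4 characters per token for English text; for exact counts a real tokenizer such as tiktoken could be passed in as count_tokens. The budget and function names are hypothetical.

```python
def approx_tokens(text):
    """Rough estimate: about 4 characters per token for common English text."""
    return max(1, len(text) // 4)


def batch_categories(categories, budget, count_tokens=approx_tokens):
    """Split categories into consecutive batches of at most `budget` estimated tokens.

    A single category larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for cat in categories:
        cost = count_tokens(cat) + 1  # +1 roughly accounts for the ", " separator
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(cat)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch could then be sent with the same keyword and the per-batch winners compared in a final prompt; that is one possible design, at the cost of several requests per keyword.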

thanks

dmirandaalves · March 26, 2023, 7:31pm

Hey @obkirchermirko, I recommend taking a look at this specific part of the documentation. This example looks like it could help you with that as well!
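The linked documentation section isn't reproduced here. One commonly used approach for a large label set (named plainly: embedding-based nearest-neighbour classification, not necessarily what the link shows) is to embed each category name once, embed each new keyword, and pick the category whose vector is most similar. The sketch assumes the vectors have already been obtained from an embeddings model; the toy 2-D vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_category(keyword_vec, category_vecs):
    """Return the category whose embedding is most similar to the keyword's."""
    return max(category_vecs, key=lambda cat: cosine(keyword_vec, category_vecs[cat]))


# Toy vectors for illustration only; real ones would come from an embeddings model.
category_vecs = {"Shoes": [1.0, 0.1], "Bags": [0.1, 1.0]}
print(nearest_category([0.9, 0.2], category_vecs))  # Shoes
```

With 13k labelled keywords available, the same idea could compare a new keyword against the example keywords instead of the category names and take the category of the closest matches, which avoids putting 500+ categories into any prompt.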
