Use the OpenAI API for a supervised classification task

Hi OpenAI Community,

I'm currently exploring what OpenAI models can do for a supervised classification task I'm trying to tackle.
In a nutshell, I have:

  1. a training set of 13k+ examples of keyword:category mappings (JSON Lines in {"prompt":"kw","completion":"category"} format)
  2. a list of categories
  3. a list of keywords

I'd like the model to map each keyword in 3) to the correct/most likely category in 2).

I tried the /completions endpoint and provided 1) as the fine-tuning training set, but I clearly seem to be missing something: the result is not supervised, in the sense that the model maps my keywords to seemingly random categories.

I was wondering if you could advise which endpoint or approach would be best for this endeavour?

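Code used after creating the fine-tuned model:

import openai
from tqdm import tqdm

# kw is the list of keywords to classify (item 3 above), defined elsewhere
api_results = []

for i in tqdm(kw):
  # legacy Completions call (openai Python library before v1.0) against the fine-tuned model
  r = openai.Completion.create(
      prompt=f"take the following keyword: {i} and perform a supervised classification by mapping it to the correct category.",
      model="curie:ft-personal-2023-03-26-11-36-10",
      max_tokens=7,
      temperature=0,
      top_p=1.0,
      frequency_penalty=0.0,
  )
  # store [keyword, raw completion text]
  api_results.append([i, r["choices"][0]["text"]])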

Many Thanks
M

Try a single-token output like ' 0', ' 1', etc., and set max_tokens = 1. Then clean up the predicted tokens and map them back to your categories.
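
The suggestion above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: the helper names (build_label_map, to_training_line, decode_prediction) and the sample categories are made up, and it assumes the training file keeps the {"prompt": ..., "completion": ...} format from the question. Whether a numeric label really is a single token depends on the model's tokenizer; with 500+ categories, multi-digit labels may split into several tokens, so that is worth checking before relying on max_tokens = 1.

```python
import json


def build_label_map(categories):
    """Assign each distinct category a short numeric label (hypothetical helper).

    Categories are sorted so the mapping is deterministic across runs.
    """
    return {cat: str(i) for i, cat in enumerate(sorted(set(categories)))}


def to_training_line(keyword, category, label_map):
    """Build one JSON Lines record whose completion is the label, not the category name.

    The leading space mirrors the ' 0', ' 1' style suggested in the reply.
    """
    return json.dumps({"prompt": keyword, "completion": " " + label_map[category]})


def decode_prediction(text, label_map):
    """Clean up the raw completion text and map it back to a category.

    Returns None when the model produced a label that is not in the map.
    """
    inverse = {label: cat for cat, label in label_map.items()}
    return inverse.get(text.strip())


# Made-up sample data, for illustration only.
label_map = build_label_map(["sports", "finance", "travel"])
line = to_training_line("football boots", "sports", label_map)
category = decode_prediction(" 1", label_map)
```

At inference time the prompt sent to the fine-tuned model would then need to look like the prompts in the training file (here, just the keyword), and the returned text would go through decode_prediction.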


Hi @obkirchermirko and welcome to the community!!

I guess I’d work a bit on the prompt first. Have you tried declaring the categories in it? Also, I believe that explaining a bit more about the task would give the model more context for its output.

I don’t know your goals for this exercise yet, but I’d try to tweak the prompt to something like:

Take the following keyword: {kw} and classify it into {classification1}, {classification2} or {classification3} regarding {task}. Present a string with only the classification.
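
As a rough sketch, the template above could be filled in programmatically like this. The function name build_prompt and the sample values are hypothetical; only the wording of the template comes from the suggestion itself.

```python
def build_prompt(keyword, categories, task):
    """Fill the suggested template with a keyword, a list of categories and a task description."""
    if not categories:
        raise ValueError("at least one category is required")
    if len(categories) == 1:
        options = categories[0]
    else:
        # Join as "a, b or c", matching the shape of the template.
        options = ", ".join(categories[:-1]) + " or " + categories[-1]
    return (
        f"Take the following keyword: {keyword} and classify it into {options} "
        f"regarding {task}. Present a string with only the classification."
    )


# Made-up example values, for illustration only.
prompt = build_prompt("football boots", ["sports", "finance", "travel"], "product search")
```

The resulting string can then be passed as the prompt of whichever completion call is being used.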

It’d answer something like:
[image: example output]

I’d run it and evaluate the results. Check the pros and cons of the classifications it provides before thinking about other steps. I have a strong feeling that the next steps will probably also be about prompt design.

About fine-tuning the model: I strongly recommend watching this video.

I hope it helps you somehow :)

Thanks very much @dmirandaalves! The resource you shared makes it pretty clear that my use case is not suited to fine-tuning. That in itself is a learning.

As for using the prompt more wisely: the issue I have is that the number of categories is rather large (500+). I'm not sure whether the model can digest all of them packed into the prompt, but I'll give it a try.

thanks
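
One way to get a feel for whether 500+ categories fit in a single prompt is a rough size estimate before calling the API. The sketch below is only an approximation under stated assumptions: the function names are made up, the 4-characters-per-token ratio is a common rule of thumb for English text rather than an exact count, and the context limit has to be looked up for the specific model (the 2048 used here is a placeholder, as are the generated category names).

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate: ceil(len(text) / chars_per_token).

    Assumption: about 4 characters per token, a rule of thumb only.
    """
    return -(-len(text) // chars_per_token)


def fits_in_context(prompt, context_limit, max_output_tokens=7):
    """Check whether the estimated prompt size plus the requested output fits the limit."""
    return estimate_tokens(prompt) + max_output_tokens <= context_limit


# Hypothetical data: 500 placeholder category names packed into one prompt.
categories = [f"category_{i}" for i in range(500)]
prompt = "Classify the keyword into one of: " + ", ".join(categories)

# 2048 is a placeholder limit; check the documentation of the model actually used.
ok = fits_in_context(prompt, context_limit=2048)
```

If the estimate says the prompt does not fit, an exact count with the model's real tokenizer would be the next thing to check.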

Hey @obkirchermirko, I recommend taking a look at this specific part of the documentation. It seems the example there can help you with that as well!