How to get multiple labels for text with confidence in tuned model?

Hi, I’m fine-tuning a model for text classification (text → label). How can I force the model to return 4 labels with confidence scores? I can do it in a regular chat (“show the 4 most relevant labels for the provided text”), but the fine-tuned model matches the text to only one label.


For my binary classifier, I get something like this returned:

'top_logprobs': [{' 1': -1.5058299, ' 0': -0.25085944}]

But I have to send this in the request:

"logprobs": 2,

So you would send 4 instead for logprobs in the request.
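As a minimal sketch of that request, assuming the legacy Completions endpoint and a hypothetical fine-tune name (the separator and prompt text are placeholders, not part of the API):

```python
# Sketch of a legacy Completions request asking for the top 4 logprobs
# per output token. Model id and prompt are hypothetical placeholders.
request = {
    "model": "ft:babbage-002:my-org::abc123",  # your fine-tuned model id
    "prompt": "Some text to classify\n\n###\n\n",
    "max_tokens": 1,    # the label is a single token
    "temperature": 0,   # deterministic: always pick the top label
    "logprobs": 4,      # return the 4 most likely tokens at each position
}
# With the openai Python library you would pass these as keyword
# arguments, e.g. client.completions.create(**request).
```

The `logprobs` value controls how many alternative tokens come back per position, independently of which single token is actually sampled.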

But you are using the old classification endpoint? I’m on a chat model.

Yes, an old model. The newer babbage-002 and davinci-002 have this feature:

'top_logprobs': [{'0': 0.0, '1': -19.283203}]

You just don’t need the prepended space (’ 0’ vs ‘0’).

But chat is expected to “catch up” on the API and receive a logprobs update in the future:

I suspect that the use case and return expected here is more like:

{"classification_labels": {
  "happy": 0.7,
  "technology": 0.8,
  "artificial intelligence": 0.5,
  "linguistics": 0.25
}}

We’ll have to see some training file examples to see what’s the expected output for a particular input, and why it might not fine-tune the model well.

This may be a case where training a base davinci-002 will be better than trying to fight chat pre-training behavior, using completion style input.

Thanks! I’ll try the old one in that case. I’m using Cohere for that at the moment but was not happy with the results.

Start using the ‘-002’ versions, as the older base models are deprecated and will be removed early next year.

I would probably avoid having the AI hallucinate the values in some prompted output, as they will likely be nonsensical anyway.


Logprobs for multiple outputs might not be very meaningful if you have the model return several items in sequence.

Let’s say I have apples, bananas, monkeys, and trees in my image:

Item one (AI: jeez, I’ve got four things, I don’t know which is first, 25% prob of all)

Item four (AI: there’s only one choice remaining, that’s a 100%!)

I’m using such training data with davinci 3.5, and I want to display multiple relevant labels,

but the query is: “show the 4 most relevant labels for the provided text”.

Just to be clear, you would train either babbage-002 or davinci-002 and just feed the text as the prompt and the label as the response in the JSONL file.
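A minimal sketch of such a JSONL training file, assuming hypothetical texts and labels and a `\n\n###\n\n` prompt separator (a common convention, not a requirement):

```python
import json

# Each JSONL line is one {"prompt": ..., "completion": ...} pair: the raw
# text as the prompt and the integer label as the completion.
examples = [
    {"prompt": "An introduction to calculus\n\n###\n\n", "completion": "30"},
    {"prompt": "Open problems in number theory\n\n###\n\n", "completion": "31"},
    {"prompt": "Mathematics in ancient Babylon\n\n###\n\n", "completion": "33"},
]

# One JSON object per line; write this out to e.g. labels.jsonl and
# upload it for fine-tuning.
jsonl_text = "\n".join(json.dumps(ex) for ex in examples)
```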

Then you create the fine-tuned model.

After it’s created, you send the raw chunk of text with logprobs: 4 in the request.

Here, when I send this to my binary Babbage-002, I get:

'top_logprobs': [{'0': 0.0, '1': -19.283203, ',': -20.699707, '5': -20.947754}]

It wasn’t trained on ‘,’ or ‘5’; those are hallucinations, since this is a binary classifier. But you can see that ‘0’ and ‘1’ are the first two, and in your case you should expect to see your full top 4 answers of, say, ‘30’, ‘29’, ‘28’, ‘31’, etc. instead, since you trained on those output tokens.

Just take e^(logprob) to get the probability where e = 2.718281828459…
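As a sketch, turning the `top_logprobs` reply shown above into ranked probabilities:

```python
import math

# top_logprobs for one output position, values taken from the
# binary-classifier example earlier in the thread.
top_logprobs = {"0": 0.0, "1": -19.283203, ",": -20.699707, "5": -20.947754}

# Probability of each candidate token is e^(logprob); sort best-first.
ranked = sorted(
    ((token, math.exp(lp)) for token, lp in top_logprobs.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for token, prob in ranked:
    print(f"{token!r}: {prob:.3g}")
```

A logprob of 0.0 is a probability of exactly 1, so here the model is effectively certain the label is ‘0’.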


Thanks a lot, I will try davinci002 !


Okay…davinci-002, make sense of the top four:

(screenshot omitted)

Or maybe leaving the option of a quote was too ambiguous, so I supply it…

(screenshot omitted)

Not going to be able to pull the top four from that. Best to stay in the language domain.


It will if you create the fine-tune, since the model is constrained to the trained labels on the output.

Your example is from the Playground, without the heavy influence of the fine-tune labels supplied by the user.

It depends on how open-ended the input and the enumeration of outputs are. It could be a sorter that puts things into 4 of 20 blog categories, but we, like the AI, really need to know exactly what is wanted here.

The user is showing the output is a set of integers above.

See chart above.

One example from the chart is

“mathematics” → “30”

So it can’t be open-ended from the model.

But right above is a list of text inputs and some numbers:

The desired output from accepting the word “mathematics” is the best several outputs like “30”, “25”, “28”, “40”?

Yes, I have labels: 30 - mathematics, 31 - mathematics research, … 33 - mathematics history.


Yes, it looks like @tom111 is mapping {Words} → {Integers} for the purpose of classification.

This is usually the best approach (simple 1-token outputs). When you run it, you set the temperature to 0 and request only 1 token as the response.

It reduces noise in the probability space.

Just be sure that ‘30’ or whatever is a single token. Otherwise you may need to re-map your labels to 1-token specific labels to reduce the noise in the categorization space.

From experience, I know that ‘0’ and ‘1’ are each 1-token :sweat_smile:


As you said, the chat model probably cannot do it, even if I ask it to return four labels in the format id:id:id.