Classification with generative models

I am using generative models like ChatGPT, T5, Flan, Llama, etc. for classification. I have three classes, so it is a 3-class classification problem. The class labels are: not vivid, moderately vivid, highly vivid. The model predicts the class labels, but I also need the probability of each class, similar to a BERT model: if I fine-tune a BERT model, it is easy to get the probability of each class, since we just add a SoftMax layer on top of the last layer, which turns the logits for each class into probabilities. But BERT's performance is not good for my scenario, while a generative model performs well. The problem is that I don't know how to get per-class probabilities from these generative models, which output a probability distribution over the vocab, not over the classes.
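To be concrete about the BERT case: the SoftMax step just normalizes the three class logits into probabilities. A minimal sketch (toy logit values, not from a real model):

```python
import math

def softmax(logits):
    """Convert raw class logits to probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for the three classes, in the order:
# [not vivid, moderately vivid, highly vivid]
logits = [0.5, 2.1, -0.3]
probs = softmax(logits)
print(probs)  # three probabilities that sum to 1
```

This is exactly what a fine-tuned BERT classification head gives you; the question is how to get the same kind of per-class distribution out of a generative model.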

This is something that can be done with OpenAI's logprobs. The best approach is to use a completions model and frame the prompt so that the response is judged on a single token.

logprobs are based on language probability and on the instructions that guide the AI to produce a particular set of tokens. Also, only a limited number are returned. Still, this logprob is a way of reading the model's internal confidence, rather than just asking it.

I'm not quite clear what you mean by "over the vocab", but as you can see, you don't get a score for every one of the ~100,000 BPE token possibilities. The weighting of output tokens is a product of the entire prompt context plus the model's training.


Thank you so much for the reply.

It seems log probability makes sense. Are you using the Playground for the image you attached? Based on the example in the "Completion API" section "" the logprob is null for the completion method, and I don't know how to get it.

It seems OpenAI has this logprob, which can be used for classification, but other generative models like Flan don't have it. I am using Flan (through the Hugging Face API) because it is free, and I don't know how to get the probability of each class. If you happen to know, please let me know; it would be much appreciated, because I have been trying to solve this problem for months. I asked this question here because I thought all generative models are the same.

I'm showing the OpenAI API Playground in completions mode, with gpt-3.5-turbo-instruct as the model and "show probabilities" enabled, which uses the return value of logprobs. The same values are accessible through the API.


We just need a bit of code to do this programmatically: invoke the API properly, then convert each logprob (logarithmic) back to a normal probability.

import math
from openai import OpenAI

client = OpenAI()

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=("You are Santa Claus and are checking your naughty or nice list. "
            "You see that Timmy likes to step on cat's tails to hear them howl. Is Timmy:\n"
            "1: Naughty?\n2: Nice?\n\n"
            "(Santa Claus only outputs the number as his answer.)\n\nA: "),
    max_tokens=1,     # we only need the single answer token
    logprobs=5,       # return the top-5 logprobs for that token
    temperature=2,    # high temperature: the sampled token may add a 6th entry
)
log_probs = response.choices[0].logprobs.top_logprobs[0]
normal_probs = {key: math.exp(value) for key, value in log_probs.items()}

Running this, `normal_probs` holds the converted probabilities from the API response:

{'1': 0.9958548274432318, '\n': 0.001900562469577879, ' ': 0.0010887979384974482, '2': 0.0005898265779205285, '\n\n': 0.00022387202194129092}

You see that I added a trailing space so that tokens starting with a space are discouraged and the AI gets right to a number; numbers are always their own token up to 999.

Timmy is scientifically naughty.

(I used a high temperature, so the sampled token can be nearly random and shows up as a 6th logprob entry, if you want to extract that also.)
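Once you have those probabilities, you can keep only the tokens that stand for your classes and renormalize them so they sum to 1, giving a proper per-class distribution. A sketch using the values from the response above:

```python
import math

# Probabilities for the first generated token, converted from top_logprobs
# (values copied from the example response above)
normal_probs = {
    "1": 0.9958548274432318,
    "\n": 0.001900562469577879,
    " ": 0.0010887979384974482,
    "2": 0.0005898265779205285,
    "\n\n": 0.00022387202194129092,
}

# Keep only the tokens that represent our classes, then renormalize
class_tokens = ["1", "2"]
class_mass = {t: normal_probs[t] for t in class_tokens}
total = sum(class_mass.values())
class_probs = {t: p / total for t, p in class_mass.items()}
print(class_probs)  # '1' ≈ 0.9994, '2' ≈ 0.0006
```

With three classes you would simply list three answer tokens in the prompt and in `class_tokens`; the renormalization step is identical.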

Just to explain what I mean by a probability over the vocab:

These generative models generate output tokens one by one, based on a probability distribution over all the tokens (the vocab) they were trained on. They usually perform a beam search to find the best probabilities for the next token, which is why there aren't probabilities for all 100,000 BPE tokens, or maybe they are just very small (almost zero). So I am still thinking about whether your suggested solution is scientific.