How can I keep tokens with high probability only

Hi all,
I have just fine-tuned “DaVinci” for product taxonomy. How can I use log-probs to keep only those tokens that have probability > THRES?

Thanks for the help…

You may have to do this in post-processing.

Your problem will be that tokens are not always entire words, so you need to check the tokens before and after each log-probs hit.
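To make the problem concrete, here is a minimal sketch of filtering token by token. The tokens, log-probs, and the `THRES` value are made up for illustration, and `keep_confident_tokens` is a hypothetical helper, not part of any API. A log-prob is converted to a probability with `exp`.

```python
import math

THRES = 0.9  # hypothetical probability cutoff

# Made-up tokens and log-probs, for illustration only
tokens = [" Home", " Furn", "iture"]
token_logprobs = [-0.02, -0.05, -1.20]

def keep_confident_tokens(tokens, logprobs, thres):
    """Return the tokens whose probability exp(logprob) exceeds thres."""
    return [t for t, lp in zip(tokens, logprobs) if math.exp(lp) > thres]

kept = keep_confident_tokens(tokens, token_logprobs, THRES)
print(kept)
```

With these numbers the second half of "Furniture" falls below the cutoff, so a naive per-token filter leaves the fragment " Furn" behind. That is why the neighbouring tokens have to be checked.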

The token dictionary is available online (You may need to test to find the correct one).

Thanks @raymonddavey
Do you have any link or resource that I can use?

Thanks in advance…

The only thing I can offer is this link for access to the dictionary

And then you need to turn on the setting that sends back the top log-probs.
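As a rough sketch, assuming the legacy Completions endpoint, where this setting is the `logprobs` request parameter: the model name and prompt below are placeholders, and the other values are only examples. Check the current API reference before relying on any of this.

```python
# Hypothetical request parameters; nothing is sent here
request_params = {
    "model": "YOUR_FINE_TUNED_MODEL",  # placeholder, not a real model id
    "prompt": "Product: oak dining table\nCategory:",  # made-up prompt
    "max_tokens": 20,
    "temperature": 0,
    "logprobs": 5,  # request log-probs for the top 5 candidates per position
}
```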

Once you get the reply, you will need to tokenize the response text you get back, and then look for the log-probs tokens within the result. When you find them, search back and forth until you find a space, a new line, or the start or end of the text.
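The search for word boundaries could be sketched like this, assuming you already have two parallel lists (the tokens and their log-probs) and that joining the tokens reproduces the response text. `confident_words` is a hypothetical helper, and scoring a word by its weakest token is just one choice; multiplying the token probabilities would be another.

```python
import math
import re

def confident_words(tokens, logprobs, thres):
    """Keep whole words whose every overlapping token has probability > thres."""
    # Character span [start, end) of each token in the joined text
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    text = "".join(tokens)

    kept = []
    # A word is a run of non-whitespace characters, so it is bounded by
    # a space, a new line, or the start or end of the text
    for m in re.finditer(r"\S+", text):
        probs = [
            math.exp(lp)
            for (start, end), lp in zip(spans, logprobs)
            if start < m.end() and end > m.start()  # token overlaps the word
        ]
        if probs and min(probs) > thres:
            kept.append(m.group())
    return kept

# Made-up tokens and log-probs, for illustration only
tokens = [" Home", " Furn", "iture", "\n", "Sofa", "s"]
logprobs = [-0.02, -0.05, -1.20, -0.01, -0.03, -0.04]
words = confident_words(tokens, logprobs, 0.9)
print(words)
```

Here "Furniture" is dropped as a whole because one of its tokens is below the cutoff, while "Home" and "Sofas" are kept.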

This is the only way to build a list of full words. Also, the API is limited (from memory I think you can get 5 log-probs, but I would check that if I were you).
