I’m trying to rate how strongly strings of text relate to the Big 5 Personality Factors by generating embeddings using text-embedding-3-large for a text string and for a description of each Big 5 factor, then calculating the cosine similarity between the text string and each factor description. (So I’m not trying to classify the text string into one category, but to measure the extent to which it relates to each factor.)
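Roughly, the pipeline looks like this (a minimal sketch; the factor descriptions and sample text are placeholders, not the ones I’m actually testing):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Placeholder descriptions, one per Big 5 factor.
factor_descriptions = {
    "openness": "Curiosity, imagination, and openness to new experiences.",
    "conscientiousness": "Organization, self-discipline, and dependability.",
    "extraversion": "Sociability, assertiveness, and positive energy.",
    "agreeableness": "Compassion, cooperation, and trust in others.",
    "neuroticism": "Anxiety, moodiness, and emotional instability.",
}

def embed(texts: list[str]) -> np.ndarray:
    """Return one embedding vector per input text."""
    response = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([item.embedding for item in response.data])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

factor_vectors = embed(list(factor_descriptions.values()))
text_vector = embed(["I love meeting new people at parties."])[0]

# One similarity score per factor, rather than a single winning category.
scores = {
    name: cosine_similarity(text_vector, vec)
    for name, vec in zip(factor_descriptions, factor_vectors)
}
print(scores)
```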
However, I’m finding that if I vary the description for each factor (e.g. length, detail, keywords), I get quite different results each time, and in many cases the cosine similarities are fairly even across the Big 5 factors, without much differentiation.
So, my question is: are there any best practices for writing category/factor descriptions that will produce the best results?
An embedding is not a problem solver or a topic finder. It is a similarity mechanism: the vector captures the state of an AI model after processing an input. Algorithmic comparison of two embedding vectors can score how correlated the two inputs are, even down to formatting or tone, along with other qualities only an AI picks up on.
An internet forum post that is instigating and targeting, for example, is not going to match well with an overall description of stalking or brigading behavior that would score the poster’s toxic influence on that forum.
Therefore, you will need to do your own computations: build a vector database of textual examples that are similar to the inputs you will send, each manually rated as strong in one or several of the metrics.
Then, after getting a ranked return for a text against those embeddings and their labels, you can proceed to an algorithm that determines the strength in each factor.
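One way that last step could look, reusing the `embed()` and `cosine_similarity()` helpers from the snippet in the question (the example texts, ratings, and the similarity-weighted averaging are all illustrative choices, not a prescription):

```python
# Illustrative hand-rated examples; a real store would hold many more,
# ideally rated on all five factors.
rated_examples = [
    {"text": "I planned every detail of the trip weeks in advance.",
     "ratings": {"conscientiousness": 0.9, "neuroticism": 0.2}},
    {"text": "I'd rather stay home than go to a crowded party.",
     "ratings": {"extraversion": 0.1, "openness": 0.4}},
]

example_vectors = embed([ex["text"] for ex in rated_examples])

def factor_strengths(text: str, k: int = 5) -> dict[str, float]:
    """Score a text on each factor via a similarity-weighted average
    of the manual ratings of its k most similar rated examples."""
    query = embed([text])[0]
    sims = [cosine_similarity(query, vec) for vec in example_vectors]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    totals: dict[str, float] = {}
    weights: dict[str, float] = {}
    for i in top:
        for factor, rating in rated_examples[i]["ratings"].items():
            totals[factor] = totals.get(factor, 0.0) + sims[i] * rating
            weights[factor] = weights.get(factor, 0.0) + sims[i]
    return {f: totals[f] / weights[f] for f in totals}
```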
Thanks for the reply! Ideally I would train a custom model on textual examples, but I was hoping to use a zero-shot approach (as this seems to work fairly well). Rather than assign the text string to the category with the highest cosine similarity score, I’d use the magnitude of each score to reflect how strongly the text relates to each personality factor.
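For instance, something like this hypothetical rescaling of the five raw similarities (the softmax and the temperature value are just one illustrative choice for making small differences easier to compare):

```python
import numpy as np

def relative_strengths(scores: dict[str, float], temperature: float = 0.05) -> dict[str, float]:
    """Rescale raw cosine similarities into relative weights that sum
    to 1, amplifying small differences between factors."""
    names = list(scores)
    values = np.array([scores[n] for n in names])
    exp = np.exp((values - values.max()) / temperature)
    return dict(zip(names, exp / exp.sum()))
```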
“Ask an AI” is something less algorithmic in nature. You are essentially asking: based on this input, rank the tokens by certainty, and hope that out of the ~200k tokens in the vocabulary, the most prominent ones are numbers that relate to a score value.
With structured output (where you are still given logprobs, because you have only told the AI how to format its answer), you can look at the logprobs, which are a ranking of those top tokens. If at the “happy” position there is “8”: 39% and “6”: 21%, you can interpolate beyond the single token the AI and its random sampler would have output.
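A minimal sketch of that interpolation, assuming a prompt that constrains the answer to a single digit rather than a full structured output (the model name, prompt, and sample text are placeholders):

```python
import math
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system",
         "content": "Rate how strongly the text reflects extraversion "
                    "on a scale of 0-9. Reply with a single digit only."},
        {"role": "user", "content": "I love meeting new people at parties."},
    ],
    max_tokens=1,
    logprobs=True,
    top_logprobs=10,
)

# The top-ranked alternatives at the score position, with their logprobs.
top = response.choices[0].logprobs.content[0].top_logprobs
digit_probs = {t.token: math.exp(t.logprob) for t in top if t.token.strip().isdigit()}

# Probability-weighted average over the digit tokens, renormalized so
# the digit probabilities sum to 1.
total = sum(digit_probs.values())
score = sum(int(tok) * p / total for tok, p in digit_probs.items())
print(f"Interpolated score: {score:.2f}")
```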
If you do not have extensive training data, this may be the path for you, at an order of magnitude greater expense than embeddings and without any reusable AI data. If you do have training data, you can explore both fine-tuning on the scoring task and seeing how embeddings perform. Even a synthesis of the two AI results might be worth considering.
Good luck, as the implementation is the secret sauce of many other products.