Recently, I’ve been using the embedding function and have made some interesting discoveries that I would like to discuss with everyone. I’ve been using the text-embedding-ada-002 model.
One commonly cited example is “woman - man = queen - king = female,” as shown in the following image.
The circle is because all the points should be in length of 1, where should all on a circle if it is 2D.
After doing some experiments, below is the result. If there is a arrow, for example, Man → Woman, it means that Emb(Woman) - Emb(Man).
The relation is not that high comparing to my assumption.
Then, I came up with an idea: Is 1,536 dimensions too high and the amount of information contained in each word too small?
Then I did another experiment is that using descriptions by ChatGPT instead of words.
- description_man = ‘A male human being, typically distinguished by physical characteristics such as a deeper voice, facial hair, and greater height and muscle mass than females. Men have played significant roles in history and society, and have been involved in various fields such as science, politics, arts, and sports.’
- description_woman = ‘A female human being, typically distinguished by physical characteristics such as breasts, wider hips, and a higher-pitched voice than males. Women have also played significant roles in history and society, although they have faced challenges such as gender discrimination and inequality. Women have made contributions to various fields such as science, politics, arts, and sports.’
- description_king = ‘A male monarch who typically inherits his position by birthright and rules over a kingdom or an empire. Kings have played significant roles in history, and have been regarded as powerful figures with the ability to make important decisions that affect the lives of their subjects.’
- description_queen = ‘A female monarch who typically inherits her position by birthright, or who marries a king. Queens have also played significant roles in history, and have been regarded as powerful figures with the ability to influence the decisions of their monarchs. Queens have been involved in various fields such as politics, arts, and philanthropy.’
The similarity increase a lot!!!
My inference is that it is possible that with the Ada model, it has learned too much knowledge (compared to the previous Word2Vec model), so the word “Man” may not only express male gender, but also a nickname, a location, a song, etc. Therefore, the difference between “Man” and “Woman” is not just gender.
I’m curious to know what everyone thinks about this research.