I am using OpenAI's text-embedding-ada-002 to compare generated text to a standard/ideal text. It works well, but I need to justify choosing ada-002 over BERT or another embedding technique. The most straightforward justification would be to evaluate ada-002's performance against BERT's (and others') on some standard set of texts.
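For context, here is roughly how I compute the similarity score now. This is a minimal sketch assuming the openai Python SDK (v1.x) with an API key in the environment; the two sample strings are placeholders:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    # Embed one string with text-embedding-ada-002
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

generated = "The model produced this candidate sentence."      # placeholder
ideal = "This is the reference sentence it should match."      # placeholder
print(cosine_sim(embed(generated), embed(ideal)))
```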
Does a standard evaluation set exist for measuring embedding techniques’ performance on semantic similarity?
(I have been using translations of the Bible, e.g., NIV vs. KJV, NKJV, BBE, etc., but my thesis requires a more standardized evaluation set.)
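To be concrete about what I would run once I have such a set, here is the kind of head-to-head comparison I have in mind for the BERT side, scored on the same pairs as above. A sketch assuming the sentence-transformers package; the model name all-MiniLM-L6-v2 is just one example of a BERT-style encoder:

```python
from sentence_transformers import SentenceTransformer, util

# BERT-style baseline on the same kind of pair (model choice is illustrative)
model = SentenceTransformer("all-MiniLM-L6-v2")
generated = "The model produced this candidate sentence."      # placeholder
ideal = "This is the reference sentence it should match."      # placeholder
embeddings = model.encode([generated, ideal], convert_to_tensor=True)
print(util.cos_sim(embeddings[0], embeddings[1]).item())
```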