Standard Dataset for Semantic Similarity

Hi, community.

I am using ADA 002 (text-embedding-ada-002) to compare generated text to standard/ideal text. It works well, but I need to justify using ADA 002 over BERT or another embedding technique. The easiest way to do this is to evaluate ADA 002's performance against BERT's, etc., on some standard set of texts.
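For reference, the comparison itself is just cosine similarity between the two embedding vectors. A minimal NumPy sketch, with toy vectors standing in for the real ada-002 embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings of the two texts.
generated = [0.2, 0.7, 0.1]
ideal = [0.25, 0.65, 0.05]
score = cosine_similarity(generated, ideal)  # close to 1.0 for similar texts
```

The same function works regardless of which model produced the vectors, which is what makes swapping ada-002 for BERT straightforward in principle.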

Does a standard evaluation set exist for measuring embedding techniques’ performance on semantic similarity?

(I have been using translations of the Bible, e.g., NIV vs. KJV, NKJV, BBE, etc., but my thesis requires a more standardized evaluation set.)

Thank you!

You can see a comparison/performance table of various embedding models here.

It should be noted that different use cases favor different models. Ada has a very good all-around score and is an excellent option, but it's difficult to pick a single "best"; the real question is "best for what?"