OpenAI just announced the Text and Code Embeddings endpoint:
We are introducing embeddings, a new endpoint in the OpenAI API that makes it easy to perform natural language and code tasks like semantic search, clustering, topic modeling, and classification. Embeddings are numerical representations of concepts converted to number sequences, which make it easy for computers to understand the relationships between those concepts. Our embeddings outperform top models in 3 standard benchmarks, including a 20% relative improvement in code search.
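To make the "semantic search" use case concrete, here's a minimal sketch of how you might call the embeddings REST endpoint and rank documents by cosine similarity to a query. The model name and the exact request shape are assumptions for illustration, not taken from the announcement; check the current API docs before using this.

```python
# Sketch: fetch embeddings from the OpenAI /v1/embeddings endpoint and rank
# documents by cosine similarity to a query. The model name below is a
# placeholder assumption; substitute whatever embedding model is available.
import os
import requests
import numpy as np

API_URL = "https://api.openai.com/v1/embeddings"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
MODEL = "text-embedding-ada-002"  # placeholder model name

def embed(texts):
    """Return one embedding vector (numpy row) per input string."""
    resp = requests.post(API_URL, headers=HEADERS,
                         json={"model": MODEL, "input": texts})
    resp.raise_for_status()
    return np.array([item["embedding"] for item in resp.json()["data"]])

docs = [
    "How to reset a forgotten password",
    "Quarterly revenue grew 12% year over year",
    "Steps for configuring two-factor authentication",
]
query = "I can't log in to my account"

doc_vecs = embed(docs)
query_vec = embed([query])[0]

# Embeddings map text to vectors where nearby vectors mean related concepts,
# so cosine similarity gives a relevance ranking for the query.
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

The same pattern extends to the other listed tasks: feed the vectors to a clustering or classification algorithm instead of ranking them by similarity.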
Thank you. Arvind took the high road - nice. Nils' claims were just too silly. Not that GPT-3 is perfect or the best - I certainly don't know - but I get annoyed when I see things like "1 million times more expensive." That's why I don't make time for twitter.
@lmccallum his criticisms are actually quite valid and well grounded, especially wrt cost trade-offs. The nuance here is that the standard benchmarks he mentions may not correlate well with real-world dataset performance - which is what the OpenAI embeddings seem to be optimized for. Regardless, every practitioner should have their own in-house benchmarks for making these judgments.