Yeah, I’m confused by this. Using the current model, “text-similarity-ada-001”, the similarity numbers are quite different from those above. When I run the following, I get a very different result.
Check similarities:
irb(main):004:0> params={string1:a,string2:b, method:'cosine'}
=>
{:string1=>"The cat sat on the mat",
...
irb(main):005:0> Embeddings.test_strings(params)
=>
{:string1=>"The cat sat on the mat",
:string2=>"The number 42 is the answer to the ultimate question of life, the universe, and everything",
:method=>"cosine",
:output=>0.6925531623476415}
irb(main):006:0> params={string1:a,string2:b, method:'dot'}
=>
{:string1=>"The cat sat on the mat",
...
irb(main):007:0> Embeddings.test_strings(params)
=>
{:string1=>"The cat sat on the mat",
:string2=>"The number 42 is the answer to the ultimate question of life, the universe, and everything",
:method=>"dot",
:output=>0.6925531596482191}
Check distances:
irb(main):008:0> params={string1:a,string2:b, method:'manhattan'}
=>
{:string1=>"The cat sat on the mat",
...
irb(main):009:0> Embeddings.test_strings(params)
=>
{:string1=>"The cat sat on the mat",
:string2=>"The number 42 is the answer to the ultimate question of life, the universe, and everything",
:method=>"manhattan",
:output=>19.688237328809972}
irb(main):010:0> params={string1:a,string2:b, method:'euclidean'}
=>
{:string1=>"The cat sat on the mat",
...
irb(main):011:0> Embeddings.test_strings(params)
=>
{:string1=>"The cat sat on the mat",
:string2=>"The number 42 is the answer to the ultimate question of life, the universe, and everything",
:method=>"euclidean",
:output=>0.7841515624597036}
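For reference, here is roughly what the four methods compute. This is only a minimal sketch with hypothetical helper names; it is not the actual Embeddings.test_strings implementation, whose internals aren't shown here.

# Hypothetical standalone versions of the four measures above (a sketch,
# not the real Embeddings internals). Each takes two arrays of floats.
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def cosine(a, b)
  dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))
end

def manhattan(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Because the ada embeddings come back with length ~1, cosine(a, b) and
# dot(a, b) should agree to within floating-point noise.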
Since the dot product and the cosine similarity give the same result (within rounding error), the numbers confirm each other by two different methods. Also, the Euclidean distance is what you would expect relative to the dot product (and, of course, the cosine similarity, since the embeddings are unit vectors).
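As a quick sanity check on that last point: for unit vectors, Euclidean distance and cosine similarity are tied together by d = sqrt(2 * (1 - cos)), so the distance above can be reproduced directly from the cosine result (assuming unit-length embeddings):

# Reproduce the euclidean output from the cosine output (unit vectors only).
cos = 0.6925531623476415
Math.sqrt(2 * (1 - cos))
# => 0.78415156..., which lines up with the euclidean output of 0.7841515624597036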
