Last crude test for now: it loops through the “methods” and a list of comparison strings (each compared to “dog”). The output is the similarity “score” for each method and string pair. I added a short sleep so as not to get “rate-limited” out of the loop.
irb(main):075:0> methods
=> ["dot", "cosine", "euclidean", "manhattan"]
irb(main):063:0> compare=["cat", "asteroid", "rock fish", "submarine", "gemstone","dog food","chatgpt"]
=> ["cat", "asteroid", "rock fish", "submarine", "gemstone", "dog food", "chatgpt"]
irb(main):069:1* compare.each do |phrase|
irb(main):070:2* methods.each do |method|
irb(main):071:2* Embeddings.test_strings({string1:"dog",string2:phrase,method:method}); sleep 5
irb(main):072:1* end
irb(main):073:0> end
String1=dog, String2=cat, Method=dot, Output=0.9164348051073471
String1=dog, String2=cat, Method=cosine, Output=0.9164348102929122
String1=dog, String2=cat, Method=euclidean, Output=0.40881582463070715
String1=dog, String2=cat, Method=manhattan, Output=10.538499585644807
String1=dog, String2=asteroid, Method=dot, Output=0.8246138517753474
String1=dog, String2=asteroid, Method=cosine, Output=0.8246138832545461
String1=dog, String2=asteroid, Method=euclidean, Output=0.592260263820192
String1=dog, String2=asteroid, Method=manhattan, Output=14.8466230262702
String1=dog, String2=rock fish, Method=dot, Output=0.8351781225977196
String1=dog, String2=rock fish, Method=cosine, Output=0.8351781121091175
String1=dog, String2=rock fish, Method=euclidean, Output=0.5741461311561782
String1=dog, String2=rock fish, Method=manhattan, Output=14.63262434113021
String1=dog, String2=submarine, Method=dot, Output=0.8356327264031151
String1=dog, String2=submarine, Method=cosine, Output=0.8356327585001753
String1=dog, String2=submarine, Method=euclidean, Output=0.5733537044205766
String1=dog, String2=submarine, Method=manhattan, Output=14.554428136017805
String1=dog, String2=gemstone, Method=dot, Output=0.8032192820692583
String1=dog, String2=gemstone, Method=cosine, Output=0.803219292286573
String1=dog, String2=gemstone, Method=euclidean, Output=0.6273447301289572
String1=dog, String2=gemstone, Method=manhattan, Output=15.739283096362826
String1=dog, String2=dog food, Method=dot, Output=0.9305799323955518
String1=dog, String2=dog food, Method=cosine, Output=0.9305799640298726
String1=dog, String2=dog food, Method=euclidean, Output=0.3726124893512008
String1=dog, String2=dog food, Method=manhattan, Output=9.488634241251495
String1=dog, String2=chatgpt, Method=dot, Output=0.8203660566183042
String1=dog, String2=chatgpt, Method=cosine, Output=0.820366052379735
String1=dog, String2=chatgpt, Method=euclidean, Output=0.5993896037609885
String1=dog, String2=chatgpt, Method=manhattan, Output=15.415643926722199
=> ["cat", "asteroid", "rock fish", "submarine", "gemstone", "dog food", "chatgpt"]
irb(main):074:0>
HTH
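For anyone wondering what the four method names compute: the implementation of `Embeddings.test_strings` isn't shown here, so this is only a minimal sketch of the standard definitions on plain Ruby arrays. The `VectorMath` module and its `compare` helper are hypothetical names made up for illustration, and the two short vectors are toy values, not real embeddings.

```ruby
# Hypothetical helper module; NOT the Embeddings class used in the transcript.
module VectorMath
  METHODS = %w[dot cosine euclidean manhattan].freeze

  module_function

  # Sum of element-wise products (higher = more similar).
  def dot(a, b)
    a.zip(b).sum { |x, y| x * y }
  end

  # Dot product divided by the product of the two vector lengths.
  def cosine(a, b)
    dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)))
  end

  # Straight-line distance (lower = more similar).
  def euclidean(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end

  # Sum of absolute element-wise differences (lower = more similar).
  def manhattan(a, b)
    a.zip(b).sum { |x, y| (x - y).abs }
  end

  # Dispatch on the method name, similar in spirit to the method: argument above.
  def compare(a, b, method)
    raise ArgumentError, "unknown method: #{method}" unless METHODS.include?(method.to_s)

    public_send(method, a, b)
  end
end

# Toy vectors, just to exercise each measure.
a = [3.0, 4.0]
b = [4.0, 3.0]
VectorMath::METHODS.each do |m|
  puts "Method=#{m}, Output=#{VectorMath.compare(a, b, m)}"
end
```

Note the direction of each score: dot and cosine go up as two vectors get closer, while euclidean and manhattan go down.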
Same thing, but with the order of the loops swapped, for fun…
irb(main):076:1* methods.each do |method|
irb(main):077:2* compare.each do |phrase|
irb(main):078:2* Embeddings.test_strings({string1:"dog",string2:phrase,method:method}); sleep 5
irb(main):079:1* end
irb(main):080:0> end
String1=dog, String2=cat, Method=dot, Output=0.9164348051073471
String1=dog, String2=asteroid, Method=dot, Output=0.8246138517753474
String1=dog, String2=rock fish, Method=dot, Output=0.8351781225977196
String1=dog, String2=submarine, Method=dot, Output=0.8356327264031151
String1=dog, String2=gemstone, Method=dot, Output=0.8032192820692583
String1=dog, String2=dog food, Method=dot, Output=0.9305799323955518
String1=dog, String2=chatgpt, Method=dot, Output=0.8203660566183042
String1=dog, String2=cat, Method=cosine, Output=0.9164348102929122
String1=dog, String2=asteroid, Method=cosine, Output=0.8246138832545461
String1=dog, String2=rock fish, Method=cosine, Output=0.8351781121091175
String1=dog, String2=submarine, Method=cosine, Output=0.8356327585001753
String1=dog, String2=gemstone, Method=cosine, Output=0.803219292286573
String1=dog, String2=dog food, Method=cosine, Output=0.9305799640298726
String1=dog, String2=chatgpt, Method=cosine, Output=0.820366052379735
String1=dog, String2=cat, Method=euclidean, Output=0.40881582463070715
String1=dog, String2=asteroid, Method=euclidean, Output=0.592260263820192
String1=dog, String2=rock fish, Method=euclidean, Output=0.5741461311561782
String1=dog, String2=submarine, Method=euclidean, Output=0.5733537044205766
String1=dog, String2=gemstone, Method=euclidean, Output=0.6273447301289572
String1=dog, String2=dog food, Method=euclidean, Output=0.3726124893512008
String1=dog, String2=chatgpt, Method=euclidean, Output=0.5993896037609885
String1=dog, String2=cat, Method=manhattan, Output=10.538499585644807
String1=dog, String2=asteroid, Method=manhattan, Output=14.8466230262702
String1=dog, String2=rock fish, Method=manhattan, Output=14.63262434113021
String1=dog, String2=submarine, Method=manhattan, Output=14.554428136017805
String1=dog, String2=gemstone, Method=manhattan, Output=15.739283096362826
String1=dog, String2=dog food, Method=manhattan, Output=9.488634241251495
String1=dog, String2=chatgpt, Method=manhattan, Output=15.415643926722199
=> ["dot", "cosine", "euclidean", "manhattan"]
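One could argue that, in this data set, the Manhattan method is preferable, perhaps because its scores span the widest numeric range here (roughly 9.5 to 15.7); but it’s a matter of use case and preference. Keep in mind that dot and cosine are similarity scores (higher means closer), while euclidean and manhattan are distances (lower means closer).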
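The euclidean scores above track the cosine scores closely, and there is a likely reason. The dot and cosine outputs agree to about seven decimal places, which suggests (an assumption, not something confirmed in this thread) that the embedding vectors are unit length. For unit-length vectors, the Euclidean distance equals sqrt(2 - 2 * cosine), so it carries the same information as the cosine score. Here is a rough check, with a few cosine and euclidean values copied from the output above; the helper name `euclidean_from_cosine` is made up for this sketch.

```ruby
# Cosine and euclidean outputs copied from the results above (string1 = "dog").
posted = {
  "cat"      => { cosine: 0.9164348102929122, euclidean: 0.40881582463070715 },
  "asteroid" => { cosine: 0.8246138832545461, euclidean: 0.592260263820192 },
  "dog food" => { cosine: 0.9305799640298726, euclidean: 0.3726124893512008 },
  "gemstone" => { cosine: 0.803219292286573,  euclidean: 0.6273447301289572 }
}

# Holds only under the assumption that both vectors have length 1.
def euclidean_from_cosine(cosine)
  Math.sqrt(2.0 - 2.0 * cosine)
end

posted.each do |phrase, scores|
  predicted = euclidean_from_cosine(scores[:cosine])
  diff = (predicted - scores[:euclidean]).abs
  puts format("%-9s predicted=%.6f posted=%.6f diff=%.1e",
              phrase, predicted, scores[:euclidean], diff)
end
```

If the differences come out tiny, the unit-length assumption is plausible, and cosine, dot, and euclidean will always rank phrases in the same order. Manhattan distance is not tied to cosine by a formula like this, so its ranking could differ on other data even though it matches here.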