I am testing the new embedding models (text-embedding-3-small, text-embedding-3-large) and the results are very different from text-embedding-ada-002. I am not sure what to make of it.
Query
what is my favorite tom cruise movie?
Text1
My all time favorite action movie is Top Gun. I used to have a movie poster of Tom Cruise from Top Gun hanging in my room.
Text2
One of my favorite comedy movies is Tropic Thunder. It’s hilarious, especially with RDJ and Tom Cruise. The way Tom Cruise danced to the song Low was just epic!
Text3
Daruma is a delightful restaurant that serves yakiniku, Japanese style grilled meats, and jingisukan, a grilled lamb dish which is a Hokkaido delicacy. The food is not only delicious but also affordable.
Results
Text    ada                   small                   large
Text1   0.8526715878737584    0.5631829592887835      0.594466374201587
Text2   0.8326902960730671    0.5492379659408032      0.5156284404991819
Text3   0.6683329263351143    0.008682087570941398    0.05392372968797883
My threshold is around 0.7, so with the new models I would not get any results.
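To make the effect of the 0.7 threshold concrete, here is a minimal sketch that applies it to the scores in the table above. The `above_threshold` helper and the `scores` layout are only illustrative names, not part of any library.

```python
# Scores copied from the results table above (similarity of each text to the query).
scores = {
    "ada":   {"Text1": 0.8526715878737584, "Text2": 0.8326902960730671, "Text3": 0.6683329263351143},
    "small": {"Text1": 0.5631829592887835, "Text2": 0.5492379659408032, "Text3": 0.008682087570941398},
    "large": {"Text1": 0.594466374201587, "Text2": 0.5156284404991819, "Text3": 0.05392372968797883},
}

def above_threshold(model_scores, threshold):
    """Return the names of the texts whose score is at or above the threshold."""
    return [name for name, score in model_scores.items() if score >= threshold]

for model, model_scores in scores.items():
    print(model, above_threshold(model_scores, 0.7))
# ada ['Text1', 'Text2']
# small []
# large []
```

With the same 0.7 cutoff, ada returns both movie texts while small and large return nothing, even though all three models rank the texts in the same order.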
I was thinking, maybe my query is not exact enough. So I tried "what is my favorite action movie?" and got the following for the small model.
small
Text1 0.5733264320688367
Text2 0.4358654870000338
Text3 0.09655090917844694
Yeah, the new models have a completely different spread. With ada, we almost never saw orthogonality (i.e. cosine similarity approaching 0). Now we do see it with the new models, which I think is great.
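For anyone unsure what "orthogonality" means for the scores, here is a small plain-Python sketch of cosine similarity on toy 2-D vectors. Real embeddings have many more dimensions, and the vectors below are made up purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # 45 degrees apart: about 0.707
```

A score near 0 means the two vectors share essentially no direction, which is what Text3 (the restaurant review) shows against the movie query with the new models.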
I think even less. It depends on how specialized your queries are, and if “neutrino physics” would match 100 documents specifically about that.
Here are the results of a user query about GPTs, run against various GPT blog posts and embeddings docs chunked to around two paragraphs each.
“Top-n in X tokens” on top of a reject threshold would work best.
== Cosine similarity ==
0:"How to add documents to my own" <==> 0:"How to add documents to my own":
1.00000000000000000000 - identical: True
1:"[1] Documentation does not sup" <==> 0:"How to add documents to my own":
0.19778059389735827556 - identical: False
2:"[2] Both of our new embeddings" <==> 0:"How to add documents to my own":
0.13452488168373297195 - identical: False
3:"[3] We’re rolling out custom v" <==> 0:"How to add documents to my own":
0.43988936843141823729 - identical: False
4:"[4] GPTs let you customize Cha" <==> 0:"How to add documents to my own":
0.45278086198655564942 - identical: False
5:"[5] The GPT Store is rolling o" <==> 0:"How to add documents to my own":
0.44064855357381010892 - identical: False
6:"[6] We’ve set up new systems t" <==> 0:"How to add documents to my own":
0.36558419871885522445 - identical: False
7:"[7] Developers can connect GPT" <==> 0:"How to add documents to my own":
0.39966454268300044550 - identical: False
8:"[8] Since we launched ChatGPT " <==> 0:"How to add documents to my own":
0.43432215026683346215 - identical: False
9:"[9] We want more people to sha" <==> 0:"How to add documents to my own":
0.30948992336293501548 - identical: False
10:"[10] Creating a GPT
How to cr" <==> 0:"How to add documents to my own":
0.55752453059361806176 - identical: False
11:"[11] Here’s how to create a GP" <==> 0:"How to add documents to my own":
0.55959997969812447227 - identical: False
12:"[12] Advanced Settings
In the" <==> 0:"How to add documents to my own":
0.49851405229663103835 - identical: False
13:"[13] Settings in the Configure" <==> 0:"How to add documents to my own":
0.46590037610635792742 - identical: False
14:"[14] FAQ:
Q: How many files " <==> 0:"How to add documents to my own":
0.46340230123865322476 - identical: False
Granted, nothing I included is “to add your own documents…”; here’s what crossed the threshold of 0.5.
Parts 10 and 11 of 14 score above 0.5.
%%
Creating a GPT
How to create a GPT
GPTs are custom versions of ChatGPT that users can tailor for specific tasks or topics by combining instructions, knowledge, and capabilities. They can be as simple or as complex as needed, addressing anything from language learning to technical support. Plus and Enterprise users can start creating GPTs at chat.openai.com/create.
%%
Here’s how to create a GPT:
Head to https://chat.openai.com/gpts/editor (or select your name and then “My GPTs”)
Select “Create a GPT”
In the Create tab, you can message the GPT Builder to help you build a new GPT. You can say something like, "Make a creative who helps generate visuals for new products" or "Make a software engineer who helps format my code."
To name and set the description of your GPT, head to the Configure tab. Here, you will also be able to select the actions you would like your GPT to take, like browsing the web or creating images.
When you’re ready to publish your GPT, select “Publish” and share it with other people if you’d like.
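The "Top-n in X tokens on top of a reject threshold" idea mentioned earlier could be sketched roughly as below. The scores are the cosine similarities from the listing above, rounded to four decimals. The token counts (200 per chunk) and the `select_chunks` helper are assumptions made up for this example, not measured values or an existing API.

```python
# (chunk_id, score, token_count); scores rounded from the listing above,
# token counts are hypothetical placeholders (200 tokens per chunk).
chunks = [
    (1, 0.1978, 200), (2, 0.1345, 200), (3, 0.4399, 200), (4, 0.4528, 200),
    (5, 0.4406, 200), (6, 0.3656, 200), (7, 0.3997, 200), (8, 0.4343, 200),
    (9, 0.3095, 200), (10, 0.5575, 200), (11, 0.5596, 200), (12, 0.4985, 200),
    (13, 0.4659, 200), (14, 0.4634, 200),
]

def select_chunks(chunks, threshold, max_tokens):
    """Pick the best-scoring chunks that pass the threshold and fit the token budget."""
    selected = []
    used = 0
    for chunk_id, score, tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if score < threshold:
            break  # sorted by descending score, so every later chunk is rejected too
        if used + tokens > max_tokens:
            continue  # this chunk does not fit; a smaller one further down might
        selected.append(chunk_id)
        used += tokens
    return selected, used

print(select_chunks(chunks, threshold=0.3, max_tokens=1000))  # ([11, 10, 12, 13, 14], 1000)
print(select_chunks(chunks, threshold=0.5, max_tokens=1000))  # ([11, 10], 400)
```

The threshold rejects clearly unrelated chunks, and the token budget caps how much of what remains gets passed along, so a looser threshold does not automatically mean a larger prompt.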
Yes, you are right. After adding more data and asking many questions, from vague to direct, I found that a good cutoff is around 0.3. Even if the results include some irrelevant data, the final chat completions API call is still able to pick the correct answer.