Here is what I mean by keywords. This is an example I extracted from ChatGPT.
I’d like to extract these keywords and their relative importance (frequency, TF-IDF, or a similar importance metric), but computed only over the documents in my dataset. In the example above, the frequency is estimated over the whole dataset ChatGPT was trained on.
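For reference, here is the standard TF-IDF weight restricted to a local corpus. $N$ is the number of documents in my own dataset, $\mathrm{tf}(t, d)$ is how often keyword $t$ occurs in document $d$, and $\mathrm{df}(t)$ is how many of my documents contain $t$. Summing over documents to get a single corpus-level score is one common choice, not the only one:

$$
\mathrm{score}(t) = \sum_{d=1}^{N} \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
$$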
It works really well: GPT is able to extract the keywords when prompted with a document.
Now that I have the keywords, how can I estimate their relevance within my dataset? Their relevance in a bigger dataset (such as GPT’s training data) does not solve my problem.
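One possible approach, sketched under some assumptions: treat the GPT-extracted keywords as a fixed vocabulary and count them only in my own documents. The function name `keyword_stats` and the sample documents are hypothetical. Keywords are matched case-insensitively as whole words or phrases with a regex, which is a simplification. A real pipeline might lemmatize or tokenize first.

```python
import math
import re


def keyword_stats(documents, keywords):
    """Score each keyword using only the given documents (hypothetical helper).

    Returns keyword -> {"freq": total occurrences,
                        "df": number of documents containing it,
                        "tfidf": sum over documents of tf * log(N / df)}
    """
    n_docs = len(documents)
    lowered = [doc.lower() for doc in documents]
    stats = {}
    for kw in keywords:
        # Match the keyword as a whole word/phrase, ignoring case.
        pattern = re.compile(r"\b" + re.escape(kw.lower()) + r"\b")
        counts = [len(pattern.findall(doc)) for doc in lowered]
        df = sum(1 for c in counts if c > 0)
        # Plain idf = log(N / df); a keyword absent from the corpus gets 0.
        idf = math.log(n_docs / df) if df else 0.0
        stats[kw] = {
            "freq": sum(counts),
            "df": df,
            "tfidf": sum(c * idf for c in counts),
        }
    return stats


# Hypothetical example corpus and keyword list.
docs = [
    "Neural networks learn representations.",
    "Keyword extraction with TF-IDF.",
    "TF-IDF weights rare terms; neural networks are popular.",
]
keywords = ["neural networks", "tf-idf", "transformers"]

stats = keyword_stats(docs, keywords)
# Rank keywords by their corpus-level TF-IDF score.
for kw in sorted(stats, key=lambda k: stats[k]["tfidf"], reverse=True):
    print(kw, stats[kw])
```

With plain idf, a keyword that appears in every document scores 0. If that is undesirable, a smoothed variant such as `log((1 + N) / (1 + df)) + 1` (the default in scikit-learn's `TfidfVectorizer`) keeps such keywords above zero.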