How could I use the GPT-3 API to find the frequency of keywords (including their synonyms and variations) within my own dataset?
Some more details: I have a dataset of 24k documents, each about 200 words long. I’d like to extract the most frequent keywords in this dataset. How could I do that using the GPT-3 API?
Here is what I mean by keywords. This is an example I extracted from ChatGPT.
I’d like to extract these keywords and their relative importance (frequency, TF-IDF, or a similar importance metric), but computed only over the documents in my dataset. In the example above, the frequency is estimated from the whole dataset ChatGPT was trained on.
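To make the metric concrete, here is a minimal sketch of the kind of dataset-only statistics I have in mind, written in plain Python without any GPT call. The function names `tokenize` and `keyword_stats` are my own, the tokenizer is a crude lowercase word split, and the IDF is the plain log(N / df) variant; all of that is an assumption for illustration, not a fixed requirement.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a crude assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_stats(documents, keywords):
    """Return, for each keyword, its total count (tf), the number of documents
    containing it (df), and idf = log(N / df), all measured on `documents` only."""
    n_docs = len(documents)
    term_counts = Counter()  # total occurrences across the dataset
    doc_counts = Counter()   # number of documents containing the term
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        doc_counts.update(set(tokens))

    stats = {}
    for kw in keywords:
        df = doc_counts[kw]
        # A keyword that never occurs gets idf 0.0 here to avoid dividing by zero.
        idf = math.log(n_docs / df) if df else 0.0
        stats[kw] = {"tf": term_counts[kw], "df": df, "idf": idf}
    return stats
```

This only handles single-word keywords spelled exactly as they appear in the text, which is why synonyms and variations are the hard part.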
It works really well. GPT is able to extract the keywords when prompted with a document.
Now that I have the keywords, how would we estimate the relevance of these keywords within my dataset? The relevance of these keywords in a bigger dataset (such as GPT’s dataset) does not solve my problem.
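One approach I can imagine, sketched below under assumptions, is to collect the keyword list GPT returns for each document and then count, over my dataset only, in how many documents each keyword appears. The inputs `extracted` (one keyword list per document) and `variants` (a map from a synonym or variation to its canonical keyword) are hypothetical; how to build that map reliably is part of what I am asking.

```python
from collections import Counter


def rank_keywords(extracted, variants):
    """Rank canonical keywords by the share of documents they appear in.

    extracted: list of keyword lists, one list per document (e.g. from GPT).
    variants:  dict mapping a lowercase synonym/variation to its canonical keyword.
    """
    doc_counts = Counter()
    for keywords in extracted:
        # Normalize case and whitespace, then fold variants into one canonical form.
        # Using a set means each keyword counts at most once per document.
        canonical = set()
        for kw in keywords:
            key = kw.lower().strip()
            canonical.add(variants.get(key, key))
        doc_counts.update(canonical)

    n_docs = len(extracted)
    # Sort by descending document count, then alphabetically for stable output.
    ranked = sorted(doc_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(kw, count / n_docs) for kw, count in ranked]
```

With 24k documents this would give a document-frequency ranking that reflects my dataset rather than GPT's training data, provided the variant map is good enough.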