Can GPT-4 compare and list out contextual match data from 2 datasets?

I need a prompt that compares 2 sheets and lists the data in the 2nd sheet that is mentioned in the 1st sheet, either in the exact same words or with the same context/meaning. With the GPT-4 Advanced Data Analysis plugin, I was able to create a prompt that detects exact matches. But the AI couldn’t detect contextual matches.

The AI added: “I’ll proceed by using fuzzy string matching to identify those attributes where exact or substring matches were not found. This step will allow us to capture the attributes that are similar in context but not exactly the same as the attribute names. The additional fuzzy string matching step is complete, and the results have been added to the DataFrame in a new column.”

But the final file had only the data that was an exact match and no contextual matches. Does this mean GPT-4 cannot detect or list contextual matches?
Please help me create a prompt that can detect and list contextual matches between 2 datasets.

GPT models are not good at counting things. Therefore, you did the right thing by using the Advanced Data Analysis plugin.

I would use multiple GPT-3.5 agents with smaller tasks for that.

I know exactly what you need:

MAXQDA, which has a two-week free trial. It can do a ton of stuff, much of which takes a long time to learn, but the one thing it is extremely good at, and which is also super simple to learn and set up, is working with language: words, word combinations, etc.

You can run a search on a document to find the most frequently used words and then pick through those to create a custom dictionary.

You can literally dump hundreds of PDFs into it and run searches pretty fast. It also lets you create different ‘groups’ and ‘document sets’ for various methods.

Also generates nice looking word clouds as a visual output.

THEN you use those words as GPT-4 input for specific prompts, analysis, etc.

What is this plugin about? What algorithm is used by this plugin?

Instead of relying solely on the GPT-4 model, I suggest a more traditional NLP approach: clean and preprocess the text first, then transform it into embeddings and compare them with a similarity measure such as cosine similarity. A fuzzy string matcher such as FuzzyWuzzy can be used alongside for plain string comparison, but it only catches similar spellings, not similar meanings. Using the OpenAI embedding model (text-embedding-3-small) in this workflow has yielded promising results for a similar challenge.
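
To make the comparison step concrete, here is a minimal sketch of matching by cosine similarity. It assumes each entry from both sheets has already been turned into an embedding vector (for example by text-embedding-3-small); the 3-dimensional vectors, the `contextual_matches` helper, and the 0.8 threshold are made up for illustration and would need tuning on real data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def contextual_matches(sheet1, sheet2, threshold=0.8):
    """For each sheet2 entry, find the most similar sheet1 entry.

    sheet1 and sheet2 map text -> embedding vector. Only pairs whose
    similarity reaches the (assumed) threshold are returned.
    """
    matches = []
    for text2, vec2 in sheet2.items():
        best_text, best_score = None, -1.0
        for text1, vec1 in sheet1.items():
            score = cosine_similarity(vec1, vec2)
            if score > best_score:
                best_text, best_score = text1, score
        if best_text is not None and best_score >= threshold:
            matches.append((text2, best_text, best_score))
    return matches

# Toy vectors standing in for real embeddings (hypothetical values).
sheet1 = {
    "customer address": [0.9, 0.1, 0.0],
    "order total": [0.0, 0.2, 0.9],
}
sheet2 = {
    "client location": [0.8, 0.2, 0.1],
    "shipping weight": [0.1, 0.9, 0.1],
}

matches = contextual_matches(sheet1, sheet2)
for text2, text1, score in matches:
    print(f"{text2!r} matches {text1!r} (similarity {score:.2f})")
```

With real embeddings, "client location" and "customer address" share almost no characters, so a fuzzy string matcher would miss them, while their vectors can still end up close together. The threshold decides how loose a "contextual match" is allowed to be, so it is worth checking a sample of results by hand before trusting the output.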

