Can GPT-4 compare and list out contextual match data from 2 datasets?

I need a prompt that compares 2 sheets and lists the data in the 2nd sheet that is mentioned in the 1st sheet, either in the exact same words or with the same context/meaning. With the GPT-4 Advanced Data Analysis plugin, I was able to create a prompt that detects exact matches, but the AI couldn’t detect contextual matches.

The AI added: “I’ll proceed by using fuzzy string matching to identify those attributes where exact or substring matches were not found. This step will allow us to capture the attributes that are similar in context but not exactly the same as the attribute names. The additional fuzzy string matching step is complete, and the results have been added to the DataFrame in a new column.”

But the final file had only the data that were an exact match and no contextual matches. Does this mean GPT-4 cannot detect/list contextual match data?
Kindly help me to create a prompt that can detect and list contextual match data from 2 datasets.
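
A note on the quoted step: fuzzy string matching scores character overlap, not meaning, which may be why no contextual matches showed up in the file. Here is a minimal sketch of that behaviour. It uses Python's standard-library `difflib` as a stand-in, since the thread does not say which library the AI actually ran, and the sample words are made up for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(a: str, b: str) -> float:
    """Character-level similarity between 0 and 1 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Illustrative pairs (assumed, not taken from the sheets in question).
spelling_variant = fuzzy_ratio("colour", "color")  # nearly the same letters
synonym = fuzzy_ratio("price", "cost")             # same meaning, different letters

print(f"colour vs color: {spelling_variant:.2f}")
print(f"price vs cost:   {synonym:.2f}")
```

The spelling variant scores high and the synonym pair scores low, so a fuzzy-match threshold that accepts the first will usually reject the second.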

GPT models are not good at counting things. Therefore, you did the right thing by using the Advanced Data Analysis plugin.

I would use multiple GPT-3.5 agents with smaller tasks for that.

I know exactly what you need -

MAXQDA two-week free trial - It can do a ton of stuff, much of which takes a lot of time to learn, but the one thing it is extremely good at, and which is also super simple to learn and set up, is working with language: words, word combinations, etc.

You can run a search on a document to find the most frequently used words and then pick through those to create a custom dictionary.

You can literally dump hundreds of PDFs into it and run searches pretty fast. It also lets you create different ‘groups’ or ‘document sets’ for various methods.

Also generates nice looking word clouds as a visual output.

THEN - you use those words as GPT-4 input for specific prompts/analysis/etc.

I was able to establish a clear and obvious case of academic fraud involving ‘accredited’ publishing sites posting back dated papers whose images were generated using a CNN model even though the claimed publishing date was 2006…

I essentially ‘plucked out’ all of the words from the fraudulent papers which were relevant to the reason the fraud was being carried out in terms of the false ‘prior art’ that was being perpetrated and then compared them to the 3 granted ‘Light Stage’ patents and was able to obtain a list of very obvious ‘critical elements’ which are used in either one or both of the claimed 2006 papers but NEVER one single time in any of the three granted patents.

The fraud I speak of actually has some rather profound (as in massive…) implications in terms of the credibility of the web archive among many other well known and ‘trusted’ sources of information. It involves me being unlucky enough apparently to file a patent for the exact same thing as Netflix - except 12 days earlier - and them thinking it would be a good idea to try and steal my patent instead and me catching all of it and publicly calling them out.

I have been using chatGPT for only a couple weeks maybe and it has literally allowed me to clearly establish and present all of the fraud as well as the many other things I have been put through for nearly one year at this point due to the fact that they’ve invested at least a half billion by now but probably much more in total now that I have learned it also involves the US military and combat training simulations - which might explain why one of the fraudulent papers was being hosted on an army dot mil domain.

It’s a crazy story - and it literally all started after I signed up for chatGPT4 to have it help me assemble a business plan and then just kept asking it more and more questions. Check out the post ‘InfiniSet CEO triess chatGPT for first time’

After this crap is over I will be able to sell my story for a movie - just not to Netflix though. haha. It is completely surreal. I even had a ‘standoff’ with police and SWAT, spent 4 days in jail on felony charges, just barely avoided being thrown in a mental institution, while at the same time trying to start my company, being granted a patent, designing, engineering, fabricating, and programming a working prototype, etc.

They never expected me to even notice, much less collect, analyze, prove, and make public for all to see, essentially calling out the entire system.

My patent is US11577177B2 - motorized rotatable treadmill and system which is precisely controlled by servo motors using data from Unreal Engine consisting of the transformation matrix data for character and camera - instead of orbiting the camera around the user you instead simply rotate the user. They rotate inversely to one another - treadmill belt speed and distance as well as rotation is all precision calibrated. It is a 3d virtual positioning system…you guys are smart…you get it

MattGuertin_Substack_com
MattGuertin_com

What is this plugin about? What algorithm is used by this plugin?

Instead of relying solely on the GPT-4 model, I suggest adopting a more traditional NLP approach: clean and preprocess the text, transform it into embeddings, and then apply a similarity measure such as cosine similarity on the embeddings (or FuzzyWuzzy for plain string comparison). Using an OpenAI embedding model (text-embedding-3-small) in this workflow has yielded promising outcomes for a similar challenge.
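
The workflow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a tested production pipeline: `preprocess`, `contextual_matches`, the 0.8 threshold, and the sheet values are all hypothetical names and numbers chosen for the example. To keep it runnable without an API key, the demo uses a hand-made `toy_embed` lookup with invented vectors; in practice you would swap in a real embedding call.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def preprocess(text: str) -> str:
    """Lowercase, turn runs of non-alphanumerics into one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contextual_matches(
    sheet1: List[str],
    sheet2: List[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.8,  # assumed cut-off; tune on your own data
) -> List[Tuple[str, str, float]]:
    """For each sheet2 item, keep its best sheet1 match if it clears the threshold."""
    if not sheet1:
        return []
    vectors1 = [embed(preprocess(s)) for s in sheet1]
    results = []
    for item in sheet2:
        vec = embed(preprocess(item))
        scores = [cosine_similarity(vec, v) for v in vectors1]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            results.append((item, sheet1[best], scores[best]))
    return results


# Stand-in embedding with invented vectors, for illustration only.
# With the openai package (v1+), a real embed function could instead return
# client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding
TOY_VECTORS = {
    "unit price": [1.0, 0.0, 0.0],
    "cost per item": [0.9, 0.1, 0.0],
    "ship date": [0.0, 1.0, 0.0],
    "warranty": [0.0, 0.0, 1.0],
}


def toy_embed(text: str) -> Vector:
    return TOY_VECTORS[text]


matches = contextual_matches(
    ["Unit Price", "Ship Date"],   # sheet 1 (made-up values)
    ["Cost per item", "Warranty"],  # sheet 2 (made-up values)
    toy_embed,
)
for item, matched, score in matches:
    print(f"{item!r} matches {matched!r} (cosine {score:.2f})")
```

With real embeddings you would embed each sheet once, cache the vectors, and pick the threshold by inspecting a sample of borderline pairs.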
