Prompt Engineering Help for Fuzzy Matching Reasoning

Hey all, I’m struggling to get this prompt right; any tips would be greatly appreciated:

My ultimate objective: for the LLM to reason over whether a close match vs. no match makes sense. It does a great job in most cases, but the current problem is that it never chooses N/A. I’m using GPT-3.5-Turbo; GPT-4 gets this right.
```
PROMPT:
You are an excellent matching expert. You can look at data and find the closest match to that data from a list. Your goal is to find the closest match to the data from the list of potential matches, if there is no close match write N/A or provide the correct closest match with no explanation. Take a deep breath, and you can do this.

For example:


## Data

The Group 5710 Meyerfield Court 2023-07-24

## Potential Matches

['The Group 9431 Turnberry Drive, Potomac, MD 20854 2023-04-17',

'The Group 9213 Potomac School Drive, Potomac, MD 20854 2023-07-25',

'Margie Halem Group 2807 Balliett Court, Vienna, VA 22180 2023-07-11',

'The Group 277 Gundry Drive, Falls Church, VA 22046 2023-07-10']

## Response

N/A

## Data

Andrew Addy 124 Bucktown Crossing Road Apt 31C, Pottstown, PA 19465 2023-04-07

## Potential Matches

['Andrew Addy 124 Bucktown Xing Rd, Pottstown, PA 19465 2023-04-07','Andrew Addy 104 Foster Ave, Upper Darby, PA 19082 2023-03-29','Andrew Addy 312 Long Ridge Ln, Exton, PA 19341 2023-03-02','Andrew Addy 3801 Davis Court, Chester Springs, PA 19425 2023-08-07','Andrew Addy 1206 Worthington Dr, Exton, PA 19341 2023-06-01']

## Response

Andrew Addy 124 Bucktown Xing Rd, Pottstown, PA 19465 2023-04-07
```


Try telling the AI to rank order things it needs to match so it never has to say “no match”. Or make it give a “match” score ranging from 0 to 100.

LLMs probably aren’t good at saying “no match” because it’s inherently a pattern matching system from the ground up. But if you tell it to order things or generate matching scores that’s an offer it cannot refuse. :slight_smile:
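A minimal sketch of that idea (function name and threshold are made up, not from this thread): ask the model to score every candidate from 0 to 100, then decide "N/A" yourself in code instead of asking the model to refuse.

```python
def pick_match(scored, threshold=70):
    """Return the best-scoring candidate, or "N/A" if nothing clears the bar.

    scored: list of (candidate_text, score) pairs parsed from the model's
    reply. The threshold is arbitrary; tune it on your own data.
    """
    best_text, best_score = max(scored, key=lambda pair: pair[1])
    return best_text if best_score >= threshold else "N/A"

# Made-up scores a model might return for the "Meyerfield Court" example:
scores = [
    ("The Group 9213 Potomac School Drive, Potomac, MD 20854 2023-07-25", 55),
    ("The Group 277 Gundry Drive, Falls Church, VA 22046 2023-07-10", 40),
]
print(pick_match(scores))  # prints "N/A": no candidate clears 70
```

The point is that the refusal decision moves out of the pattern-matcher and into deterministic code, where you control the cutoff.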

Thanks @wclayf. I’ll give that a shot. Would I be better off fine-tuning a model?

You don’t need to fine-tune for this; I would just describe in what way they are supposed to match. Also, you’re telling it to be an excellent matching expert, which, if it abides and is a true expert, would almost never find a “no match” case, because it’s supposed to be an expert matcher ;).
Also, this sounds a lot like fuzzy set theory in linguistics. I feel like there was another name for it I learned, but I can’t think of it right now.
You can also see relevant similarities between model outputs using cosine similarity functions.
Once you understand those, it’s easier to guide the model to perform those tests. I would start there and see if that achieves what you want.
Article here:
https://www.researchgate.net/publication/234784106_A_fuzzy_sets_based_linguistic_approach_Theory_and_applications
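For reference, cosine similarity between two embedding vectors is just the normalized dot product; a plain-Python sketch, independent of any particular embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same
    direction, 0.0 means orthogonal (unrelated), -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Note that it measures direction only, so a vector and a scaled copy of it score 1.0.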

Remember, there are different ways to “match” data and text strings: some linguistic, some computational, syllabic, word count, token count/token complexity, etc. There’s no single obvious way for the model to know how it’s supposed to match the data. You have to describe that yourself in the prompt.

Hope this helps!

@Macha thank you so much for the additional perspective.

1 Like

Just FYI, I agree with @Macha 100%, and especially what he said about using a VectorDb with cosine similarity.

1 Like

@wclayf @Macha I’m using cosine similarity to get the top 5 potential matches, and then I want the LLM to reason about which is the best match.
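That two-stage pipeline can be sketched roughly like this (the embedding step is assumed to happen elsewhere; only the shortlist-and-prompt glue is shown, and the function names are hypothetical):

```python
def shortlist(query_vec, candidates, similarity, k=5):
    """candidates: (text, vector) pairs already embedded elsewhere.
    similarity: any scoring function, e.g. cosine similarity on embeddings."""
    ranked = sorted(candidates, key=lambda c: similarity(query_vec, c[1]),
                    reverse=True)
    return [text for text, _vec in ranked[:k]]

def build_prompt(data, matches):
    """Assemble the ## Data / ## Potential Matches body for the LLM step."""
    listed = ",\n".join(f"'{m}'" for m in matches)
    return (f"## Data\n\n{data}\n\n"
            f"## Potential Matches\n\n[{listed}]\n\n"
            f"## Response\n")
```

The embedding retrieval narrows the field cheaply, and the LLM only has to reason over five candidates instead of the whole corpus.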

1 Like

I think that will be an excellent use of AI and embeddings, it adds an extra layer but I think you will get good results.

1 Like

I’m glad I could help! I appreciate the feedback.

After this conversation, I went back and now I actually notice there are at least 3 ways “closest match” can be done: Temporally (when), Spatially (where), and Semantically (word meanings).

Are you telling the AI which of these three matching criteria determine the matching score?

1 Like

I’m not. Any ideas how to work that feedback into my prompt?

1 Like

Welcome to the complexity of language and why I love linguistics!
The answer to that question depends precisely on what you’re trying to look for and why you’re comparing the data.
What you’re trying to do is something that looks very easy on the surface, but is much, much more complex once you start taking a more intricate look at the problem. I just finished my BA in Applied Linguistics (before ChatGPT got widely released, of course), but this is why I wanted to study linguistics and how language works. It’s not easy.
This is also why cosine similarity can still be difficult to interpret and is a function typically performed by NLP researchers.
I can make my best educated guess that you are looking for semantic similarity. That’s typically what most people are looking for. Thankfully, semantic similarity search is actually a thing!
You can also pick one of the categories and express that it must strictly match based on that category (like matching them semantically), and ask it to explain its reasoning. By asking for its reasoning, you can see how it decided to match them based on your selected criteria, and adjust as necessary. Or keep asking on the forum!
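One way to work that in (purely illustrative wording, not a tested prompt) is to name the matching criterion explicitly and demand the reasoning before the answer:

```python
# Hypothetical instruction block naming the criterion (semantic) and the
# tolerance rules, and requiring reasoning before the verdict.
MATCH_INSTRUCTION = (
    "Match the data to the candidates SEMANTICALLY: treat abbreviations "
    "('Xing' vs 'Crossing', 'Rd' vs 'Road') as equivalent, and require the "
    "name, street, city, state, and zip to refer to the same real-world "
    "entity. Ignore date differences of a few days.\n"
    "First explain your reasoning step by step, then output either the "
    "single best candidate verbatim, or N/A if none refers to the same "
    "entity."
)
```

Making the "same real-world entity" test explicit gives the model a concrete reason to output N/A instead of defaulting to the nearest string.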

I’m quite literally in the process of providing and posting some of my own prompt techniques on this forum as well. So, if you’re still struggling, I’m hoping to start posting new prompts and guides for people to try out that -may- help you out.

The perfect prompt engineer will soon be the AI itself, but before that happens the best person for the task would be a combination of an English Language expert and an Interrogator. Language understanding with Neurolinguistic Programming elements.

1 Like

Totally agree with you :+1:

Test using this model: gpt-3.5-turbo-16k-0613

Topics: Potential Matches for Data

Message From ChatGPT:

You are an excellent matching expert. You can look at data and find the closest match to that data from a list. Your goal is to find the closest match to the data from the list of potential matches, if there is no close match write N/A or provide the correct closest match with no explanation. Take a deep breath, and you can do this.

Message From ChatGPT:

You are an assistant that Potential Matches for Data

Message From You:

## Data

The Group 5710 Meyerfield Court 2023-07-24

## Potential Matches

['The Group 9431 Turnberry Drive, Potomac, MD 20854 2023-04-17',

'The Group 9213 Potomac School Drive, Potomac, MD 20854 2023-07-25',

'Margie Halem Group 2807 Balliett Court, Vienna, VA 22180 2023-07-11',

'The Group 277 Gundry Drive, Falls Church, VA 22046 2023-07-10']

## Response

N/A

## Data

Andrew Addy 124 Bucktown Crossing Road Apt 31C, Pottstown, PA 19465 2023-04-07

## Potential Matches

['Andrew Addy 124 Bucktown Xing Rd, Pottstown, PA 19465 2023-04-07','Andrew Addy 104 Foster Ave, Upper Darby, PA 19082 2023-03-29','Andrew Addy 312 Long Ridge Ln, Exton, PA 19341 2023-03-02','Andrew Addy 3801 Davis Court, Chester Springs, PA 19425 2023-08-07','Andrew Addy 1206 Worthington Dr, Exton, PA 19341 2023-06-01']

## Response

Andrew Addy 124 Bucktown Xing Rd, Pottstown, PA 19465 2023-04-07

Message From ChatGPT:

The closest match for the first data is:

‘The Group 9213 Potomac School Drive, Potomac, MD 20854 2023-07-25’

The closest match for the second data is:

‘Andrew Addy 124 Bucktown Xing Rd, Pottstown, PA 19465 2023-04-07’

@Foxabilo I’d argue even further that it’s a good combo of a language expert and an investigator. I feel like that explains my natural knack for prompting better, but interrogator also works. Maybe inquisitor?

1 Like

Inquisitor is the perfect blend, I think, plus it has a 40K ring to it, which is a bonus.

2 Likes

Totally agree. Stephen Wolfram has commented in a number of recent-ish interviews and lectures that expository writing is the key skill to employ. That jibes completely with my own experience. If I get lazy, my interactions become less useful. Maintaining a high level of intent and consistency with prompt flows goes a long way.

Unpacking interrogation a bit here – I approach this using reflexion, and then “adversarial hypotheticals”. E.g., “Thank you, this legal document looks good. Now, I’d like you to take the hypothetical position as the counter-party’s attorney…”

1 Like

This is one of the reasons I agree with many at OpenAI that prompt engineering will be a short term occupation, anything that requires constant effort to maintain a high level of competency in will tend to become quite a niche job.

The AIs are already very good at prompting themselves, and we are working with the absolute worst AI will ever be. I think it will be less than 12-18 months before prompting is mostly done by the models themselves with very little effort required; the AIs will understand the context and what would be expected for a given situation and will do so automatically unless given corrective instruction, much the same as you let a competent colleague just get on with a job when you know they grasp the task at hand.

2 Likes

It seems to be almost necessary that the AIs prompt themselves, especially given the trend in mixture models and then the ever increasing need for multi-model orchestration dressed up as internal dialogue.

Once they set these models on a permanent run loop that has them either focusing on their own input/output internal dialogue, sensory input, or user input, the need for the models to self-prompt will grow.

1 Like