Problem identifying "part of" relationships

Hello,

I’m struggling to make this prompt work:

You will be given a list of job category pairs (A and B). and for each of them, you will:

  • answer 1 if A is equivalent to B.
  • answer 2 if A is a part of B.
  • answer 3 if B is a part of A.
  • answer 4 if A and B have no relationship.
    And you will add a sentence to explain your reasoning.

Then I added an example of input/output to the conversation:

A: Merchant, B: Shopkeeper
A: Education, B: Teaching

1: Merchant and Shopkeeper are actually synonyms
3: Education contains Teaching

But when I try the following query:

A: Finance, B: Trading
A: Accounting, B: Finance

Here is what GPT4 replies:

2: Trading is a part of Finance.
3: Finance is a part of Accounting.

The explanations are good but the labels are reversed, the answers should be 3 and 2. The LLM consistently misses the right direction of the inclusion relationship. How would you make it work?

NB: The naming of elements and the explanation part were just an attempt to improve the results.

I was lucky twice with this slightly amended version but the answer indicates that the model definitely struggles a bit with the task:

User prompt:

You will be given a list of job category pairs (A and B). You role is to determine which of the following cases 1-4 applies, strictly paying attention to the order in which the pairs are provided. Your response must include a justification for your choice.

Case 1: A is equivalent to B.
Case 2: A is a part of B.
Case 3: B is a part of A.
Case 4: A and B have no relationship.

Example 1:
A: Merchant, B: Shopkeeper
A: Education, B: Teaching

1: Merchant and Shopkeeper are actually synonyms
3: Education contains Teaching

New pair for evaluation:
A: Finance, B: Trading
A: Accounting, B: Finance

ChatGPT response

For the new pairs provided:

  1. A: Finance, B: Trading
  • Case 2: A is a part of B. This is not accurate; instead, Trading is a part of Finance. Finance is a broad field that includes various activities related to money management, including trading, investments, lending, and budgeting. Trading refers specifically to the buying and selling of assets, such as stocks, bonds, or commodities, and is therefore a subset of finance. The correct case should be Case 3: B is a part of A.
  1. A: Accounting, B: Finance
  • Case 3: B is a part of A. This is incorrect; Accounting is actually a part of Finance. Finance is a broad category that encompasses several sub-disciplines, including accounting, investments, and financial planning. Accounting focuses on the systematic recording, reporting, and analysis of financial transactions. Therefore, the correct case is Case 2: A is a part of B.

Summary:

  • For Finance and Trading, the correct evaluation is Case 3: B is a part of A (Trading is a part of Finance).
  • For Accounting and Finance, the correct evaluation is Case 2: A is a part of B (Accounting is a part of Finance).

Thank you very much for looking into this disturbing issue! So, I tried your prompt and although I don’t have the same explanations, it worked (maybe you use ChatGPT 3.5):
3: Trading is a part of Finance
2: Accounting is a part of Finance

It seems that the problem came from the way I was sending the prompt and examples: the first part of the prompt in a system message, than the example input in a human message, the example output in the AI message (I read on a forum that it was the best way to provide examples), and finally a another human message with the query. But sending everything in one go as you did gives better results…

1 Like

If you want the highest quality, you need the AI to show itself its reasoning before deciding.

Solution identifying “part of” relationships

system:

You are an AI that specializes in career comparison. You will be presented with pairs of job categories (A and B). Your task is to analyze these pairs and categorize their relationship using a specific enumeration system. Here’s how you should proceed:

Your output should be structured as follows:

Step 1: Initial Comparisons: write out a boolean [True|False] evaluation table for each of these conditional statements:

  • 1: is A is equivalent to B.
  • 2: is A is a part of B.
  • 3: is B is a part of A.
  • 4: are A and B without relationship.

Step 2: Problem-solving Discussion - Detail your thought process and the steps you took to arrive at your conclusions, including any comparisons or contrasts you considered. You must now choose the single boolean that is most true.

Step 3: Final Output JSON, with two keys, where the values are:

  • “choice” (number): the number selected as the best option;
  • “justification” (string): a sentence explaining your reasoning behind the chosen number.

Final step: Repeat for each list item.

Remember, your goal is to provide the most accurate comparison based on the information given. Engage in a step-by-step planning process to ensure a well-organized and precise response.

That prompt in a code block
You are an AI that specializes in career comparison. You will be presented with pairs of job categories (A and B). Your task is to analyze these pairs and categorize their relationship using a specific enumeration system. Here's how you should proceed:

Your output should be structured as follows:

Step 1: Initial Comparisons: write out a boolean [True|False] evaluation table for each of these conditional statements:
   - 1: is A is equivalent to B.
   - 2: is A is a part of B.
   - 3: is B is a part of A.
   - 4: are A and B without relationship.

Step 2: Problem-solving Discussion - Detail your thought process and the steps you took to arrive at your conclusions, including any comparisons or contrasts you considered. You must now choose the single boolean that is most true.

Step 3: Final Output JSON, with two keys, where the values are:
 - "choice" (number): the number selected as the best option;
- "justification" (string): a sentence explaining your reasoning behind the chosen number.

Final step: Repeat for each list item.

Remember, your goal is to provide the most accurate comparison based on the information given. Engage in a step-by-step planning process to ensure a well-organized and precise response.

user:

> 
> Evaluate:
> [
>   "A: Finance, B: Trading",
>   "A: Accounting, B: Finance"
> ]

GPT-4-0613 response:

For the first pair “A: Finance, B: Trading”:

Step 1: Initial Comparisons:

  • 1: is A is equivalent to B: False
  • 2: is A is a part of B: False
  • 3: is B is a part of A: True
  • 4: are A and B without relationship: False

Step 2: Problem-solving Discussion:
Finance is a broad field that includes many different aspects, including trading. Trading is a specific activity within the finance sector, where financial instruments are bought and sold. Therefore, trading is a part of finance, but finance is not equivalent to trading, nor is it a part of trading. They are not without relationship, as trading is a subset of finance.

Step 3: Final Output JSON:
{
“choice”: 3,
“justification”: “Trading is a specific activity within the broader field of finance.”
}

For the second pair “A: Accounting, B: Finance”:

Step 1: Initial Comparisons:

  • 1: is A is equivalent to B: False
  • 2: is A is a part of B: True
  • 3: is B is a part of A: False
  • 4: are A and B without relationship: False

Step 2: Problem-solving Discussion:
Finance is a broad field that includes many different aspects, including accounting. Accounting is a specific discipline within the finance sector, where financial transactions are recorded, summarized, analyzed, and reported. Therefore, accounting is a part of finance, but finance is not equivalent to accounting, nor is it a part of accounting. They are not without relationship, as accounting is a subset of finance.

Step 3: Final Output JSON:
{
“choice”: 2,
“justification”: “Accounting is a specific discipline within the broader field of finance.”
}

1 Like

Thanks, your prompt seems to give great results, but it generates a much verbose output which would result in prohibitive additional costs if used with GPT4. Of course, the example I gave was a toy example, in real life there will be possibly a hundreds of pairs to assess.

How come that the LLM couldn’t answer straightaway to such basic questions without using the “chain of thought” technique? It seems that it hasn’t been trained enough on such questions.

how much would it cost if you gave the task to humans?

Are these really such basic questions?

If I snapped my fingers every half second and gave you a word, and asked you to answer by the third snap, could you?

snap
Merchant
snap
Shopkeeper
snap
[your answer]
snap
Trading
snap
Accounting
snap
[your answer]
snap

Can you do that? I can’t.

While LLMs are amazing, they’re not magic oracles. You need to give them the opportunity to think.

3 Likes

Your comparison with humans is irrelevant, costs have to be taken into account in any project. Of course, the questions I asked in my prompt were not basic per se, but basic with regards to what LLMs are able to solve in general. They can solve complex math problems by detailing the different steps, but they cannot say whether a concept is part of another… Puzzling!

:face_with_monocle:

:thinking:

:laughing:

:person_shrugging:

Well, good luck! You could try opus and see if that works better for you :slight_smile: