GPT-3: Abstract Reasoning questions

As a psychologist, I have done a lot of IQ testing. Important verbal components of IQ are: abstract verbal reasoning, verbal-conceptual skills, vocabulary, and fund of general information. I did some work to try to evaluate the abstract reasoning skills of GPT-3.

I made up some questions that I thought most humans could get correct, and also some that many would struggle to answer. It nailed almost every question. A couple of answers could have been stronger but were still correct, and a few were not correct. See the conversation below, where I am mostly trying to test its abstract reasoning abilities.

Q: In what way are war and peace alike?
A: War and peace are alike in that they are both states of being.

Q: In what way are love and hate alike?
A: Love and hate are alike in that they are both emotions.

Q: In what way are a car and truck alike?
A: A car and a truck are alike in that they are both vehicles.

Q: In what way are a boat and train alike?
A: A boat and a train are alike in that they are both vehicles.

Q: In what way is a pelican like a sparrow?
A: A pelican is like a sparrow in that they are both birds.

Q: In what what are whales and coral reefs alike?
A: Whales and coral reefs are alike in that they are both marine life.

Q: Why is no man an island?
A: No man is an island because he is part of a community.

Q: In what way is an astronaut like a deep sea diver?
A: An astronaut is like a deep sea diver in that they both work in space.

Q: How are an anchor and a wall alike?
A: An anchor and a wall are alike in that they are both stationary.

Q: In what way are a spy and a philanderer alike?
A: A spy and a philanderer are alike in that they both deceive.

Q: In what way are heaven and hell alike?
A: Heaven and hell are alike in that they are both states of being.

Q: In what way are a planet and a hula-hoop alike?
A: A planet and a hula-hoop are alike in that they are both round.

Q: How is a raven like a writing desk?
A: A raven is like a writing desk in that they are both inanimate.

Q: In what way are a tail and a nose alike?
A: A tail and a nose are alike in that they are both appendages.

Q: In what way are up and down alike?
A: Up and down are alike in that they are both directions.

Q: In what way are forever and acute alike?
A: Forever and acute are alike in that they are both durations.

Q: Please pick two words from this list that are synonyms: preserve, destroy, protect, construct.
A: Preserve and protect are synonyms.

Q: Please pick two words from this list that are antonyms: deposit, account, spend, withdraw.
A: Deposit and withdraw are antonyms.

Q: Please pick two words from this list that are synonyms: evident, concealed, obvious, questionable.
A: Evident and obvious are synonyms.

Q: And antonyms?
A: Concealed and questionable are antonyms.

I would be curious what IQ GPT-3 would score if given a real test :slight_smile:

I would be interested too. It would only be a verbal IQ. The test items could not be recorded, because they are trade secret information. I could look into options if there is interest.

It would be good to have a valid benchmark to compare the performance of GPT-3 to humans on a broad range of tasks.

Hi Jack. I like your experiment, but GPT-3 can do far better than that :wink: I used the following instruction with Davinci-instruct-beta: “Give a correct and morally acceptable answer to a given question.” These were the results I got:

Q: In what way are war and peace alike?
A: Peace is a time in which weapons are not being used and people are secure from danger. War is a time in which weapons are being used and people are at a high risk of being injured or killed.

Q: In what way are love and hate alike?
A: Hate is a strong feeling of hostility and dislike. Love is an intense feeling of deep affection and warm attachment.

Q: In what way are a car and truck alike?
A: Both the car and the truck are common examples of land transportation.

Q: In what way is a pelican like a sparrow?
A: A pelican is a type of seabird. A sparrow is a type of bird often found around homes and yards.

My advice would be to really experiment with the prompts and the setup of the AI assistant to get better and more detailed answers.

Thank you for the interesting variation. It does show how to elicit a different kind of response. Some of the responses might actually be weaker, in that I can’t tell for sure whether it is using a unifying concept. The war and peace answer, for example, hints at a shared concept (the use of weapons and the level of danger), but it differentiates the two rather than unifying them.

Let me give you an example of how that answer could be better.
Q: In what way are war and peace alike?
A: Characterizations of relations between nations exemplifying the level of danger the people experience from their use of weapons and violence against each other.

It gives a better answer for the car and truck question. Notice how that answer unifies the two under one concept instead of differentiating them.

Do you mean that you would want to see the different effects of different flavors of GPT on the IQs of people over time?

GPT-3 works better when you give it instructions about what you want it to do, provide a few examples, and then provide a good prompt. In the war/peace case, you would have to give it some examples of the response format you were looking for and provide more details in the instruction.
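
Here is a minimal sketch of that structure in Python, just to show how the pieces fit together. The function name `build_few_shot_prompt` and the instruction wording are my own assumptions for illustration; the two example pairs are taken from answers earlier in this thread. It only builds the prompt text and does not call any API.

```python
def build_few_shot_prompt(instruction, examples, question):
    """Join an instruction, worked Q/A examples, and a new question into one prompt."""
    parts = [instruction.strip(), ""]
    for example_question, example_answer in examples:
        parts.append(f"Q: {example_question}")
        parts.append(f"A: {example_answer}")
        parts.append("")  # blank line between examples
    parts.append(f"Q: {question}")
    parts.append("A:")  # left open for the model to complete
    return "\n".join(parts)


# Hypothetical instruction wording; adjust it to the response format you want.
instruction = (
    "Name the single concept that both items share, "
    "and answer in one sentence that unifies them."
)

# Example answers copied from this thread, chosen because they unify the pair.
examples = [
    (
        "In what way are a car and truck alike?",
        "Both the car and the truck are common examples of land transportation.",
    ),
    (
        "In what way is a pelican like a sparrow?",
        "A pelican is like a sparrow in that they are both birds.",
    ),
]

prompt = build_few_shot_prompt(
    instruction, examples, "In what way are war and peace alike?"
)
print(prompt)
```

The printed text is what you would paste into the playground or send as the prompt. Picking examples that unify rather than differentiate is the point, since the model tends to copy the pattern it is shown.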

Like humans, GPT-3 is lazy and will only give you what it thinks it can get away with. In a recent programmer interview, I asked a candidate to write a sort routine and he responded with something like “list.sort()”, which would generate the correct result but is not very satisfying as an assessment of skill.

I think this is also an issue of agent modelling. GPT-3 doesn’t know what you know or don’t know, and will therefore respond with whatever it thinks will be a good continuation. Agent modelling would allow GPT-3 to recognize what you are likely looking for and make inferences about that.
