Measuring the similarity of test questions that are equivalent but presented with different scenarios or examples

erkanererkaner · September 1, 2023, 8:35pm

I want to see how similar a bunch of test questions (written in Turkish) are in terms of what they’re really testing, regardless of the situations or examples they use. It might sound a bit like checking how similar the sentences are, but it’s actually quite different. For example, these next questions should be seen as very similar or the same, even though the words in the sentences aren’t very much alike:

Question 1: You are given two balls, one made of rubber and one made of steel. You drop them simultaneously from the same height. Which one will hit the ground first, and why?

Question 2: In an experiment, you have two identical balls, one made of plastic and one made of glass. You release both at the same time from the identical height. Which ball will reach the ground first, and what physical principle explains the outcome?

So, I want to measure the similarity in terms of the similarity of the very specific skill measured by the questions. Is this possible?

PaulBellow · September 1, 2023, 9:35pm

Welcome to the forum.

Likely with a one-shot or two-shot… maybe even a zero-shot…

What have you tried so far?

anon22939549 · September 1, 2023, 9:55pm

This is called “factor analysis” or “item analysis.” And it’s related to Item Response Theory.

This is not something I would trust to an LLM if it is important to get correct results as it requires some moderately sophisticated statistical methods to perform.

Here are some resources;

And here’s some instructions on how you might perform a factor analysis:

Python

R

That should be enough get you started.

It is possible the LLM in conjunction with access to a computing environment will be able to perform this task exceptionally well, but I don’t think any degree of prompting the API will help the model get there on its own.

erkanererkaner · September 2, 2023, 10:41am

I have tried sentence-transformers and I am also investigating the github of OpenAI → openai/openai-cookbook/blob/main/text_comparison_examples.md

I thought of summarizing/refining the questions to be able to keep the essence of the question while removing the parts forming the specific example. I guess I may benefit from fine-tuning for this case.

Topic		Replies	Views
Fine tuning using a corpus API api	8	2093	July 13, 2023
GPT powered learning solution API api	21	2311	December 19, 2023
Compare 2 long texts using GPT API	6	5004	August 15, 2023
How to Find similarity between 2 sets of conversation Community api	6	5490	June 8, 2023
Use embeddings to measures how well an answer fits the question API embeddings	5	373	June 29, 2024

Measuring the similarity of test questions that are equivalent but presented with different scenarios or examples

Python

R

Related topics