I want to measure how similar a set of test questions (written in Turkish) are in terms of the underlying skill they assess, regardless of the surface scenario or example each one uses. This may sound like plain sentence similarity, but it is quite different. For example, the following two questions should score as very similar, or even identical, even though their wording has little overlap:
Question 1: You are given two balls, one made of rubber and one made of steel. You drop them simultaneously from the same height. Which one will hit the ground first, and why?
Question 2: In an experiment, you have two identical balls, one made of plastic and one made of glass. You release both at the same time from the identical height. Which ball will reach the ground first, and what physical principle explains the outcome?
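To make the gap concrete, here is a quick stdlib-only check of pure lexical overlap between the two questions above (Jaccard similarity over word sets, a crude hypothetical baseline, not my actual pipeline). The score comes out low even though both questions test exactly the same free-fall concept:

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens of a question."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Set-overlap similarity: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


q1 = ("You are given two balls, one made of rubber and one made of steel. "
      "You drop them simultaneously from the same height. "
      "Which one will hit the ground first, and why?")
q2 = ("In an experiment, you have two identical balls, one made of plastic "
      "and one made of glass. You release both at the same time from the "
      "identical height. Which ball will reach the ground first, and what "
      "physical principle explains the outcome?")

surface_sim = jaccard(tokens(q1), tokens(q2))
print(f"lexical overlap (Jaccard): {surface_sim:.2f}")
```

The overlap is well under 0.5, yet by the skill-based notion of similarity these two questions should score near 1.0.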
So I want to measure similarity in terms of the very specific skill each question measures. Is this possible?
An LLM, in conjunction with access to a computing environment, may be able to perform this task exceptionally well, but I don't think any amount of prompting the API alone will get the model there.
I have tried sentence-transformers, and I am also looking at OpenAI's GitHub cookbook: openai/openai-cookbook/blob/main/text_comparison_examples.md
I have also thought of summarizing/refining the questions, to keep the essence of each question while removing the parts that form the specific example. I suspect fine-tuning could help for this case.
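As a toy illustration of that idea (entirely hypothetical: the hand-made material lexicon and the token replacement stand in for what an LLM summarization/refinement step would actually do), abstracting away the example-specific nouns already pushes the surface similarity up:

```python
import re

# Hypothetical stand-in for an LLM abstraction step: collapse the
# example-specific material nouns into one generic placeholder so that
# only the skill-bearing structure of each question remains.
MATERIALS = {"rubber", "steel", "plastic", "glass"}  # illustration only


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def abstracted(text: str) -> set[str]:
    return {("MATERIAL" if t in MATERIALS else t) for t in tokens(text)}


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b)


q1 = ("You are given two balls, one made of rubber and one made of steel. "
      "You drop them simultaneously from the same height. "
      "Which one will hit the ground first, and why?")
q2 = ("In an experiment, you have two identical balls, one made of plastic "
      "and one made of glass. You release both at the same time from the "
      "identical height. Which ball will reach the ground first, and what "
      "physical principle explains the outcome?")

raw_sim = jaccard(tokens(q1), tokens(q2))
abs_sim = jaccard(abstracted(q1), abstracted(q2))
print(f"raw: {raw_sim:.2f}  abstracted: {abs_sim:.2f}")
```

A real pipeline would replace the crude lexicon with an LLM rewriting step ("state what skill this question tests, omitting the concrete example"), then embed and compare the rewritten versions; fine-tuning an embedding model on pairs labeled by shared skill is the other direction I am considering.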