Generating multiple-choice question (MCQ) tests

Dear all, I’m trying to generate some MCQs from a list of subjects.

The test should have 1 QUESTION and 4 ANSWERS, where just one is correct.

My prompt generates a good QUESTION and CORRECT_ANSWER, but the problem is that the INCORRECT_ANSWERS are obviously wrong, mostly because they are shorter or much simpler than the CORRECT_ANSWER.

I have already tried many different approaches in the prompt, asking it to generate answers of similar length, format, etc., but with no success.
Has anybody had the same issue, or does anyone have an idea to work around it?

All the best!


Hi and welcome to the Forum!

Could you perhaps share with us the prompt you are using so we can determine what might be causing the issue?

Sure, the prompt is:

We are creating a multiple-choice question “MCQ” test in order to train new pilots to pass the EASA theory exam. The final goal of this task is to have 4 answers for the QUESTION I’ll provide, where 1 answer is correct and 3 answers are false.
Given the syllabus entry organized by DISCIPLINE and AREA.
Syllabus below between ‘’’
DISCIPLINE: <discipline_variable>
AREA: <area_variable>‘’’
Think step-by-step:

  1. Use the DISCIPLINE and AREA to frame the educational context;
  2. The QUESTION is <question_variable>;
  3. Create one incomplete but CORRECT_ANSWER for the QUESTION;
  4. Output the CORRECT_ANSWER followed by a “|”;
  5. Using the CORRECT_ANSWER as a model, modify it by adding a common misconception or misunderstanding to create 3 plausible INCORRECT_ANSWERS; ensure that each INCORRECT_ANSWER is similar in format to the CORRECT_ANSWER;
  6. Output the 3 INCORRECT_ANSWERS separated by a “|”.
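As a side note, the “|”-separated output from steps 4 and 6 is straightforward to split back into the CORRECT_ANSWER and the three INCORRECT_ANSWERS. A minimal sketch (the completion string below is invented for illustration):

```python
def split_answers(completion: str):
    """Split the model's '|'-separated completion into the
    CORRECT_ANSWER and the list of INCORRECT_ANSWERS."""
    parts = [p.strip() for p in completion.split("|") if p.strip()]
    correct, incorrect = parts[0], parts[1:]
    return correct, incorrect

# Example with a made-up completion string:
correct, incorrect = split_answers(
    "A stall occurs when the critical angle of attack is exceeded | "
    "A stall occurs when airspeed drops below V1 | "
    "A stall occurs only in a steep climb | "
    "A stall occurs when the engine loses power"
)
```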

Intuitively, I would try two things:

Include an additional instruction that speaks directly to the fact that the three incorrect answers must strictly follow the same style as the correct answer in terms of length, format, language etc. I understand you already did this but I would reinforce the wording in that regard. Sometimes this can make a difference.

If this still does not solve the issue, I would consider adding, on top of the reinforced instruction, an example set of question / answer pairs to the prompt to exemplify the specific style you are looking for and how the model is expected to “transform” a correct answer into incorrect answer options.
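A sketch of what that few-shot approach could look like: prepend one worked example pair before the actual instructions so the model can imitate the answer style. The example content below is invented for illustration and would be replaced with real syllabus material:

```python
# Invented example pair showing the target style: all four options
# share the same length, grammar, and level of detail.
FEW_SHOT_EXAMPLE = (
    "Example:\n"
    "QUESTION: What does the altimeter indicate when set to QNH?\n"
    "CORRECT_ANSWER: altitude above mean sea level | "
    "INCORRECT_ANSWERS: altitude above the standard pressure level | "
    "height above the aerodrome elevation | "
    "height above the terrain directly below the aircraft\n\n"
)

def build_prompt(base_prompt: str) -> str:
    # Put the worked example before the actual instructions so the
    # model sees the target format before it generates anything.
    return FEW_SHOT_EXAMPLE + base_prompt
```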


I already tried to enforce the format, but it didn’t work.
What I’m doing now is asking the model to rewrite the CORRECT_ANSWER, keeping the same format and adding a misconception to the completion.

The prompt would be:
2. Analyze the CORRECT_ANSWER;
3. Your task is to rewrite the CORRECT_ANSWER, using the same format, adding one plausible misconception, in order to create the INCORRECT_ANSWER for the QUESTION.

I’m having better results with the format; nevertheless, the misconceptions are sometimes obviously wrong :wink:
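One way to catch the remaining obviously-wrong distractors automatically is a simple post-generation check, e.g. rejecting any INCORRECT_ANSWER whose length differs too much from the CORRECT_ANSWER and regenerating. The word-count heuristic and the 40% tolerance below are arbitrary assumptions, not anything from the thread:

```python
def plausible_lengths(correct: str, incorrect: list, tolerance: float = 0.4) -> bool:
    """Return True if every distractor's word count is within
    `tolerance` (as a fraction) of the correct answer's word count."""
    ref = len(correct.split())
    return all(
        abs(len(d.split()) - ref) <= tolerance * ref
        for d in incorrect
    )

# A much shorter distractor fails the check, flagging the set for regeneration:
plausible_lengths(
    "the altitude above mean sea level at standard temperature",
    ["the height above ground", "no", "the flight level"],
)  # -> False
```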


Hey, just saw this. I had the same problem with GPT taking shortcuts and generating questionable material. When I run the AIs I have on disk, I specify the format I expect with this:

import ollama  # assumed local-model client; the original snippet is truncated

def extractConcepts(prompt: str, model="zephyr:latest"):
    SYS_PROMPT = (
        "Please craft detailed multiple-choice questions with 4 possible choices "
        "centering around each significant Named Entity identified within the given text.\n"
        "In your creation, be as succinct and concise as you can. Do not provide "
        "explanations, just the multiple-choice questions.\n"
        "Use the format below:\n"
        ' "Question": "Question"\n'
        "- A): Choice\n"
        "- B): Choice\n"
        "- C): Choice\n"
        "- D): Choice\n"
        "Answer: Answer"
    )
    # The original post cuts off here; one way to finish the function is to
    # send the system prompt plus the text to the local model and return its reply.
    response = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": SYS_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return response["message"]["content"]
They turn out about as short or as long as I want them. But when you have the AI on disk, the quality is significantly better. Here is a sample from my evolutionary psychology course.

Question: Which theory did Robert Trivers propose to explain altruistic behavior in terms of evolutionary biology?

  • A) Kin selection theory
  • B) Reciprocal altruism
  • C) Social exchange theory
  • D) Sexual selection theory

Not too shabby.
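Because the system prompt pins down a fixed layout, the generated questions are also easy to machine-check afterwards. A small parsing sketch, using the sample output above (minus its answer line):

```python
import re

sample = """Question: Which theory did Robert Trivers propose to explain altruistic behavior in terms of evolutionary biology?
- A) Kin selection theory
- B) Reciprocal altruism
- C) Social exchange theory
- D) Sexual selection theory"""

def parse_mcq(text: str):
    """Extract the question text and a letter -> choice mapping."""
    question = re.search(r"Question:\s*(.+)", text).group(1)
    # Capture each "- X) choice" line into a dict keyed by the letter.
    choices = dict(re.findall(r"-\s*([A-D])\)\s*(.+)", text))
    return question, choices

question, choices = parse_mcq(sample)
```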