Generate MCQs with question, option a, option b, option c, option d, correct answer, correct-answer explanation, Bloom level, difficulty level, complexity level, etc.; about 12 columns in total.

“I have a PDF with 20 pages and need to generate at least 50 multiple-choice questions with 12 columns each (including questions, correct answer, and options a, b, c, d etc…). Using the OpenAI API with GPT models such as GPT-3.5 (model 1106), I can only generate 10 questions with their options, despite having a larger context length available. How can I efficiently generate the desired number of questions and options?”

The prompt is:

def generate_mcqs(self):
    try:
        text = self.load_documents()
        if not text:
            return None
        parser = JsonOutputParser(pydantic_object=Mcq)
        model = "gpt-4-turbo-2024-04-09"  # or:
        # model = "gpt-3.5-turbo-1106"
        # model = "gpt-4-turbo"

        model = ChatOpenAI(model=model, temperature=0)
        
        system_message = '''Generate multiple-choice questions.

        Each MCQ should include four options with one correct answer; "None of the above" and "All of the above" should not be options.
        Also, provide an explanation for why the chosen answer is correct.

            Each question should be categorized according to the following criteria:

            a) Difficulty Level: Easy, Medium, or Hard.
            b) Complexity Level: P1 (single knowledge point), P2 (two knowledge points), or P3 (three or more knowledge points).
            c) Bloom's taxonomy level: Understand, Apply, Analyze, etc.
            d) Divide the questions and create three each of P1, P2, and P3, where P1 means only one learning outcome, P2 means > 1 and <= 2 learning outcomes and knowledge points clubbed together, and P3 means > 2 and <= 3 learning outcomes and knowledge points clubbed together.
            e) Create questions in the Indian context.
            f) Include multiple distractors that reflect common mistakes or misconceptions. Make sure that each option tests a different aspect of understanding and that all of them look plausible, to increase the complexity.
            g) Questions should test the teacher's analytical thinking, computational thinking, and logical thinking skills along with knowledge, so add such components to each question.
            h) Be smart in asking questions and try not to repeat concepts across questions.
            i) Integrate multiple knowledge points and learning outcomes into a single problem, using outcomes from a given list to increase the complexity, for at least 60% of the total questions.

            The output must be formatted in JSON, including fields for the question, options (a, b, c, d), correct answer, answer explanation, complexity level, Bloom's taxonomy level, and difficulty level.
            Provide the assessment in the following JSON format:
                {
                "question": "",
                "option_a": "",
                "option_b": "",
                "option_c": "",
                "option_d": "",
                "correct_answer": "",
                "answer_explanation": "",
                "bloom_taxonomy_level": "",
                "complexity_level": "",
                "difficulty_level": ""
                }
        '''
        chat_template = ChatPromptTemplate.from_messages(
            [
                SystemMessage(
                    content=(
                    system_message
                    )
                ),
                HumanMessagePromptTemplate.from_template("You must generate {num_questions} multiple-choice questions for grade: {grade} and subject: {subject} using {text}"),
            ]
        )


        print("......Model is processing.........")

        chain = chat_template | model | parser

        
        with get_openai_callback() as cb:
            results = chain.invoke({"num_questions":self.num_questions,"grade":self.grade, "subject":self.subject, "text": text})
            print(cb)
        return results
    
    except Exception as e:
        print(f"Error generating MCQs: {e}")
        return None

Input is a PDF.
Development: LangChain.
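The `Mcq` pydantic object passed to `JsonOutputParser` is not shown in the post. A minimal sketch consistent with the JSON fields listed in the system message (the class and field names here are assumptions, not the poster's actual schema) might look like:

```python
# Hypothetical Mcq schema for JsonOutputParser(pydantic_object=Mcq).
# Field names mirror the JSON format given in the system message.
from typing import List
from pydantic import BaseModel, Field

class McqItem(BaseModel):
    question: str = Field(description="The question text")
    option_a: str
    option_b: str
    option_c: str
    option_d: str
    correct_answer: str
    answer_explanation: str
    bloom_taxonomy_level: str
    complexity_level: str
    difficulty_level: str

class Mcq(BaseModel):
    # A list wrapper so one API response can carry several questions.
    questions: List[McqItem]
```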

The short answer is that it is not possible to achieve this.

For tasks of this level of complexity (i.e. generate a question, generate answer options AND provide an explanation/rationale for the correct answer, classify the question), the model requires time to think.

Even 10 questions at a time is a gamble and you want to make sure that it is in fact generating correct outputs.

Hence, your best option is to repeat the process until you have reached the desired number of questions.
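A sketch of that repeat-until-done loop, with `call_model` as a placeholder for one API call (e.g. `chain.invoke(...)`) that returns up to `batch_size` questions; the helper names are illustrative, not a real OpenAI or LangChain API:

```python
# Repeatedly request small batches until the target count is reached.
# call_model(batch_size) stands in for a single API call returning a
# list of question dicts; max_calls guards against an infinite loop.

def collect_mcqs(call_model, target=50, batch_size=10, max_calls=20):
    questions = []
    calls = 0
    while len(questions) < target and calls < max_calls:
        batch = call_model(batch_size)  # one API call per iteration
        if not batch:
            break
        questions.extend(batch)
        calls += 1
    return questions[:target]
```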

Also, just to rule out any misunderstandings: the models are limited in the number of output tokens they can produce. This limit is 4,096 tokens (in practice they normally return fewer). This is different from the context window, which limits the sum of input tokens and output tokens that can be processed in a single API call.

I hope this helps.

Thanks for the reply.
I know the output token limit for the GPT-4 model is 4,096 tokens.

Here the completion output was around 700 tokens.
The prompt + input file (PDF) took around 6,000 tokens.

My question is:
as I asked before, how do I generate 50 or more MCQs for the provided PDF?

It’s not possible to create such a volume of questions in a single API call. The reason is that the model requires a certain amount of time and effort to produce any such question at the level of detail you are asking for.

Thanks jr.2509,
a) How do I generate a volume of questions using the text (PDF file)?
b) i) Suppose 10 questions are generated in one batch, i.e. one API call. How do I batch the calls to generate a volume of questions? ii) If we succeed at that, duplicate questions may occur; how do I solve that?
c) I want a good solution for the problem I stated earlier.

As I am not familiar with your input text, I can only make a few assumptions here. Two options I can think of to reduce / eliminate the risk of duplicate questions:

Option 1
You chunk / divide the text and place just one chunk as input per API call as the basis for formulating the questions.
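A minimal sketch of this chunking approach, using a naive character splitter (in LangChain you might instead use `RecursiveCharacterTextSplitter`; sizes here are illustrative):

```python
# Option 1 sketch: split the extracted PDF text into overlapping chunks
# and feed one chunk per API call, so each batch of questions is grounded
# in a different part of the document.

def chunk_text(text, chunk_size=3000, overlap=200):
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def questions_per_chunk(num_questions, num_chunks):
    # Distribute the target count across chunks, rounding up.
    return -(-num_questions // num_chunks)
```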

Option 2
Starting with the second API call, you include the questions that have already been generated (just the questions, not all the other details) as additional context into your prompt as well as expand your prompt with the additional instruction to generate questions that are different from those already generated.
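A sketch of this second option, appending the already-generated questions to the prompt for the next call (the instruction wording is illustrative, not from the original prompt):

```python
# Option 2 sketch: from the second API call onward, list the questions
# generated so far and instruct the model not to repeat them.

def build_followup_prompt(base_prompt, previous_questions):
    if not previous_questions:
        return base_prompt
    seen = "\n".join(f"- {q}" for q in previous_questions)
    return (
        base_prompt
        + "\n\nThe following questions have already been generated:\n"
        + seen
        + "\nGenerate new questions that are different from these."
    )
```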

Good luck!