Fine Tuning text-davinci-003 Model for Quizzes

By creating a JSONL file with hundreds of entries like:

{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the hallmark of the third industrial revolution?;the invention of the microchip[true];the invention of the telephone[false];the invention of computers[true];the invention of the camera[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is meant by the Internet of Things?;The networking of everyday objects via the Internet.[true];People search for things on the Internet[false];People surf the Internet and call their PC a thing[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the IoT (Internet of Things) compared to the WWW (world wide web)?;IoT is an extension to WWW[true];WWW replaces IoT[false];IoT is the opposite of WWW[false];WWW is a synonym for IoT[false]"}

and creating a fine-tuned model with it, I thought I could generate quiz questions in exactly the structure that all the hundreds of examples share.

But what I get when prompting with “Generate 1 quiz questions on Industry 4.0” is:

What is a blockchain?;A blockchain is a database that is set up on different computers in the form of an account block system.[true];A blockchain is a data set that is protected by an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is backed up with a blockchain backup.[false]true;false;false;false

What is a blockchain?;These are records that are protected with an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is stored in the form of an account block system on various servers.

I got two quiz questions, not one as I asked for. The structure is often wrong (see the end of the first question). Sometimes several semicolons are used as delimiters. Sometimes the [true] or [false] marker that says whether an answer is correct is missing (second question).
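The format problems listed above can be detected automatically before a generated question is accepted. The following is only a minimal sketch in Python, assuming the structure from the training examples (question, then semicolon-separated answers that each end in [true] or [false]); the function name `parse_quiz_line` is hypothetical and not part of any library.

```python
import re

# Matches one answer option such as "some text[true]" or "some text[false]".
ANSWER_RE = re.compile(r"^(?P<text>.+)\[(?P<flag>true|false)\]$")


def parse_quiz_line(line):
    """Parse 'question;answer[true];answer[false];...' into a dict.

    Raises ValueError when the line does not follow that structure.
    Hypothetical helper: the format is taken from the training examples.
    """
    parts = [p.strip() for p in line.strip().split(";")]
    if len(parts) < 3:
        raise ValueError("expected a question and at least two answers")
    question, raw_answers = parts[0], parts[1:]
    if not question:
        raise ValueError("empty question")
    answers = []
    for raw in raw_answers:
        match = ANSWER_RE.match(raw)
        if match is None:
            # Covers a missing marker, trailing junk, and empty parts
            # produced by doubled semicolons.
            raise ValueError(f"answer without [true]/[false] marker: {raw!r}")
        answers.append((match.group("text").strip(), match.group("flag") == "true"))
    if not any(correct for _, correct in answers):
        raise ValueError("no answer is marked [true]")
    return {"question": question, "answers": answers}
```

Both outputs shown above would be rejected by this check: the first because of the trailing "true;false;false;false", the second because its last answer has no marker. Note that the sketch cannot handle answers that themselves contain a semicolon.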

Am I completely wrong in what I am doing?
I thought a fine-tuned model could at least help me define how responses are formatted.

Another result:

Generate 20 quiz questions on Industry 4.0What is the hallmark of the third industrial revolution?;the invention of the computer[true];the invention of electricity[false];the invention of the telephone[false];the invention of the automobile[false]truetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalse

Does anyone have a tip on how to create the training data to get better results? Or a working example?


Take a look at this lab-based tutorial:

Okay, I have implemented the n_epochs setting and the structure (ending the prompt with ++++ and the completion with ++++, starting the completion with a blank, …) from your tutorial. The result:

Create 3 quiz questions on the topic of Industry 4.0

What is Big Data?

Ideas that can only be generated in large companies.[false];Data that is too large to be read into common analysis tools.[true];Data that has been digitised.[false];Data that is analysed by means of software solutions.[false]

What is a chatbot?

A chatbot is an artificial intelligent being that is able to chat with a human.[true];A chatbot is a computer program that performs various tasks.[false];A chatbot is an AI solution based on cloud computing.[false];A chatbot is a computer-controlled parrot.[false];Data that has been digitised.

What is a digital twin?

A digital twin is a virtual identity of a real person.[false];A digital twin is an application that collects and analyses information in a digital business process chain.[true];A digital twin is a 3D image analysis of a real person.[false];A digital twin is a biometric identification of a real person.[false]

Which statement is true when it comes to Industry 4.0?
What used to happen alone on the assembly line now happens automatically across the entire …

The structure is good, but it generates more than I wanted (> 20 questions).

And it only works for the topic Industry 4.0, which is the topic I made examples for in the training data.

Create 5 quiz questions on the topic of hygiene

  • cling film (plastic wrap) Plastic wrap is mainly used in kitchens and is used to protect food from drying out.

When is cling film needed?

  • When storing food

Where is cling film used?

  • In the kitchen

What is cling film used for?

  • To protect against drying out

… and it does not stop generating more…

What am I doing wrong?
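For reference, the record structure described in this post (prompt ending in ++++, completion starting with a blank and ending in ++++) could be produced roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the thread does not say whether a stop sequence was passed at inference time. `model`, `prompt`, `stop` and `max_tokens` are parameters of the completions endpoint; the `max_tokens` default is an arbitrary illustration value.

```python
import json

SEPARATOR = "++++"  # marker from this thread: ends every prompt
STOP = "++++"       # marker from this thread: ends every completion


def to_training_record(topic, quiz_line):
    """Return one JSONL line. The prompt ends with the separator; the
    completion starts with a blank and ends with the stop marker."""
    prompt = f"Generate a quiz question on the topic of {topic}{SEPARATOR}"
    completion = f" {quiz_line}{STOP}"
    return json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False)


def completion_params(model_name, topic, max_tokens=200):
    """Build request parameters for a completions call (hypothetical helper).

    The prompt is worded and terminated exactly like the training prompts,
    and the stop marker is passed so generation ends after one question.
    """
    return {
        "model": model_name,
        "prompt": f"Generate a quiz question on the topic of {topic}{SEPARATOR}",
        "stop": [STOP],
        "max_tokens": max_tokens,
    }
```

Since every training example contains exactly one question per prompt, a request for 3 or 20 questions does not match anything the model was trained on. Calling the one-question prompt several times with a stop sequence may behave more predictably, though that is a guess and not something verified here.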

@ruby_coder Do you have any other ideas about what is going wrong with my fine-tuning?

At one of the legacy forums I used to be busy with back in the “golden days”, I wrote a PHP plugin for technical quizzes with multiple choice answers and managing all user scores, leaderboards, etc. all driven by a MySQL database and a bit of jQuery.

So, if you ask me, honestly, I don’t think you need a generative AI to implement a quiz using either the completions API method or semantic (vector-based) search (embeddings API).

Kindly forgive my candidness.

Then, if you don’t mind me asking, why use OpenAI API technology if you want to create a quiz application which can easily be managed in a traditional SQL database?


Hope this helps.


We are talking about quizzes/questionnaires for learning. Here OpenAI can help in 2 areas:

  1. the initial creation of questions, which can take a lot of time (generating, reviewing and correcting is faster than starting from zero)
  2. the authors of the questionnaires often have a creative block (where should I start, …), and OpenAI can provide inspiration here with suggestions quite well.

In principle, it is about supporting the process of creating questionnaires, which then end up in the database, and not about asking OpenAI on demand for new questions for each user.
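To illustrate the workflow just described (generate, review, then store), here is a rough sketch using Python's built-in sqlite3 module. The table layout, the 'draft'/'approved' status values and all function names are assumptions made up for this example, not an existing schema.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables for generated questions and their answers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS question ("
        " id INTEGER PRIMARY KEY,"
        " topic TEXT NOT NULL,"
        " text TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'draft')"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS answer ("
        " id INTEGER PRIMARY KEY,"
        " question_id INTEGER NOT NULL REFERENCES question(id),"
        " text TEXT NOT NULL,"
        " is_correct INTEGER NOT NULL)"
    )


def save_draft(conn, topic, question, answers):
    """Store a generated question as a draft; answers is a list of
    (text, is_correct) pairs. Returns the new question id."""
    cur = conn.execute(
        "INSERT INTO question (topic, text) VALUES (?, ?)", (topic, question)
    )
    question_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO answer (question_id, text, is_correct) VALUES (?, ?, ?)",
        [(question_id, text, int(correct)) for text, correct in answers],
    )
    conn.commit()
    return question_id


def approve(conn, question_id):
    """Mark a draft as approved after an author has reviewed and corrected it."""
    conn.execute(
        "UPDATE question SET status = 'approved' WHERE id = ?", (question_id,)
    )
    conn.commit()
```

Only approved rows would be shown to learners, so the model's output never reaches a user without an author having looked at it.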


Generating questions which will populate a DB sounds like a good use case for a generative AI.

Thanks for clarifying!


Great :-). So back to the question from a beginner to a fine-tuning pro: how can I use fine-tuning and/or embeddings to get better results?

I think you should focus on a small number of JSONL lines and fine-tune by adjusting your n_epochs param until you get the model fit you want, one that generates question variations which meet your objective.

Use a validation file to check various variations to help you find the right model fit.
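As a rough illustration of the validation-file suggestion, the examples can be split into a training and a validation set before fine-tuning. This is a sketch only: the function names, the 20% fraction and the fixed seed are assumptions chosen for the example, not recommendations from the thread.

```python
import json
import random


def split_jsonl(records, validation_fraction=0.2, seed=0):
    """Shuffle records deterministically and return (train, validation).

    At least one record always goes to validation, so this is only
    meaningful for two or more records.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, records):
    """Write one JSON object per line, as expected for fine-tuning files."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The two resulting files can then be supplied as training and validation data, and the same split reused across runs with different n_epochs values so that the validation results stay comparable.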