Fine Tuning text-davinci-003 Model for Quizzes

christian.kiefer · February 20, 2023, 4:12pm

By creating a JSONL file with hundreds of entries like:

{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the hallmark of the third industrial revolution?;the invention of the microchip[true];the invention of the telephone[false];the invention of computers[true];the invention of the camera[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is meant by the Internet of Things?;The networking of everyday objects via the Internet.[true];People search for things on the Internet[false];People surf the Internet and call their PC a thing[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the IoT (Internet of Things) compared to the WWW (world wide web)?;IoT is an extension to WWW[true];WWW replaces IoT[false];IoT is the opposite of WWW[false];WWW is a synonym for IoT[false]"}

and creating a fine-tuned model with it, I thought I could generate quiz questions in exactly the structure that all of the hundreds of examples share.
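For reference, a minimal sketch of how such a JSONL file could be written programmatically, so that every line is valid JSON with plain double quotes. The helper name `to_jsonl` and the `examples` list are hypothetical and only illustrate the prompt/completion layout shown above.

```python
import json

def to_jsonl(records):
    """Serialize (prompt, completion) pairs as JSONL: one JSON object per line."""
    lines = []
    for prompt, completion in records:
        # json.dumps takes care of quoting and escaping
        lines.append(json.dumps({"prompt": prompt, "completion": completion},
                                ensure_ascii=False))
    return "\n".join(lines) + "\n"

# Hypothetical sample data in the question;answer[true|false];... layout
examples = [
    ("Generate a quiz question on the topic of Industry 4.0",
     "What is meant by the Internet of Things?;"
     "The networking of everyday objects via the Internet.[true];"
     "People search for things on the Internet[false];"
     "People surf the Internet and call their PC a thing[false]"),
]

jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

Writing `jsonl_text` to a file then gives a training file in which each line can be parsed on its own.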

But what I get when prompting with “Generate 1 quiz questions on Industry 4.0” is:

What is a blockchain?;A blockchain is a database that is set up on different computers in the form of an account block system.[true];A blockchain is a data set that is protected by an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is backed up with a blockchain backup.[false]true;false;false;false

What is a blockchain?;These are records that are protected with an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is stored in the form of an account block system on various servers.

I got two quiz questions, not the 1 I asked for. The structure is often wrong (see the end of the first question). Sometimes several semicolons are used as delimiters. Sometimes the [true] or [false] marker that says whether an answer is correct is missing (second question).
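The problems listed above can be detected automatically before a generated line is accepted. The following is only a sketch under the assumption that the intended format is `question;answer[true|false];answer[true|false];...`; the function name `parse_quiz_line` and the rule of at least two answers with at least one marked [true] are my own choices, not part of any API.

```python
import re

# An answer must end in exactly one [true] or [false] tag
ANSWER_RE = re.compile(r"^(?P<text>.+)\[(?P<flag>true|false)\]$")

def parse_quiz_line(line):
    """Parse 'question;answer[true];answer[false];...' or raise ValueError."""
    parts = line.strip().split(";")
    question, raw_answers = parts[0], parts[1:]
    if not question or len(raw_answers) < 2:
        raise ValueError("expected a question and at least two answers")
    answers = []
    for raw in raw_answers:
        match = ANSWER_RE.match(raw)
        if match is None:
            # Catches missing tags, doubled semicolons and trailing garbage
            raise ValueError(f"answer without a clean [true]/[false] tag: {raw!r}")
        answers.append((match.group("text"), match.group("flag") == "true"))
    if not any(correct for _, correct in answers):
        raise ValueError("no answer is marked [true]")
    return question, answers
```

Both outputs quoted above would be rejected by this check: the first because of the text after the last [false], the second because its last answer has no tag.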

Am I completely wrong with what I am doing?
I thought a fine-tuned model could at least help me define how responses are formatted.

Another result:

Generate 20 quiz questions on Industry 4.0What is the hallmark of the third industrial revolution?;the invention of the computer[true];the invention of electricity[false];the invention of the telephone[false];the invention of the automobile[false]truetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalse

Does anyone have a tip on how to create the training data to get better results? Or a working example?

ruby_coder · February 20, 2023, 4:38pm

Sure!

Take a look at this lab-based tutorial:

christian.kiefer · February 23, 2023, 5:23pm

Okay, I have implemented n_epochs and the structure (ending the prompt with ++++ and the completion with ++++, starting the completion with a blank, …) from your tutorial. The result:

Create 3 quiz questions on the topic of Industry 4.0

What is Big Data?

Ideas that can only be generated in large companies.[false];Data that is too large to be read into common analysis tools.[true];Data that has been digitised.[false];Data that is analysed by means of software solutions.[false]

What is a chatbot?

A chatbot is an artificial intelligent being that is able to chat with a human.[true];A chatbot is a computer program that performs various tasks.[false];A chatbot is an AI solution based on cloud computing.[false];A chatbot is a computer-controlled parrot.[false];Data that has been digitised.

What is a digital twin?

A digital twin is a virtual identity of a real person.[false];A digital twin is an application that collects and analyses information in a digital business process chain.[true];A digital twin is a 3D image analysis of a real person.[false];A digital twin is a biometric identification of a real person.[false]

Which statement is true when it comes to Industry 4.0?
What used to happen alone on the assembly line now happens automatically across the entire …

The structure is good, but it is generating more than I wanted (> 20 questions).

And it only works for the topic Industry 4.0, which I made examples for in the training data.

Create 5 quiz questions on the topic of hygiene

cling film (plastic wrap) Plastic wrap is mainly used in kitchens and is used to protect food from drying out.

When is cling film needed?

When storing food

Where is cling film used?

In the kitchen

What is cling film used for?

To protect against drying out

… and it does not stop generating more …

What am I doing wrong?
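One thing that may be worth checking, as a sketch: the ++++ marker only helps if the inference prompt ends with the same marker as the training prompts, and if generation is cut at the marker that ends each completion. The completions endpoint accepts a `stop` parameter and a `max_tokens` limit for this; the post does not say whether they were set in the request. The helper names `format_example` and `truncate_at_stop` below are hypothetical, and the truncation is done client-side for illustration.

```python
SEPARATOR = "++++"  # marker from the post, used at the end of prompt and completion

def format_example(prompt, completion):
    """Build one training record: prompt + marker, ' ' + completion + marker."""
    return {
        "prompt": prompt + SEPARATOR,
        "completion": " " + completion + SEPARATOR,
    }

def truncate_at_stop(generated, stop=SEPARATOR):
    """Keep only the text before the first stop marker (client-side fallback)."""
    return generated.split(stop, 1)[0].strip()

record = format_example(
    "Create a quiz question on the topic of hygiene",
    "Q?;A[true];B[false]",
)
print(record)

# Everything after the first marker is discarded
print(truncate_at_stop(" Q?;A[true];B[false]++++ What is cling film?"))
```

Since every training example contains exactly one question, the model may never have learned what "Create 3" or "Create 5" means; requesting one question per call and repeating the call is a possible workaround, though that is an assumption and not something tested in this thread.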

christian.kiefer · February 27, 2023, 11:29am

@ruby_coder Do you have any other ideas what is going wrong with my finetuning?

ruby_coder · February 27, 2023, 11:41am

At one of the legacy forums I used to be busy with back in the “golden days”, I wrote a PHP plugin for technical quizzes with multiple choice answers and managing all user scores, leaderboards, etc. all driven by a MySQL database and a bit of jQuery.

So, if you ask me, honestly, I don’t think you need a generative AI to implement a quiz using either the completions API method or semantic (vector-based) search (embeddings API).

Kindly forgive my candidness.

Then, if you don’t mind me asking, why use OpenAI API technology if you want to create a quiz application which can easily be managed in a traditional SQL database?

Hope this helps.

christian.kiefer · February 27, 2023, 1:27pm

We are talking about quizzes/questionnaires for learning. Here OpenAI can help in 2 areas:

the initial creation of questions, which can take a lot of time (generating, reviewing and correcting is faster than starting from zero)
the authors of the questionnaires often have a creative block (where should I start, …), and OpenAI can inspire them quite well with suggestions.

In principle, it is about support in the process of creating questionnaires, which then end up in the database, and not about asking OpenAI on demand for new questions for the user.

ruby_coder · February 27, 2023, 1:31pm

Generating questions which will populate a DB sounds like a good use case for a generative AI.

Thanks for clarifying!

christian.kiefer · February 27, 2023, 1:49pm

Great :-). So back to the question from a beginner to a fine-tuning pro: how can I use fine-tuning and/or embeddings to get better results?

ruby_coder · February 27, 2023, 2:02pm

I think you should focus on a small number of JSONL lines and fine-tune by adjusting your n_epochs param until you get the model fit you want, one that generates question variations which meet your objective.

Use a validation file to check various variations to help you find the right model fit.
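A minimal sketch of how a validation file could be carved out of the existing examples, assuming the records are already loaded as a list. The function name `split_records`, the 20% default and the fixed seed are illustrative choices, not values recommended anywhere in this thread.

```python
import random

def split_records(records, validation_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (training, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Always hold out at least one record for validation
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical stand-in for ten prompt/completion records
records = list(range(10))
train, validation = split_records(records)
print(len(train), len(validation))
```

Each of the two lists would then be written to its own JSONL file, and the validation results compared across runs with different n_epochs values.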

HTH
