By creating a JSONL file with hundreds of entries like:
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the hallmark of the third industrial revolution?;the invention of the microchip[true];the invention of the telephone[false];the invention of computers[true];the invention of the camera[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is meant by the Internet of Things?;The networking of everyday objects via the Internet.[true];People search for things on the Internet[false];People surf the Internet and call their PC a thing[false]"}
{"prompt": "Generate a quiz question on the topic of Industry 4.0", "completion": "What is the IoT (Internet of Things) compared to the WWW (world wide web)?;IoT is an extension to WWW[true];WWW replaces IoT[false];IoT is the opposite of WWW[false];WWW is a synonym for IoT[false]"}
and creating a fine-tuned model with it, I thought I could generate quiz questions in exactly the structure that all of the hundreds of examples share.
But what I get when I prompt it with “Generate 1 quiz questions on Industry 4.0” is
What is a blockchain?;A blockchain is a database that is set up on different computers in the form of an account block system.[true];A blockchain is a data set that is protected by an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is backed up with a blockchain backup.[false]true;false;false;false
What is a blockchain?;These are records that are protected with an encryption technique.[false];A blockchain is a database that is set up on a blockchain server.[false];A blockchain is a database that is stored in the form of an account block system on various servers.
I got two quiz questions, not the one I asked for. The structure is often wrong (see the end of the first question). Sometimes several semicolons are used as delimiters. Sometimes the [true] or [false] marker that says whether an answer is correct is missing (second question).
Am I completely wrong with what I am doing?
I thought a fine-tuned model could at least help me define how responses are formatted.
Another result:
Generate 20 quiz questions on Industry 4.0What is the hallmark of the third industrial revolution?;the invention of the computer[true];the invention of electricity[false];the invention of the telephone[false];the invention of the automobile[false]truetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsetruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalse
Does anyone have a tip on how to create the training data to get better results? Or a working example?
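While the model output is still unreliable, it may help to check every generated line against the intended format before it goes anywhere near the database. The following is only a minimal sketch: it assumes the format "question;answer[true|false];answer[true|false];…" as seen in the training examples, and the function name parse_quiz_line is made up for illustration.

```python
import re

# Assumed format, taken from the training examples in the post:
#   question;answer[true];answer[false];...
ANSWER_RE = re.compile(r"^(?P<text>.+)\[(?P<flag>true|false)\]$")


def parse_quiz_line(line):
    """Return (question, [(answer, is_correct), ...]) or raise ValueError.

    Hypothetical helper: rejects the failure modes described above
    (trailing garbage, doubled semicolons, missing [true]/[false]).
    """
    parts = line.strip().split(";")
    if len(parts) < 3:
        raise ValueError("expected a question and at least two answers")

    question = parts[0].strip()
    if not question:
        raise ValueError("empty question")

    answers = []
    for raw in parts[1:]:
        match = ANSWER_RE.match(raw.strip())
        if match is None:
            raise ValueError(f"answer without a [true]/[false] marker: {raw!r}")
        answers.append((match.group("text").strip(), match.group("flag") == "true"))

    if not any(is_correct for _, is_correct in answers):
        raise ValueError("no answer is marked [true]")
    return question, answers
```

Lines that raise ValueError can simply be discarded or regenerated. Both blockchain outputs quoted above would be rejected by this check: the first because of the trailing "true;false;false;false", the second because its last answer has no marker.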
Okay, I have implemented n_epochs and the structure (ending the prompt with ++++ and the completion with ++++, starting the completion with a blank, …) from your tutorial. The result:
Create 3 quiz questions on the topic of Industry 4.0
What is Big Data?
Ideas that can only be generated in large companies.[false];Data that is too large to be read into common analysis tools.[true];Data that has been digitised.[false];Data that is analysed by means of software solutions.[false]
What is a chatbot?
A chatbot is an artificial intelligent being that is able to chat with a human.[true];A chatbot is a computer program that performs various tasks.[false];A chatbot is an AI solution based on cloud computing.[false];A chatbot is a computer-controlled parrot.[false];Data that has been digitised.
What is a digital twin?
A digital twin is a virtual identity of a real person.[false];A digital twin is an application that collects and analyses information in a digital business process chain.[true];A digital twin is a 3D image analysis of a real person.[false];A digital twin is a biometric identification of a real person.[false]
Which statement is true when it comes to Industry 4.0?
What used to happen alone on the assembly line now happens automatically across the entire …
The structure is good, but it generates more than I wanted (> 20 questions).
And it only works for the topic Industry 4.0, which is the one I wrote examples for in the training data.
Create 5 quiz questions on the topic of hygiene
cling film (plastic wrap) Plastic wrap is mainly used in kitchens and is used to protect food from drying out.
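For anyone following along, the training-record structure described above (prompt ending in ++++, completion starting with a blank and ending in ++++) could be produced roughly like this. It is a sketch under assumptions: the exact whitespace around the marker and the helper names make_record, to_jsonl and trim_at_marker are not from the tutorial, they are only illustrative.

```python
import json

# Marker taken from the post; the whitespace around it is an assumption.
MARKER = "++++"


def make_record(topic, quiz_line):
    """Build one training record with the prompt/completion markers."""
    return {
        "prompt": f"Generate a quiz question on the topic of {topic}\n{MARKER}",
        # Completion starts with a blank and ends with the marker.
        "completion": f" {quiz_line}\n{MARKER}",
    }


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def trim_at_marker(generated):
    """Keep only the text before the first end marker.

    Client-side fallback for a stop sequence: everything the model
    writes after the first marker is dropped.
    """
    return generated.split(MARKER, 1)[0].strip()
```

The over-generation (more than 20 questions) is usually handled by passing the same marker as the `stop` parameter of the completions request, so that generation ends at the first ++++; trim_at_marker does the same thing after the fact. Since make_record takes the topic as an argument, it would also be easy to mix several topics into the training file, which might help with prompts such as the hygiene one, although that is a guess and not something tested here.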
At one of the legacy forums I used to be busy with back in the “golden days”, I wrote a PHP plugin for technical quizzes with multiple choice answers and managing all user scores, leaderboards, etc. all driven by a MySQL database and a bit of jQuery.
So, if you ask me, honestly, I don’t think you need a generative AI to implement a quiz using either the completions API method or semantic (vector-based) search (embeddings API).
Kindly forgive my candidness.
Then, if you don’t mind me asking, why use OpenAI API technology if you want to create a quiz application which can easily be managed in a traditional SQL database?
We are talking about quizzes/questionnaires for learning. Here OpenAI can help in two areas:
the initial creation of questions, which can take a lot of time (generating, reviewing, and correcting is faster than starting from zero)
the authors of the questionnaires often have a creative block (where should I start, …), and OpenAI can inspire here with suggestions quite well.
In principle, it is about supporting the process of creating questionnaires, which then end up in the database, and not about always asking OpenAI on demand for new questions for the user.
I think you should focus on a small number of JSONL lines and fine-tune by adjusting your n_epochs param until the model fit generates question variations that meet your objective.
Use a validation file to check different variations and help you find the right model fit.
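As a rough illustration of the validation-file idea, the existing examples could be split into a training set and a held-out validation set before fine-tuning. This is a sketch only: the 20 % split, the fixed seed, and the helper names are assumptions, not recommendations from any guide.

```python
import json
import random


def split_records(records, validation_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, validation)."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one record for validation.
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, records):
    """Write one JSON object per line to the given path."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The two resulting lists can be written with write_jsonl to, say, train.jsonl and valid.jsonl (hypothetical file names) and supplied as the training and validation files of the fine-tune, so that runs with different n_epochs values can be compared on examples the model has not seen.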