I am trying to use the Open AI API to create quiz questions with three incorrect answers and one correct answer. The prompt I use is:
Write a quiz on ${quiz_topic} with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer. Write each answer on a separate line
I would like to parse GPT-3’s answer but the format is inconsistent. For instance, if quiz_topic is “literature”, I get the following response:
Q1. Who wrote the novel "The Catcher in the Rye"?
A. J.D. Salinger
B. Ernest Hemingway
C. F. Scott Fitzgerald
D. Mark Twain
This is the correct format. But if I use “U.S History”, I get the following output:
Q1: Who was the first president of the United States?
A1: George Washington Jr., Thomas Jefferson, John Adams, George Washington
This is a different format. So my goal is to fine-tune it using the article here
I have created 200 line of a training set. Here is a sample:
{"prompt":"Write a quiz on general knowledge with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer.Write each answer on a separate line \n\n---\n\n","completion":" Q. What is the Italian word for "tomato"?\nA. Aglio\nA. Cipolla\nA. Peperoncino\nA. Pomodoro\n\nQ. Which restaurant's mascot is a clown?\nA. Whataburger\nA. Burger King\nA. Sonic\nA. McDonald's\n\nQ. Which of these colours is NOT featured in the logo for Google?\nA. Yellow\nA. Blue\nA. Green\nA. Pink\n\nQ. In 2013 how much money was lost by Nigerian scams?\nA. $95 Million\nA. $956 Million\nA. $2.7 Billion\nA. $12.7 Billion\n\nQ. What is the name of Poland in Polish?\nA. Pupcia\nA. Polszka\nA. Póland\nA. Polska\n\n \n\n###\n\n"}
{"prompt":"Write a quiz on books with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer.Write each answer on a separate line \n\n---\n\n","completion":" Q. What is the name of Eragon's dragon in "Eragon"?\nA. Glaedr\nA. Thorn\nA. Arya\nA. Saphira\n\nQ. In the "The Hobbit", who kills Smaug?\nA. Bilbo Baggins\nA. Gandalf the Grey\nA. Frodo\nA. Bard\n\nQ. What is Hermione Granger's middle name?\nA. Jane\nA. Emma\nA. Jo\nA. Jean\n\nQ. According to The Hitchhiker's Guide to the Galaxy book, the answer to life, the universe and everything else is...\nA. Loving everyone around you\nA. Chocolate\nA. Death\nA. 42\n\nQ. What is the name of the three headed dog in Harry Potter and the Sorcerer's Stone?\nA. Spike\nA. Poofy\nA. Spot\nA. Fluffy\n\n \n\n###\n\n"}
When I run the validation tool with the command
openai tools fine_tunes.prepare_data -f training.jsonl
I get the following message
- All prompts start with prefix `Write a quiz on `. Fine-tuning doesn't require the instruction specifying the task, or a few-shot example scenario. Most of the time you should only add the input data into the prompt, and the desired output into the completion
I don’t understand why I must remove “Write a quiz on”. So I seem to have misunderstood how to fine-tune a model for consistent formatting. Can anybody shed a light on how to fine-tune it to make sure I get the same formatting with the same prompt?