I would like OpenAI to generate 10,000 questions from a specific category, let’s say Sport.
The problem is that the model’s maximum context length is 4097 tokens. I could split the work and ask for only 100 questions per request. That is fine, but how can I ensure that each new batch contains only questions that have not been generated already? Is there any way to tell OpenAI to take into account what was created in previous batches?
Any ideas on how to achieve this?
Sport is a very broad topic. I’d suggest determining a set of subcategories within the category and generating sets of questions for each of those smaller parts. Given the token limit, I’m not sure there is a simple plug-and-chug way to solve your issue. It would likely require a more in-depth method than just asking for everything at once. For example, you could use prompts like these:
Write a set of 100 question-answer pairs pertaining to [Insert Sports Team Here].
Write a set of 100 question-answer pairs related to [Insert Sport] rules.
Write a set of 100 question-answer pairs on [Insert Sports Player] records.
By using a more specific identifier for the context of the questions, you are less likely to get repeated questions.
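To make the subcategory idea concrete, here is a minimal sketch of how the prompts above could be built from templates. The template wording follows the three example prompts; the names `TEMPLATES`, `SUBTOPICS`, and `build_prompts`, and the placeholder entries "Team A", "Sport A", and so on are hypothetical and only there for illustration. Sending the prompts to the API is left out on purpose.

```python
# Templates based on the three example prompts above.
TEMPLATES = {
    "team": "Write a set of {n} question-answer pairs pertaining to {name}.",
    "sport": "Write a set of {n} question-answer pairs related to {name} rules.",
    "player": "Write a set of {n} question-answer pairs on {name} records.",
}

# Placeholder subtopics (assumption): replace with real teams, sports, players.
SUBTOPICS = {
    "team": ["Team A", "Team B"],
    "sport": ["Sport A"],
    "player": ["Player A"],
}


def build_prompts(subtopics, n=100):
    """Return one prompt per (kind, name) pair, each asking for n questions."""
    prompts = []
    for kind, names in subtopics.items():
        for name in names:
            prompts.append(TEMPLATES[kind].format(n=n, name=name))
    return prompts


prompts = build_prompts(SUBTOPICS)
for prompt in prompts:
    print(prompt)
```

With 100 questions per prompt, you would need around 100 distinct subtopics to reach 10,000 questions, so the subtopic list is where most of the work goes.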
Using a more specific identifier is of course a good point, but I also want these questions to be translated by OpenAI into a few languages at the same time. That makes a 4097-token context length very small for the output, and I may have to go down to 10 questions per request.
That’s why I would like the history of the AI’s previous replies to be remembered somehow, so that those questions are not repeated in future completions.
If you are using ChatGPT, I tested the following sequence of prompts:
In this chat I will ask you to create lists of questions relating to Sports. There will be multiple prompts requesting more questions. Please prevent yourself from repeating the same question twice in any of the responses.
Generate a list of Sports related trivia questions covering a wide variety of topics related to the category.
Please generate more trivia questions continuing the list.
It does seem to generate unique questions every time. Whether you can get to 10,000 using that method or something similar, I’m not sure; I don’t have time to test it for you.
Perhaps using the Playground with such a prompt could yield decent results, but I think the cost of that would add up quickly.
If you have some programming experience, you could put the questions into some type of container object, merge any duplicates, and see where you’re at from there.
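As a rough sketch of that merging step, assuming each batch has already been parsed into a list of question strings: the function names `normalize` and `merge_unique` are my own, and the sample questions are made up as input. Note that this only catches near-identical wording (case, punctuation, spacing); paraphrased duplicates would still get through.

```python
import re


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = question.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def merge_unique(batches):
    """Merge batches, keeping the first occurrence of each normalized question."""
    seen = set()
    unique = []
    for batch in batches:
        for question in batch:
            key = normalize(question)
            if key and key not in seen:
                seen.add(key)
                unique.append(question)
    return unique


# Made-up sample input: two batches with one near-duplicate.
batches = [
    ["Who won the first FIFA World Cup?", "How many players are on a basketball team?"],
    ["who won the first FIFA world cup", "What is a hat-trick?"],
]
unique = merge_unique(batches)
print(len(unique))  # 3
```

After each batch you can check `len(unique)` and keep requesting more until you reach the number you need.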
I’m sorry that I can’t give you a definitive answer without knowing your skill set in the areas that could help here.