Prompt and completion generation from text

Are there any helpful tools for generating prompts and completions for fine-tuning from a large amount of text?
Say you have a 100-page white paper that you want to include in your fine-tuned model.

I don't know of any dataset-cleaning tools offhand.

Every time I’ve fine-tuned, I’ve ended up having to “clean” or “prepare” the data manually.

A smart tool to do it would be very cool.
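For what it's worth, the mechanical part of the manual preparation can be scripted. Below is a minimal sketch, not a full tool: it groups paragraphs from a long document into chunks and writes one JSONL record per chunk using the "prompt" and "completion" keys. The function names, the 200-word limit, and the choice to put the chunk in the prompt and leave the completion empty for hand editing are all assumptions for illustration.

```python
import json


def chunk_paragraphs(text, max_words=200):
    """Group blank-line-separated paragraphs into chunks of at most
    max_words words. A single paragraph longer than max_words is kept
    whole as its own chunk (hypothetical helper, not from any library)."""
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        para = " ".join(para.split())  # collapse internal whitespace
        if not para:
            continue
        n = len(para.split())
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_jsonl(chunks):
    """Serialize chunks as JSONL, one object per line. The completion is
    left empty on purpose (an assumption) so it can be filled in by hand."""
    return "\n".join(
        json.dumps({"prompt": chunk, "completion": ""}) for chunk in chunks
    )
```

You would read the white paper as plain text, call `chunk_paragraphs` on it, and save the output of `to_jsonl` to a file. The completions still have to be written or reviewed by a person, so this only removes the splitting and formatting work.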


I found this service:

For prompt generation: TrainMyAI - Question and Answer Generator for GPT-3 Fine Tuning

You can download the source code for it and run it locally.

This tool is awesome! Do you know where the source code can be found? I would like to run it locally on some sensitive data. Thanks for sharing!

Here is the link to download the code:

