Hello everyone,
I’m currently working on a project where I have a large collection of articles, documents, and other general reference material. These texts are not in the prompt-completion format that fine-tuning with the OpenAI API expects; instead, they are continuous prose, with the valuable information scattered throughout.
Given this, I’m looking for advice on how I can use this kind of non-prompt/completion data for fine-tuning a model. Specifically:
- Is there a way to fine-tune directly with such text data without converting it into prompt-completion pairs?
- Can the OpenAI API support this, or would I need to look into alternative methods or platforms (like Hugging Face)?
- Has anyone successfully used non-prompt/completion data for fine-tuning with OpenAI’s API? If so, how did you approach it?
- Any recommendations or best practices for efficiently converting large text data into the required format, if that’s the only option? (I’ve included a rough sketch of what I imagine this conversion would look like right after this list.)
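For context, here is the kind of conversion script I imagine writing if prompt-completion pairs turn out to be unavoidable. It’s only a rough sketch: the `articles/` directory, the character-based chunk sizes, and the empty-prompt/raw-text-completion trick are all assumptions on my part, not something I’ve validated against the API.

```python
import json
from pathlib import Path

CHUNK_SIZE = 2000   # rough character budget per training example (assumption)
OVERLAP = 200       # overlap between chunks so ideas aren't cut cleanly in half

def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP):
    """Yield overlapping character-based chunks of a long document."""
    start = 0
    while start < len(text):
        yield text[start:start + size]
        start += size - overlap

def build_jsonl(input_dir: str, output_path: str) -> None:
    """Convert every .txt file in input_dir into prompt/completion JSONL records."""
    with open(output_path, "w", encoding="utf-8") as out:
        for path in Path(input_dir).glob("*.txt"):
            text = path.read_text(encoding="utf-8")
            for chunk in chunk_text(text):
                # Empty prompt, document text as completion -- a workaround I've
                # seen suggested for exposing a model to raw prose, not an
                # officially recommended approach.
                record = {"prompt": "", "completion": " " + chunk.strip()}
                out.write(json.dumps(record, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    build_jsonl("articles/", "training_data.jsonl")
```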
I appreciate any insights or experiences you can share. I’m particularly interested in ways to minimize the manual effort in formatting the data while still achieving effective fine-tuning results.
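One idea I’ve been toying with to cut down the manual work is to have a chat model draft a question/answer pair from each chunk and use those as the training examples. Below is a rough sketch of what I mean; the model name, prompt wording, and reply parsing are placeholders I haven’t tested at scale, and the resulting pairs would then go into the same JSONL format as above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QA_PROMPT = (
    "Write one question that the following passage answers, then the answer.\n"
    "Format your reply as:\nQ: <question>\nA: <answer>\n\nPassage:\n{chunk}"
)

def generate_pair(chunk: str):
    """Ask a chat model to draft a question/answer pair for one text chunk."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": QA_PROMPT.format(chunk=chunk)}],
        temperature=0.3,
    )
    text = resp.choices[0].message.content or ""
    if "Q:" in text and "A:" in text:
        question = text.split("Q:", 1)[1].split("A:", 1)[0].strip()
        answer = text.split("A:", 1)[1].strip()
        return question, answer
    return None  # skip chunks where the reply didn't follow the requested format
```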
Thank you in advance for your help!
Best regards,