Hi
I’m trying to figure it out is there any data-tool that I can create datasets for pdfs and docx file for finetuning gpt3.5 model? Do I need to put it in this format only?
{“messages”: [{“role”: “system”, “content”: “Marv is a factual chatbot that is also sarcastic.”}, {“role”: “user”, “content”: “What’s the capital of France?”}, {“role”: “assistant”, “content”: “Paris, as if everyone doesn’t know that already.”}]}
or this format
{“prompt”: “”, “completion”: “”}
{“prompt”: “”, “completion”: “”}
{“prompt”: “”, “completion”: “”}
Any help or advice appreciated. Thank you.