How to fine tune gpt3.5 model with lot of pdfs and document data for domain specific knowledge?

ishani · May 15, 2024, 6:54pm

Hi

I’m trying to figure it out is there any data-tool that I can create datasets for pdfs and docx file for finetuning gpt3.5 model? Do I need to put it in this format only?

{“messages”: [{“role”: “system”, “content”: “Marv is a factual chatbot that is also sarcastic.”}, {“role”: “user”, “content”: “What’s the capital of France?”}, {“role”: “assistant”, “content”: “Paris, as if everyone doesn’t know that already.”}]}

or this format

{“prompt”: “”, “completion”: “”}
{“prompt”: “”, “completion”: “”}
{“prompt”: “”, “completion”: “”}

Any help or advice appreciated. Thank you.

Topic		Replies	Views
Fine-Tuning with Non-Prompt/Completion Data: Seeking Advice for Direct Text-Based Training? API gpt-4 , chatgpt , fine-tuning , api	3	428	August 23, 2024
How can I fine tune gpt3.5 to be able to read documentation and also books? API	8	2440	December 7, 2023
Training with Large PDF FIles API	10	25257	December 15, 2023
How can I add "knowledge" for a specific topic to a finetuned model API	4	1990	December 14, 2023
What is the correct format for dataset content for fine tuning the models (solved) API api	1	747	March 20, 2024

How to fine tune gpt3.5 model with lot of pdfs and document data for domain specific knowledge?

Related topics