Fine-tuning jsonl file format question

I need a little help figuring out how to fine-tune davinci-002.
I took a bit of a beating today because I was trying to figure out the format for the fine-tuning JSONL file.
I keep getting this error:

“13:45:16 The job failed due to an invalid training file. Invalid file format. Example 1 prompt contains invalid tokens.
13:45:06 Validating training file: file-lf56CBE7x8G3x1baawuk7J3Q
13:45:05 Created fine-tuning job: ftjob-3gJR1kyIqDUNzhtuaJuhbetx”
This is my fine-tuning file:

{"prompt": "<|startoftext|>", "completion": "As a fan of TCU, I have been mesmerized by the change of offensive play since Sonny Cumbie became the offensive coordinator in 2014. I knew that Cumbie played under Mike Leach at Texas Tech but I never understood Leach's system. I considered Leach to be an eccentric coach running a version of the Run and Shoot offense. This book allowed me to dive deep into the genius and origin of Leach's offense. Gwynne does a masterful job at telling the story of the forward pass and the godfather of the modern spread offense. I didn't know Hal Mumme before I started the book and now I want to watch every game he coached. I recommend this book to any fan of American Football. This is the top book I have read about the sport.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "The amount of research that went into this book is phenomenal. Bilton built a case that will leave no doubt about the true identity of Dread Pirate Roberts. I used to have sympathy for Ross Ulbricht and had some doubts if he was capable of building The Silk Road. This book changed all of my preconceived thoughts.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "If the point of this book was to piss you off about the lifestyle of the rich kids of LA in the 80's, then this book deserves five stars. It is the story of a bunch of whinging strung out psychos whose parents are permissive about everything. I hated every character and wanted them all to end up like Julian. A massive waste of time for me.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "Duhigg does insightful analysis on the traits of a habit. He explores three areas of habits: individuals, organizations, and societies. This is not a good book on how to form a habit. It is a series of case studies that look at successful and self-destructing habits in individuals, organizations and societies.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "This is the best time killer of all time. It is a hilarious book of pictures of the great leader looking at things. This book is the best Christmas present you will ever give to a loved one. Some facts about Kim Jong Il you might not have known: Kim Jong-Il was a fashion icon. Kim Jong-Il Invented The Hamburger. Kim Jong-il was a master golfer - he had 11 hole in ones in HIS FIRST ROUND.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "I just looked up this book to find the short story about the elf on the shelf. I was laughing for days reading about this poor abused elf sent to Tampa Bay. The altercation with Rudolph will have you in tears. Here is the link to the story: https://kielarowski.net/2013/12/20/my-career-as-an-elf-on-the-shelf/<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "Must read for citizens of the Metroplex. I could not put this book down. I felt proud, embarrassed and disgusted to be a fan of the Cowboys of the 90's.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "What happens to your personality when you get behind the steering wheel of a car? This book explains the emotional state of humans while driving. It's probably a fun book to play on audible while driving across the country. I decided to read this book after listening to a podcast by Malcolm Gladwell called Blame Game. http://revisionisthistory.com/episodes/08-blame-game<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "A look at the early days of Nike and the struggle that it took to become cash flow positive. The reader has an intimate view of the compromises facing Knight during Nike's rise to the top. This is a must read for anyone that ran track and field in high school or runs for recreation.<|endoftext|>"}
{"prompt": "<|startoftext|>", "completion": "A must read for anyone interested in the excess and greed of the Oklahoma Oilies of the early 80's.<|endoftext|>"}

And this is my validation file:

{"prompt": "This book was an amazing journey from start to finish. The characters were deeply developed and the plot was thrilling.\n\n###\n\n", "completion": "I'm glad you enjoyed the book! It sounds like it was a fantastic read."}
{"prompt": "The book 'To Kill a Mockingbird' by Harper Lee is a compelling tale of racial injustice and moral growth. The characters are vivid, and the narrative is powerful, making it a timeless classic.\n\n###\n\n", "completion": "Thank you for the review! 'To Kill a Mockingbird' is indeed a classic that offers deep insights into societal issues."}

Besides the double line break at the end of the prompts in the validation file, I don’t see (without my IDE) any other issues in the JSON. (Don’t completion models complain when the prompt ends with whitespace? They used to when I last used them.)

But what I notice is that your training file has the start-of-text markup (and no real prompt), while the prompt format in the validation file is totally different… So my questions are:

  1. What are you trying to achieve?
  2. Why do you think it will work?
  3. Why is your training file in this form?
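
For what it's worth, the points raised above (JSON validity, the start/end-of-text markup, prompts ending in whitespace) can be checked locally before uploading. This is a minimal sketch, not the validator OpenAI runs: it assumes that anything shaped like `<|...|>` counts as reserved special-token markup, which nothing in this thread confirms, and `check_jsonl_lines` and `check_file` are hypothetical helper names.

```python
import json
import re

# Assumption: any text shaped like <|name|> is treated as reserved markup.
# The thread does not confirm that this is what the validator rejects.
SPECIAL_TOKEN = re.compile(r"<\|[^|>]*\|>")


def check_jsonl_lines(lines):
    """Return a list of (line_number, message) pairs for suspicious records."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        for key in ("prompt", "completion"):
            value = record.get(key)
            if not isinstance(value, str):
                problems.append((number, f"missing or non-string {key!r}"))
                continue
            found = SPECIAL_TOKEN.findall(value)
            if found:
                problems.append((number, f"{key} contains special-token markup: {found}"))
        prompt = record.get("prompt")
        if isinstance(prompt, str) and prompt != prompt.rstrip():
            problems.append((number, "prompt ends with whitespace"))
    return problems


def check_file(path):
    """Run the checks on a JSONL file and print one line per finding."""
    with open(path, encoding="utf-8") as handle:
        for number, message in check_jsonl_lines(handle):
            print(f"{path}:{number}: {message}")
```

Calling `check_file` on the training file above should flag the `<|startoftext|>` prompt and the `<|endoftext|>` suffix on every line, and on the validation file the trailing `\n\n###\n\n`. A flag here is only a hint about where to look, not proof that the API would reject the record.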

Thanks for the response!

My goal is to build a model that will write book reviews for me.

This problem started when I noticed that I wasn’t able to sync up my Audible books with my Goodreads account.

It wasn’t a big deal until I noticed that all of the notes/clips could be exported through the Goodreads privacy download. I have been looking for a way to extract my notes for a long time.

So I found the book review import tool and I have the ability to upload reviews.

I am not a big book reviewer, but I have A LOT of books that I could submit reviews for because I listen to all of my books on Audible.

I wrote a quick post on how to extract your audible books out of Amazon.
(Ask me for the link because I can’t use links in this reply.)

I thought about using the Goodreads public API but I found out that they stopped offering the service in 2020.

I went to ChatGPT with the following question and went down the rabbit hole for the rest of the afternoon:
“how do i build a custom GPT that will write book reviews for me in my voice. I would like to upload some previous reviews for the GPT to know what my voice sounds like”

ChatGPT: Step-by-Step Guide to Fine-Tuning with OpenAI Custom GPTs

  1. Sign Up for OpenAI API Access:

    • Ensure you have access to OpenAI’s API. You may need to sign up for an API key if you don’t already have one.
  2. Prepare Your Data:

    • Collect your book reviews and format them appropriately. OpenAI expects data in a specific JSONL (JSON Lines) format where each line is a separate JSON object.
    • Example format for your reviews:
      {
        "prompt": "<|startoftext|>",
        "completion": "Your review text here<|endoftext|>"
      }
      
  3. Set Up Your Environment:

    • You can use tools like openai Python package to manage the fine-tuning process. Install the package if you haven’t already:
      pip install openai
      
  4. Upload Your Data:

    • Use the OpenAI API to upload your data for fine-tuning.

    • Example using Python:

      import openai
      openai.api_key = 'your-api-key'
      
      response = openai.File.create(
        file=open("path_to_your_file.jsonl"),
        purpose='fine-tune'
      )
      
  5. Fine-Tune the Model:

    • Once your data is uploaded, you can start the fine-tuning job.
    • Example:
      response = openai.FineTune.create(
        training_file="file-id-received-after-upload",
        model="davinci"  # or another suitable model like `curie`, `babbage`, or `ada`
      )
      
  6. Monitor the Fine-Tuning Process:

    • You can check the status of your fine-tuning job:
      response = openai.FineTune.list()
      
  7. Use Your Fine-Tuned Model:

    • Once the fine-tuning is complete, you can use the model for generating book reviews in your voice.
    • Example:
      response = openai.Completion.create(
        model="fine-tuned-model-name",
        prompt="This book was an amazing journey from start to finish. The characters",
        max_tokens=200
      )
      print(response.choices[0].text)
      

The good part is that I figured out a lot with assistants, and that is probably the best way to go. However, since I spent so much time on the task today, I would like to figure out what I was doing wrong.
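
On the question of what went wrong, two things in the generated guide look off, as far as I can tell. It mixes up custom GPTs with API fine-tuning, and its calls (`openai.File.create`, `openai.FineTune.create`) belong to the pre-1.0 `openai` Python package and the legacy fine-tunes endpoint, with legacy model names like `davinci` and `curie`. Below is a minimal sketch of the same upload-and-train steps with the 1.x client. `start_fine_tune` is a hypothetical helper, the file names in the usage comment are placeholders, and the sketch assumes the JSONL files themselves have already been fixed.

```python
def start_fine_tune(client, training_path, validation_path=None, model="davinci-002"):
    """Upload JSONL file(s) and create a fine-tuning job; returns the job object.

    `client` is expected to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    # Files are uploaded in binary mode with purpose "fine-tune".
    with open(training_path, "rb") as handle:
        training_file = client.files.create(file=handle, purpose="fine-tune")

    job_args = {"training_file": training_file.id, "model": model}

    # The validation file is optional and is uploaded the same way.
    if validation_path is not None:
        with open(validation_path, "rb") as handle:
            validation_file = client.files.create(file=handle, purpose="fine-tune")
        job_args["validation_file"] = validation_file.id

    return client.fine_tuning.jobs.create(**job_args)


# Usage (needs the `openai` package and an OPENAI_API_KEY in the environment):
#
#   from openai import OpenAI
#   client = OpenAI()
#   job = start_fine_tune(client, "reviews_train.jsonl", "reviews_valid.jsonl")
#   print(job.id, job.status)
#   events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=job.id, limit=10)
```

Taking the client as a parameter keeps the helper easy to try against a stub before spending anything on a real job.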

Very interesting… Your post is one of those rare, hidden gems where you don’t even want to look closer at a message because of how “off” the recommended approach is. And then you realize it’s almost a ready-to-build business idea.

All is good, except for the tech approach.

Try to start by writing down the workflow of how you write a book review and note for each of the steps what info/details/input you need to complete the step.

Then you will see how to use AI to automate those steps and organize your notes into full reviews. If you wish, send them to me here or in a PM and I’ll help you with the tech part.

100% agree!!!
Feel like an idiot for not planning it out.
I was so frustrated with Goodreads that I found a great alternative called Hardcover. It blows Goodreads out of the water.
