I would like to load text from a book to fine-tune my ChatGPT to be more specific and helpful. That’s a lot of text to go through sentence by sentence to create prompts and completions. Is there a shortcut?
When you say you’d like to fine-tune your ChatGPT, do you mean that you’d like to train the model and ask questions related to the book to get more accurate responses? If so, have you considered exploring the use of embeddings instead?
Yes, you are correct. That’s exactly what I mean. Thank you for articulating it for me. To give a more specific use case: I would like to integrate ChatGPT into my chatbot, which of course focuses on a specific topic.
No, I have not. I’ll look into the use of embeddings. Could you elaborate on why embeddings may be better?
You should try to have something custom-made for yourself using the ChatGPT API.
ChatGPT is a quick and flexible language model. You don’t need to fine-tune it or use embeddings unless you have data that ChatGPT is not familiar with. Those who have more experience with language models know how difficult it used to be to get them to work, but with ChatGPT you often don’t even need a few examples to get it working. Prompting it is easy compared to other models.
As far as I know, you can’t fine-tune ChatGPT, but you can fine-tune base models like Curie and Davinci. If you need to find answers in your own dataset, then you should consider embeddings.
I suggest reading this document on OpenAI’s GitHub page: openai-cookbook/techniques_to_improve_reliability.md at main · openai/openai-cookbook · GitHub
I also have a small guide on prompting on my GitHub page that might be helpful.
I am also interested in training my model with texts. So if I’m not getting anything wrong, we still cannot fine-tune gpt-3.5-turbo, and one possible way to achieve the goal is using document embeddings, as in openai-cookbook/Question_answering_using_embeddings.ipynb at main · openai/openai-cookbook · GitHub?
Can I share my two cents?
The model has been trained on practically everything that is freely available on the internet, and does a great job at one-shot answers — and an even better job with well-engineered prompts. Unless the information you want to use for fine-tuning is not already in the open, I don’t believe it’s worth the effort; it’s better to try to get what you want by making/engineering better prompts. For what it’s worth. Always open to any suggestions/comments/additions…
What is your final purpose? To ask open questions about the content of the book, or something else?
I fine-tuned davinci with a 70K-word book. I split the book into sentences, around 3,000 in total. When tested against arbitrary questions, all within the context of the book, I didn’t get the expected results: the model would diverge to content outside the book.
Then I went the ‘embeddings’ route with the same book. This time I got exactly what I wanted. I could ask the model any question about the book and it would answer perfectly most of the time (95%+, if I had to put a number on it).
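The embeddings route described above can be sketched roughly like this. This is my own minimal illustration, assuming the `openai` Python client (v1.x); the model names, chunking, and prompt wording are illustrative assumptions, not the exact code used:

```python
# Minimal sketch of embeddings-based QA over a book (illustrative only).
# Assumes the openai>=1.0 Python client and OPENAI_API_KEY in the environment.

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def embed_texts(texts, model="text-embedding-ada-002"):
    """Embed a list of text chunks (e.g. the book split into passages)."""
    from openai import OpenAI  # imported lazily; needs an API key to run
    client = OpenAI()
    resp = client.embeddings.create(model=model, input=texts)
    return [d.embedding for d in resp.data]

def answer_from_book(question, chunks, chunk_embeddings, top_k=3):
    """Retrieve the most similar chunks and answer only from them."""
    from openai import OpenAI
    client = OpenAI()
    [q_emb] = embed_texts([question])
    ranked = sorted(
        zip(chunks, chunk_embeddings),
        key=lambda pair: cosine_similarity(q_emb, pair[1]),
        reverse=True,
    )
    context = "\n---\n".join(chunk for chunk, _ in ranked[:top_k])
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided book excerpts. "
                        "If the answer is not in them, say you don't know."},
            {"role": "user",
             "content": f"Excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

In practice you would embed all the chunks once up front with `embed_texts`, cache the vectors, and then call `answer_from_book` per question.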
Thank you for your response. I would like to leverage an OpenAI model to create a chatbot, similar to ChatGPT, that specializes in a specific domain within mental health and is more recent than ChatGPT’s current knowledge base, which is set in 2021.
Hey man, can you link the code for the “embeddings” method?
@juan_olano thanks man. I will take a look this weekend!
I’ve built this exact functionality using embeddings to search for the right passages from books (with HyDE) and then davinci to generate the output.
See my post on twitter for more info. Happy to help if you want to build something similar: https://twitter.com/naz_io/status/1647990346024988673
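The HyDE step mentioned above can be sketched as follows: instead of embedding the raw question, you first ask the model to write a passage that *would* answer it, then embed that passage and use it for retrieval. This is my own hedged sketch, assuming the `openai` Python client (v1.x); prompt wording and model names are illustrative:

```python
# Sketch of HyDE (Hypothetical Document Embeddings), illustrative only.
# Assumes the openai>=1.0 Python client and OPENAI_API_KEY in the environment.

def rank_by_similarity(query_embedding, passage_embeddings):
    """Return passage indices sorted by descending cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    scores = [cos(query_embedding, p) for p in passage_embeddings]
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)

def hyde_query_embedding(question):
    """Generate a hypothetical answer passage and embed it.

    The passage need not be factually correct -- it only has to look like
    the kind of book passage we want to retrieve, which tends to place its
    embedding closer to relevant passages than the raw question's embedding.
    """
    from openai import OpenAI  # imported lazily; needs an API key to run
    client = OpenAI()
    hypothetical = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Write a short passage that would answer: {question}"}],
    ).choices[0].message.content
    emb = client.embeddings.create(model="text-embedding-ada-002",
                                   input=[hypothetical])
    return emb.data[0].embedding
```

You would then feed the result of `hyde_query_embedding` into `rank_by_similarity` against your pre-computed passage embeddings, and pass the top passages to davinci (or a chat model) to generate the final answer.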
Hello, I’m a medical oncologist in the process of training the model on a very specific topic (EGFR-mutated non-small-cell lung cancer), but I need to include pivotal articles from 2022 and Q1 2023. Any recommendations?
If you’re looking for a place to chat with documents, we’ve built Sharly — we tested it on the Harry Potter books and it works pretty well.
On your request about prompts: are you looking to create and save your own, or something else?
If I already have a large labeled QA dataset, how can I leverage it to further enhance performance on top of question answering with embeddings-based search?