How can I fine tune gpt3.5 to be able to read documentation and also books?

lucas.ofc.caetano · January 17, 2023, 9:53pm

I am looking to fine-tune GPT-3.5 for a specific application involving code generation. However, due to the specific nature of the application, there is a high likelihood of errors in the generated code. To overcome this, I plan on fine-tuning the model using specific documentation and programming books that utilize natural language. I am having difficulty formatting the data in the required format of “prompt” and “completion” pairs. Can you provide guidance on how to properly fine-tune the model using this type of data?

i-technology · January 17, 2023, 10:00pm

Yeah, i too would like to know how to just feed it data, instead of the (question/optimal answer format) …there is no optimal answer if i just want it to read a Wikipedia article

d.c.macbride · April 3, 2023, 5:58pm

Any luck on figuring out how this is done yet?

terrion · April 3, 2023, 10:24pm

What about splitting the process into two calls to the API?

1st divide the data into small fragments that could be added within the messages parameter and make a list of titles for every fragment. Then make a 1st call to the API asking which of the titles fit better to the prompt.

The next step would be to get the number in the response (maybe it’s not only a number, but it should contain only a number) and send only the specific fragment that corresponds to that number as assistant data together with the prompt of the user. The answer should be related to that fragment

For example:

I have a big data of rules from an online game and I wanted ChatGPT to answer only questions about the rules. But they were too many tokens, so I divided them into chapters and made this question to ChatGPT:

Given a list of topics:
1. Register and basic rules
2. Team
3. Players
4. Training
5. Junior School
6. League system
7. National Teams
Tell me the number and only the number of the topic that better fits this prompt: “what should I do to score more goals?”

The answer was: 4

What do you think?

lucas.ofc.caetano · April 4, 2023, 12:35am

One time I sent a message to chatGPT like: how can I fine tune gpt3 using transfer learning, and it gave me the response. But I’m not sure if I can use a website article to fine tune it on by this way

360macky · April 5, 2023, 10:58pm

I would also like to know what is the engineering behind this problem.

andrxduarte · April 7, 2023, 12:45pm

Hello, I managed to use this idea but I did it using a .csv (with data from a microcompany), the idea is the same, I pass the file and it summarizes the file very detailed with the prompt set, however, I want to change the model to gpt3.5-turbo but I’m not getting any success

danielle.eriksson · May 18, 2023, 4:50pm

How did you pass the file to the API?

blanktix · December 7, 2023, 7:58am

Hi, you might have found the solution to the issue when I replied to this, but I would like to reply to this question with my knowledge to help others who have a similar question.

There is a framework called LlamaIndex 🦙 0.9.13. That is a RAG framework which can integrate real data such book and documentation to GPT model.

Topic		Replies	Views
Fine Tuning ChatGPT with large text from Books Prompting	18	11007	March 26, 2024
Fine-Tuning with Non-Prompt/Completion Data: Seeking Advice for Direct Text-Based Training? API gpt-4 , chatgpt , fine-tuning , api	3	315	August 23, 2024
Is it possible to fine-tune a model to answer questions given a raw text? Prompting	18	10094	December 15, 2023
How to Fine-Tune a Model with Book Data for a Chatbot? API fine-tuning	4	504	January 13, 2025
How to fine tune gpt3 on raw text without prompt API	1	663	July 27, 2023

How can I fine tune gpt3.5 to be able to read documentation and also books?

Related topics