Fine-tuning GPT-4o with a large data source in the system prompt

Hi everyone,

I am currently trying to fine-tune gpt-4o/gpt-4o-mini. I have a large data source that I currently pass along with the system prompt, and the model returns a structured response in JSON format, with the values in the JSON taken from that data source.
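To make the setup concrete, here is a minimal sketch of how I build the request. The data source, prompts, and field names below are hypothetical stand-ins, not my real data:

```python
import json

# Hypothetical stand-in for the real ~5,000-token data source.
DATA_SOURCE = "product_id,name,price\n101,Widget,9.99\n102,Gadget,19.99"

def build_messages(user_prompt: str) -> list[dict]:
    """Embed the full data source in the system prompt for one request."""
    system_prompt = (
        "Answer with a JSON object. Take the values from this data source:\n"
        + DATA_SOURCE
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("What is the price of the Widget?")
# These messages are then sent with something like:
# client.chat.completions.create(model="gpt-4o-mini", messages=messages,
#                                response_format={"type": "json_object"})
```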

However, I want to improve the accuracy of the JSON responses by fine-tuning on a dataset of user prompts paired with the expected model responses. My question is whether I should include the huge data source in the system prompt when building the fine-tuning dataset. The data source takes up about 5,000 tokens, so including it in every training example would be very expensive.
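For reference, a training example in the chat fine-tuning JSONL format looks like the sketch below. The shortened system prompt and the example question/answer are my own hypothetical placeholders; the open question is whether the system prompt here should also carry the full data source:

```python
import json

# Hypothetical shortened system prompt that omits the 5,000-token data source.
SHORT_SYSTEM = "Answer with a JSON object whose values come from the provided data source."

# One fine-tuning record in the chat JSONL format: a messages list ending
# with the assistant response the model should learn to produce.
record = {
    "messages": [
        {"role": "system", "content": SHORT_SYSTEM},
        {"role": "user", "content": "What is the price of the Widget?"},
        {"role": "assistant", "content": json.dumps({"name": "Widget", "price": 9.99})},
    ]
}

# Each record becomes one line of the training .jsonl file.
line = json.dumps(record)
```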

If I omit the data source from the fine-tuning dataset and only include it in the API calls to the fine-tuned model, will it generate responses using a mix of the data source and the fine-tuned knowledge?

Thanks!