Fine-tuning a model with a JSON schema

Hey everyone,

I’m planning to extract structured data from a collection of free-text records: nearly 50,000 entries, each averaging around 200 tokens. I’ve created a comprehensive JSON schema (about 1,300 lines) and tested it in Assistant mode in the Playground. However, the base GPT-4o-mini doesn’t give me the accuracy and consistency I need, so fine-tuning seems necessary.

So I prepared 100 examples that follow the JSON schema to fine-tune the model with, but I don’t know how to incorporate the schema into the JSONL structure, both for the training phase and for the actual data extraction calls. My training data currently looks roughly like the sketch below.
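For reference, each line of my training JSONL is currently shaped like this (a simplified sketch; the system prompt, record text, and extracted fields are placeholders for my real data):

```python
import json

# One fine-tuning example in the chat format (simplified sketch).
# The open question: where does the ~1,300-line JSON schema go?
example = {
    "messages": [
        {"role": "system", "content": "Extract the structured fields from the record."},
        {"role": "user", "content": "<free-text record, roughly 200 tokens>"},
        {
            "role": "assistant",
            # The target output: a JSON object that conforms to my schema.
            "content": json.dumps({"field_a": "value", "field_b": 42}),
        },
    ]
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```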
In the OpenAI Cookbook, in the section “Introduction to Structured Outputs”, the schema is included in every single example. Besides being redundant, this would skyrocket the cost of both training and running the model.
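For context, this is the pattern as I understand it from the cookbook, with the full schema attached to every request via `response_format` (schema heavily abbreviated here):

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the structured fields from the record."},
        {"role": "user", "content": "<free-text record>"},
    ],
    # The entire ~1,300-line schema would be sent with every call like this.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "record_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"field_a": {"type": "string"}},
                "required": ["field_a"],
                "additionalProperties": False,
            },
        },
    },
)

print(completion.choices[0].message.content)
```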

Any suggestions to tackle this problem efficiently?

Thanks in advance