Struggling with fine-tuning GPT for generating JSON

Hey everyone. I’ve spent a few weeks trying to get GPT to work for my use case: turning a natural-language search query into a set of complex JSON search filters. The filter format is extremely specific, and there are numerous edge cases that need special handling. Prompt engineering with GPT-4o gets me a fairly decent working prototype, but each query is ~8,000 tokens (mostly the extensive system prompt), which is unsustainable from a cost perspective.

After reading the docs, fine-tuning seemed like the ideal approach to get consistent results, reduce latency, and cut prompt length, saving costs. To see whether it would work, I generated ~50 training examples plus a validation set of another 10, as the docs suggest. Both sets map a search query to its search filters. The training data was generated programmatically by a Selenium script that emulated a user clicking through the search filters; the validation data was created manually.
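For anyone following along, chat-model fine-tuning data is a JSONL file with one `messages` array per example. A minimal sketch of producing it from query→filter pairs (the pairs, filter fields, and shortened system prompt here are made up for illustration; the real pairs would come from the Selenium script):

```python
import json

# Hypothetical (query, filters) pairs standing in for the scraped data.
pairs = [
    ("red shoes under $50",
     {"category": "shoes", "color": "red", "price_max": 50}),
    ("laptops with 16GB RAM",
     {"category": "laptops", "ram_gb": 16}),
]

SYSTEM = "Convert the search query into JSON search filters."  # stand-in prompt

def to_jsonl(pairs):
    """Emit one fine-tuning example per line in the chat JSONL format."""
    lines = []
    for query, filters in pairs:
        example = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": query},
                # The target completion is the serialized filter object.
                {"role": "assistant", "content": json.dumps(filters)},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)

print(to_jsonl(pairs))
```

Note that the system prompt is repeated in every example, which is where the training cost mentioned below comes from; a much shorter stand-in system message is one common way to cut it.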

Unfortunately, the fine-tuned model did not perform well at all and would not conform to the strict structure the search filters require. Looking at the model metrics graph, the validation curve seemed to be missing; in the raw metrics I only had individual data points for the validation loss to compare against the training loss, so it’s difficult to determine what went wrong.
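One cheap diagnostic I’ve been using is to validate every completion against the required structure programmatically, so conformance failures get counted rather than eyeballed. A minimal sketch with a hypothetical schema (the real filter format is obviously more complex):

```python
import json

# Hypothetical required keys and types for the search filters.
REQUIRED = {"category": str, "price_max": (int, float)}

def validate_filters(raw: str):
    """Return (ok, reason) for one model completion string."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    for key, typ in REQUIRED.items():
        if key not in data:
            return False, f"missing key: {key}"
        if not isinstance(data[key], typ):
            return False, f"wrong type for {key}"
    return True, "ok"

print(validate_filters('{"category": "shoes", "price_max": 50}'))  # (True, 'ok')
print(validate_filters('{"category": "shoes"}'))  # fails: missing key
```

Running this over the validation set at least separates "outputs invalid JSON" from "outputs valid JSON with the wrong fields", which are very different failure modes.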

I’m wondering if anyone has ideas on how to diagnose what went wrong and how to improve my fine-tuned model in future iterations. Also, even this small dataset was fairly costly to train on, since each example included the system prompt (again, per the docs). If the problem is a lack of data, are there ways to make this more cost-effective?


Welcome to the community!

It sounds like the queries you’re generating might depend on contextual information. Fine-tuning isn’t great at getting the model to retain specific information.

Have you considered perhaps dynamically loading schema information into the prompt based on the search query?

Using embeddings, or customizing embeddings, might be a good option here. Here’s an example: Customizing embeddings | OpenAI Cookbook

But raw embeddings might work too; I’d try that first.
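To make the dynamic-loading idea concrete: embed each chunk of the filter-schema documentation once, embed the incoming query, and include only the top-scoring chunks in the prompt. A sketch with toy vectors (in practice the vectors would come from an embeddings endpoint such as `text-embedding-3-small`; the chunk names are made up):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-d embeddings; real schema chunks would be embedded via the API.
schema_chunks = {
    "price filter: {price_min, price_max}": [0.9, 0.1, 0.0],
    "color filter: {color: enum}":          [0.1, 0.9, 0.0],
    "date filter: {after, before}":         [0.0, 0.1, 0.9],
}

def top_chunks(query_vec, k=2):
    """Return the k schema chunks most similar to the query embedding."""
    ranked = sorted(schema_chunks,
                    key=lambda c: cosine(query_vec, schema_chunks[c]),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the "price" direction retrieves the price chunk.
print(top_chunks([0.8, 0.2, 0.1], k=1))
```

The system prompt then only needs the retrieved chunks instead of the full schema, which directly attacks the ~8,000-token prompt problem.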