Custom model response not aligning with training datasets

Hi everyone, I’ve been working on fine-tuning a GPT model (using gpt-4o-2024-08-06) with company-specific data stored in a blog-like format in our database. For the fine-tuning process, I used 5 blog posts from our database and passed their titles and bodies as input for training.
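For reference, this is roughly how I built the training file (simplified; the post contents and prompts here are just placeholders, the real ones come from our database):

```python
# Rough sketch of how I generated the fine-tuning JSONL (chat format).
import json

# Placeholder posts; in reality these are pulled from our database.
posts = [
    {"title": "Our refund policy", "body": "Refunds are processed within 14 days..."},
]

with open("train.jsonl", "w") as f:
    for p in posts:
        example = {
            "messages": [
                {"role": "user", "content": f"Tell me about: {p['title']}"},
                {"role": "assistant", "content": p["body"]},
            ]
        }
        f.write(json.dumps(example) + "\n")
```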

The fine-tuning process completed successfully, and I’ve started using the model for prompting. However, when I ask questions, the responses often don’t align with the data I provided during fine-tuning. It seems like the model is not accurately reflecting the content of my dataset.

Could the issue be related to the limited dataset size, or might there be other factors at play? What can I do to ensure the model gives responses based on the data I supplied during fine-tuning? Any tips or insights would be greatly appreciated!

Welcome to the community!

Did you expect that you could provide data during fine-tuning, and that the model would then be able to recall that data at inference time?

Unfortunately, fine-tuning mainly teaches the model style, tone, and output format rather than new facts. Injecting knowledge this way only works within a tiny, very specific scope, and even then I wouldn't recommend it.

If you want the data you provide to be used to answer questions, you need to include it in the model's context at inference time. One common way to do that is RAG (retrieval-augmented generation), and one simple implementation is to use the Assistants API with your documents attached for file search.
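To make that concrete, here's a rough sketch of a minimal RAG loop using the openai Python SDK (v1.x). The blog posts, prompts, and model names are placeholders; a real setup would add chunking, a persistent vector store, and error handling.

```python
# Minimal RAG sketch: embed posts, retrieve the most similar ones,
# and put them in the prompt so the model answers from your data.
from openai import OpenAI
import numpy as np

client = OpenAI()

# Hypothetical blog posts pulled from your database.
posts = [
    {"title": "Our refund policy", "body": "Refunds are processed within 14 days..."},
    {"title": "Product roadmap 2025", "body": "Next quarter we are shipping..."},
]

def embed(texts):
    """Embed a list of strings with an embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Embed every post once; in practice you'd store these vectors.
doc_vectors = embed([f"{p['title']}\n{p['body']}" for p in posts])

def answer(question, k=2):
    # Embed the question and rank posts by cosine similarity.
    q = embed([question])[0]
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    top = [posts[i] for i in np.argsort(sims)[::-1][:k]]

    # Ground the answer in the retrieved posts.
    context = "\n\n".join(f"{p['title']}\n{p['body']}" for p in top)
    resp = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("How long do refunds take?"))
```

The Assistants API route is even less code: upload your documents, attach them to an assistant with the file_search tool via a vector store, and ask your questions in a thread. Either way, the key point is that the model sees your data in context at inference time instead of having to "remember" it from fine-tuning.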