My Fine-Tuned Model doesn't perfom well anymore


I have fine tuned the gpt-turbo-3.5-1106 model on 21 december last year on my own dataset. It was successfull and I was able to use it.
Last week, when I was asking questions, it did perfom really fine and it did gave great and detailed answers. (as shown below)

Now when I ask the fine tuned model questions, even the same questions, I only get really short and not so detailed answers, despite that my dataset was very much detailed. (answers are below as well)

I don’t know if there is a problem or my fine tuned model did expire, which is unlikely because I used the new model, before gpt-4- turbo was avaiable to use.

Also already tried increasing the max. length. Last week it did answer well with length 500, today even 1000 doesn’t come near the answers of last week.
There is enough credit on my account and my usage limit is enough and high as well.
Maybe the billing is too low? How much should it be? But it should be fine, as asking the model questions is only 0.01 dollar.

Some background information:
I am writing my master thesis on anlayzing economic data with llms.
The dataset I used for training did contain over thousand of questions and answers to the topic, which most of them was generated with gpt as well.

I really need the questions outputs, as they are crucial for my evaluation. And as for now I only have two great answers that are better than chatGPTS answers.

Would be happy, if anyone could help. :slight_smile:

This is the question:
How did the real estate market fare during the COVID-19 pandemic, particularly in terms of property values and sales in the netherland in 2022?

Last weeks reply:
As an AI language model, I don’t have real-time data, so I cannot provide specific information about the real estate market in the Netherlands in 2022. Real estate market conditions are subject to numerous factors that can change over time, including economic conditions, government policies, and public health circumstances.

However, it is worth noting that the Netherlands, like many other countries, experienced some impacts on its real estate market during the COVID-19 pandemic. It is expected that the market will continue to be influenced by various factors in 2022. Here are some general considerations:

  1. Residential market: The residential market in the Netherlands has generally shown resilience during the pandemic. While there may have been fluctuations in property values and sales due to economic uncertainty and changing buyer preferences, the underlying demand for housing remains strong.

  2. Housing sales: The pace of housing sales could be influenced by factors such as mortgage rates, availability of credit, and consumer sentiment. It is important to monitor market conditions and economic indicators to assess the potential impact on sales volumes and transaction activity in 2022.

  3. Rental market: The rental market may continue to be impacted by changes in demand and rental prices. The effects could vary depending on factors like location, demand from expatriates and students, and changes in remote work policies. Monitoring vacancy rates, rental price trends, and local market dynamics will provide a clearer picture of the rental market in 2022.

  4. Commercial real estate: The commercial real estate market, including office spaces, retail properties, and hospitality sectors, could continue to experience challenges as the economy recovers and businesses adapt to new ways of working. Demand for certain types of commercial properties, like industrial and logistics spaces, may remain relatively strong due to the growth of e-commerce.

It’s important to consult local real estate experts, market reports, and up-to-date data to get a more accurate understanding of the specific conditions in the Netherlands real estate market in 2022.

todays reply:
In the Netherlands, residential property values and sales declined less than 2.5 percent.

Try it without setting max_tokens?

Not sure why a fine-tuned model would change…

1 Like

When I use the Playground in OpenAI, I have to give a max length.
Even changed the prompt system on the left side.

Tried it with a Python script as well, but the answer is still short and little detailed.

Hi there - could we take a step back here for a moment. May I ask what you are finetuning for and what your training data pairs consisted of? I might be wrong but it sounds like you are training to inject knowledge into your fine-tuned model? Appreciate if you could clarify as I may be able to give further guidance.

1 Like

Hi there,

I wanted to create a question and answer model with my dataset that consists of more than thousands of Q&A pairs.
Topic was economical impacts during the pandemic, which is why most of the questions are about.

Here are some examples:

{“messages”: [{“role”: “system”, “content”: “You are a very helpful AI assistant. You should help the user to answer their questions.”}, {“role”: “user”, “content”: “3.Question: What is disrupting economic activity in many euro area countries?”}, {“role”: “assistant”, “content”: “3.Answer: The renewed surge in coronavirus (COVID-19) infections and the restrictive and prolonged containment measures imposed are disrupting economic activity in many euro area countries.”}]}

{“messages”: [{“role”: “system”, “content”: “You are a very helpful AI assistant. You should help the user to answer their questions.”}, {“role”: “user”, “content”: “5.Question: What is included in the new fiscal package in Japan?”}, {“role”: “assistant”, “content”: “5.Answer: The new fiscal package in Japan includes about 3.5% of GDP which will support activity in the short term.”}]}

These are the pairs of my jsonl file I fine tuned the model with.

I don’t know if this helps, but last week, when I got the long answers I was in uni, but this week I tried from home. I thought maybe it is different with the internet, but even with a VPN to connect to the uni’s internet, still got the short answers.

Hope it helps.

Thanks for clarifying.

It looks to me like you are trying to teach the model knowledge through your finetuning. This is unfortunately not what finetuning is intended for.

What you can use finetuning for is to influence the style of the model’s output.

Please let me know if my understanding is incorrect.


Thank you for your answer.

This might be correct, but as of my own research it is possible to fine tune a model to act as an question and answer model, e.g. ChatGPT.

Even on the OpenAI site, there it says that the model would perform better on specific topics.

And it worked well last week.

You might want to take a second look here:

Also, you might want to look into assistants where you can do content embedding and retrieval in a Q&A style while also providing instructions to the assistant that can influence the style it uses in its responses.

Or a custom GPT is of course another option.

1 Like

thank you.

but it still doesn’t make sense, that the model was doing fine last week and getting worse this week. I didn’t even used it as much.

could you recommend me some assistants?
or explain custom GPT further?

You should start with the basic documentation on OpenAI. If specific questions on either Assistant or Custom GPT come, check back here in this Forum, ideally with a separate question so you get appropriate attention.

Your model doesn’t get worse from using it. I have many finetuned models - their performance does not change unless you re-train it. Occasionally there may be system-wide degraded performance but other than that, it should work normally.

I suspect that you may got reasonable answers because the baseline model was trained on data that can be leveraged to answer your question. But in general, finetuning - in the context of the OpenAI environment - is not intended for content injection. For that, there are other mechanisms, such as embeddings, RAG etc.

thank you again for your great reply.

But it seems odd, that if my model did create a reasonal answer because of the baseline model, then why won’t do it now asking the same question?

and by chance, do you have any source, that I can use it for my thesis? because every paper I read contained fine tuning for question answering models, even with the same model I used.

The term fine-tuning is often used in a much broader sense and may mean something different than the narrower interpretation of fine-tuning of a GPT baseline model in an OpenAI context.

I would encourage you to go back to the papers of relevance and take a detailed look at their technical approach and understand which techniques they used. If another specific questions comes up, I can try to help. Unfortunately, I am not in a position to assist you with sources for your thesis.

1 Like

thank you very much for your replys and explanations!

I have another question, not really related to this.

On the Playground Sit of OpenAI, where I can ask the chat assistent, is there any possibility to see the past history, even if I deleted it?

I can see the history of the past day, but because I deleted the rest I can’t see the histroy of the past 30 days.
Is there anything where I can retrieve them?