Fine-tuned model sometimes repeats itself verbatim

I have 1500 prompts/completions. Overall, the fine-tuned model works well. But sometimes, about 5% of the time, the AI replies with exactly the same line as it did the line before. However, the training data never has any repetition. I cannot figure out on my own what’s going wrong. Can you help?

Here is a small snippet of 10 prompts/completions from the training data. Am I doing something wrong here?

{"prompt": "Alex: I'm bored\nHector:", "completion": " Boredom is often caused by a lack of stimulation. Try finding new activities that interest you, or spending time with friends and family members who you enjoy being around.\n"}

{"prompt": "Alex: I'm bored\nHector: Boredom is often caused by a lack of stimulation. Try finding new activities that interest you, or spending time with friends and family members who you enjoy being around.\nAlex: How do I develop a routine\nHector:", "completion": " A good way to develop a routine is to start by making a list of things that you enjoy doing, and then scheduling time for those activities into your day.\n"}

{"prompt": "Alex: I'm bored\nHector: Boredom is often caused by a lack of stimulation. Try finding new activities that interest you, or spending time with friends and family members who you enjoy being around.\nAlex: How do I develop a routine\nHector: A good way to develop a routine is to start by making a list of things that you enjoy doing, and then scheduling time for those activities into your day.\nAlex: How do I time manage\nHector:", "completion": " One way to time manage is by using a planner or calendar. You can also set reminders on your phone or computer to help you remember important tasks.\n"}

{"prompt": "Alex: I'm bored\nHector: Boredom is often caused by a lack of stimulation. Try finding new activities that interest you, or spending time with friends and family members who you enjoy being around.\nAlex: How do I develop a routine\nHector: A good way to develop a routine is to start by making a list of things that you enjoy doing, and then scheduling time for those activities into your day.\nAlex: How do I time manage\nHector: One way to time manage is by using a planner or calendar. You can also set reminders on your phone or computer to help you remember important tasks.\nAlex: How do I improve my relationships\nHector:", "completion": " One way to improve your relationships is by being more open and honest with those around you. You can also try reaching out to others and showing them that you are interested in getting to know them better.\n"}

{"prompt": "Alex: I'm bored\nHector: Boredom is often caused by a lack of stimulation. Try finding new activities that interest you, or spending time with friends and family members who you enjoy being around.\nAlex: How do I develop a routine\nHector: A good way to develop a routine is to start by making a list of things that you enjoy doing, and then scheduling time for those activities into your day.\nAlex: How do I time manage\nHector: One way to time manage is by using a planner or calendar. You can also set reminders on your phone or computer to help you remember important tasks.\nAlex: How do I improve my relationships\nHector: One way to improve your relationships is by being more open and honest with those around you. You can also try reaching out to others and showing them that you are interested in getting to know them better.\nAlex: How do I figure out who I am\nHector:", "completion": " One way to figure out who you are is by doing things that make you feel good about yourself. You can also keep a journal and write down your thoughts and feelings, or talk to someone you trust about how you're feeling.\n"}

{"prompt": "Alex: Can we keep talking I think I’m having a small panic attack but I don’t really know what it feels like\nHector:", "completion": " Of course, Alex. What do you feel right now?\n"}

{"prompt": "Alex: Can we keep talking I think I’m having a small panic attack but I don’t really know what it feels like\nHector: Of course, Alex. What do you feel right now?\nAlex: My heart is racing and beating out of control and it’s hard to breathe\nHector:", "completion": " Try to take deep breaths for me. When you're ready, tell me how many seconds it took you to read this sentence.\n"}

{"prompt": "Alex: Can we keep talking I think I’m having a small panic attack but I don’t really know what it feels like\nHector: Of course, Alex. What do you feel right now?\nAlex: My heart is racing and beating out of control and it’s hard to breathe\nHector: Try to take deep breaths for me. When you're ready, tell me how many seconds it took you to read this sentence.\nAlex: 13\nHector:", "completion": " Okay, now take a deep breath for 13 seconds. Ready? Start counting: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 11 12 13.\n"}

{"prompt": "Alex: Can we keep talking I think I’m having a small panic attack but I don’t really know what it feels like\nHector: Of course, Alex. What do you feel right now?\nAlex: My heart is racing and beating out of control and it’s hard to breathe\nHector: Try to take deep breaths for me. When you're ready, tell me how many seconds it took you to read this sentence.\nAlex: 13\nHector: Okay, now take a deep breath for 13 seconds. Ready? Start counting: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 11 12 13.\nAlex: Okay, that helped a bit but I still can’t breathe very well\nHector:", "completion": " Try to take another deep breath for 6 seconds. When you're ready, tell me how many seconds it took you to read this sentence.\n"}

{"prompt": "Alex: Can we keep talking I think I’m having a small panic attack but I don’t really know what it feels like\nHector: Of course, Alex. What do you feel right now?\nAlex: My heart is racing and beating out of control and it’s hard to breathe\nHector: Try to take deep breaths for me. When you're ready, tell me how many seconds it took you to read this sentence.\nAlex: 13\nHector: Okay, now take a deep breath for 13 seconds. Ready? Start counting: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 11 12 13.\nAlex: Okay, that helped a bit but I still can’t breathe very well\nHector: Try to take another deep breath for 6 seconds. When you're ready, tell me how many seconds it took you to read this sentence.\nAlex: About 9\nHector:", "completion": " Okay, now take a deep breath for 9 seconds. Ready? Start counting: 1, 2, 3, 4, 5, 6, 7 8 9.\n"}

I am having a similar issue. It works when tested in the playground, but when tested with the SDK it repeats itself by answering its own assistant response.

In my case, I am using a fine-tuned model of gpt-3.5-turbo.

Example:

With the fine-tuned model: (screenshot)

Same code with gpt-3.5-turbo: (screenshot)

If you are using a different model a year later, one that uses a different training and API call format with an intrinsic stop sequence, then it is fair to say the “similar issue” is only superficial and does not come from the same source, beyond the fact that AI models are prone to fall into repeating loops.

With the default temperature, a wide variety of outputs is possible. If the internal <|im_end|> that stops output is 90% likely, then you might also have carriage returns or other tokens possible in the same position, and they would be produced at the corresponding random chance unless you run with a top_p of 0.50 to restrict sampling to the tokens in the top 50% of probability mass. If it reliably repeats at top_p = 0 on multiple chat-style inputs, your trained model is broken.
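Something like this is enough to check that, sketched in the same old v0.x Python SDK style as the script further down this thread (the model id and API key are placeholders): run a few unrelated chat inputs at a top_p of effectively zero and see whether every one of them collapses into the repeat.

import openai  # v0.x SDK, same style as the script further down

openai.api_key = "sk-..."  # placeholder; use your own key

tests = ["Hello, who are you?",
         "Give me one tip for staying focused.",
         "Write a haiku about rain."]

for t in tests:
    r = openai.ChatCompletion.create(
        model="ft:gpt-3.5-turbo-0613:your-org::your-id",  # placeholder fine-tune id
        top_p=1e-9,       # effectively top_p = 0: near-greedy, repeatable sampling
        max_tokens=200,
        messages=[{"role": "user", "content": t}],
    )
    print(r["choices"][0]["message"]["content"], "\n---")

If every one of those degenerates into the same repeated line, the problem is in the training, not in your sampling settings.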

This “repeating” has been reported especially when attempting to fine-tune function-calling and return values exactly as OpenAI describes.

One technique you could try would be to train on your own repeated insertions of <|im_end|><|im_end|> at the end of the assistant response.
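A sketch of what that could look like as one line of a chat-format training file (assuming the standard {"messages": [...]} JSONL layout; whether the fine-tuning endpoint passes the literal string through untokenized, or strips it, is exactly the uncertainty here):

{"messages": [{"role": "user", "content": "How do I time manage"}, {"role": "assistant", "content": "One way to time manage is by using a planner or calendar.<|im_end|><|im_end|>"}]}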

The <|im_end|> text in your training data is not encoded and trained as the special token that actually stops ChatML output, nor is the string the AI produces encoded as that token, yet it reliably shows up at the end of the first “repeat” you get as output: put that literal text in as a stop sequence string of your own. Whether you want to repeat the experiment depends on how much money you’ve already blown training non-working AI and running it at 8x the cost.
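In the SDK style used below, that would look roughly like this; it assumes the API accepts the literal text as a stop value and that the string really is what your model emits right before it starts repeating:

import openai  # key set as in the earlier snippet

r = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo-0613:your-org::your-id",  # placeholder fine-tune id
    messages=[{"role": "user", "content": "Make me a basic HTML button"}],
    max_tokens=300,
    stop=["<|im_end|>"],  # the literal string observed at the end of the first repeat
)
print(r["choices"][0]["message"]["content"])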


OpenAI encourages paid fine-tuning (instead of giving it for free as in the past),

  • without disclaiming scenarios that cannot work by being fine-tuned (like “answer only about my company’s PDF documents”),
  • with poor guidance on the amount of training (read the InstructGPT paper) actually required to achieve a goal that cannot be reached by pedestrian supervised attempts,
  • and then finally this symptom of a broken implementation of ChatML and function fine-tuning (where the methods OpenAI actually uses to train new variants of their own models are not exposed).

Base model fine-tuning is no longer done on GPT-3, but on a crushed, undisclosed davinci-002 that is likely a low-parameter “turbo” (imagine a quantized, de-sparsified, overtrained Llama 30B) with none of the RLHF that makes a model passable to chat with. babbage-002 is just a perplexity machine of “creativity”. Nobody offered a single example of these being successfully tuned when I asked.

These failed experiments that shouldn’t be attempted, and for which there is no example OpenAI can provide of achieving or replicating success, are a happy profitable side-effect for OpenAI.

Appendix: babbage-002

Good for completing about 10 tokens:

A fearsome dragon breathing fire was spotted over Paris, which was frightening onlookers to see. The flames were visible for miles.

KAZAN, RUSSIA - JANUARY 27: Fire during the Odddown fashion show during the Bolshevik Fashion Week in the B.Bazylo aviator park on January 27, 2018 in Moscow, Russia

or we try top_p = 0

A fearsome dragon breathing fire was spotted over Paris, which is a sign of the end times, according to a prophecy from the Bible.

The dragon is a symbol of the end times, and the Bible says that the dragon will be cast into the lake of fire, where he will be tormented for eternity.

The Bible says that the dragon will be cast into the lake of fire, where he will be tormented for eternity.

The Bible

Fun fact: there actually is Dungeons and Dragons in the Bible, Revelation 12, where the dragon goes after a pregnant woman’s offspring for believing in Jesus, as a lesson for you not to follow the same path (but none of that AI nonsense).

Hmm, but if I do the same test in the playground it seems to work as expected. So I am guessing it could be a bug in the SDK or API(?)

The playground IS the API. Press the “get code” button and see the parameter settings that it is using.
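Roughly, a playground run with default sliders corresponds to a call like this (a sketch only; press “get code” to see the exact values it is sending for your session):

import openai  # key set as in the earlier snippet

r = openai.ChatCompletion.create(
    model="ft:gpt-3.5-turbo-0613:your-org::your-id",  # the fine-tune selected in the playground
    temperature=1,    # playground default
    top_p=1,          # playground default
    max_tokens=256,   # whatever the length slider is set to
    messages=[{"role": "user", "content": "the same prompt you typed in the playground"}],
)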

After doing some tests with API and SDK I realized that it is returning double responses only when I use the functions param. This is reproducible with all the fine-tuned models.

This leads to my next question. Do fine-tuned models support function calling?

Here’s a fine-tuned model replying when it has received a history of a user question, an unrelated function call, and a function return (it ignores the price of Bitcoin when making a “basic HTML button”).

{'max_tokens': 300}
Certainly! Here’s a simple HTML code for creating a basic button:

<!DOCTYPE html>
<html>
<head>
  <title>Basic Button</title>
  <style>
    .button {
      padding: 10px 20px;
      background-color: #4CAF50;
      color: white;
      border: none;
      text-align: center;
      text-decoration: none;
      display: inline-block;
      font-size: 16px;
      cursor: pointer;
    }
  </style>
</head>
<body>
  <button class="button">Click Me</button>
</body>
</html>

You can save this code in an HTML file (e.g., button.html) and open it in a web browser to see the button in action. The button will have a green background color, white text, and a simple hover effect. Feel free to modify the styles to suit your needs.
[193 tokens in 2.6s. 73.1 tps]

When including a function definition to go with the call and return:

{'max_tokens': 300}
Sure! Here’s a simple HTML code for a basic button:

<!DOCTYPE html>
<html>
<head>
  <title>Basic Button</title>
</head>
<body>
  <button>Click Me</button>
</body>
</html>

You can save this code in an HTML file (e.g., button.html) and open it in a web browser to see the button.
[85 tokens in 1.6s. 54.8 tps]

And then asking something relevant to the fake function and its return data:

{'max_tokens': 300}
As of the latest information, the price of 1 Bitcoin is approximately $13,583. Please note that cryptocurrency prices are highly volatile and can change rapidly. It’s always a good idea to check a reliable cryptocurrency exchange or financial platform for the most up-to-date prices.
[56 tokens in 1.1s. 50.7 tps]

and then finally taking out the chat history call and return and letting the AI call my function:

{'id': 'chatcmpl-qqqqq', 'object': 'chat.completion', 'created': 1699191998, 'model': 'ft:gpt-3.5-turbo-0613:xxxx::yyyyy', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': None, 'function_call': {'name': 'coindesk', 'arguments': '{}'}}, 'finish_reason': 'function_call'}], 'usage': {'prompt_tokens': 108, 'completion_tokens': 8, 'total_tokens': 116}}

So all seems well in chatland.

This model was trained 9/28, before they announced function calling was available for fine-tune models.

How did you get this? Is there a website where I can test some fine-tuned models?

I fine-tuned a basic model, and then modified Python scripts for each invocation shown.

Apologies for the obfuscation that highlights the input and not the method.

import openai; from openai.util import convert_to_dict as e  # e: response object -> plain dict
from openai import ChatCompletion as f; from time import time as q
def g(z):  # pull completion token count and message text out of a response dict
 return [z['usage']['completion_tokens'],z['choices'][0]['message']['content']]

f={"functions": [{
    "name": "coindesk",
    "description": "get current bitcoin prices and other cryptocurrency",
        "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "natural language AI query",
            }
        },
        "required": ["post_list"],
        },
    }]}
n = 300  # this is the max tokens
c = {
"model": "ft:gpt-3.5-turbo-0613:orgorg::numnum",
"top_p": 15e-9,
"messages": [
    {
        "role": "system",
        "content": "You are ChatGPT, a large language model trained by OpenAI, "
        "based on the GPT-3.5 architecture.\n"
        "Knowledge cutoff: 2021-09\nCurrent date: 2023-11-05",
    },
    {
        "role": "user",
        "content": "What's bitcoin going for these days?",
    },
    {
        "role": "assistant",
        "content": "I'll get the price before writing.",
        "function_call": {
            "name": "coindesk",
            "arguments": '{"query": "price of Bitcoin"}',
        },
    },
    {
        "role": "function",
        "name": "coindesk",
        "content": "1 Bitcoin: $13583 dollars"},
],
}

s=q();m={"max_tokens": n};print(m);o=e(f.create(**c,**m,**fd));d=q()-s;x=g(o)[0]  # time the call, convert to a dict
print(g(o)[1]+f"\n[{x} tokens in {d:.1f}s. {(x/d):.1f} tps]")

You’d also need to set an API key, as mine is hard-coded in the library. You can just “print(o)” to see the full json response.
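For reference, the v0.x SDK takes the key like this (or reads it from the OPENAI_API_KEY environment variable):

import openai
openai.api_key = "sk-..."  # or set the OPENAI_API_KEY environment variable instead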


Thanks for the code. :raised_hands:

I tried creating a new fine-tuned model with a very small dataset and that did not produce any such issues. So I have concluded that my old fine-tuned model was broken for function calling. :person_shrugging:

My aim with the fine-tuned model is to train it on documentation that the GPT model doesn’t have access to. By any chance do you know the best way to do that?

The reason I am using the functions param is that the fine-tuned model was not good with number-related responses (it starts to give random numbers), so I wanted to include function calling to grant it access to more data.