Curious why my fine-tuned model doesn't work too... fine

Hi, I first created a fine-tuned model from the base curie model, using 490 prompts, to generate anagrams. About 80% of the prompts are short names or words, and around 20% are phrases.

But when I tested the model, it didn’t even create anagrams. It does produce good, existing, real words, but they don’t use all the letters from the prompt, and the number of letters doesn’t match either…

I tried a new one using davinci with the same 490 prompts, but got the same result…

Maybe 490 prompts are not enough to fine-tune it for such a not-so-simple job? Or what could be the reason it doesn’t understand the “simple” fact of using the same letters, just rearranged?
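For anyone who wants to measure this failure rather than eyeball it, here is a minimal sketch of an anagram check. The function names (`letter_counts`, `is_anagram`, `letter_diff`) are made up for illustration, and it assumes that case, spaces, and punctuation should be ignored, which matches how the example pairs in this thread appear to work.

```python
from collections import Counter

def letter_counts(text):
    """Count letters only, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def is_anagram(source, candidate):
    """True when both strings use exactly the same letters."""
    return letter_counts(source) == letter_counts(candidate)

def letter_diff(source, candidate):
    """Return (missing, extra) letters of the candidate relative to the source."""
    src, cand = letter_counts(source), letter_counts(candidate)
    missing = src - cand   # letters the candidate failed to use
    extra = cand - src     # letters the candidate invented
    return ("".join(sorted(missing.elements())),
            "".join(sorted(extra.elements())))
```

Running `letter_diff` on a model output shows which letters were dropped or added, which is more informative than a plain pass/fail.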

Check out this thread to make sure you’re configuring the fine-tuning parameters effectively:

Let me read it. I only followed the guide from OpenAI. Many thanks!

The models predict the next word/token. The word/token is passed to the large language model not as a sequence of letters but as a vector, so the model doesn’t know very well which letters a word consists of.

I don’t know how the newest models work though, since they are able to produce rhymes.
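To make the token point above concrete, here is a toy sketch. The vocabulary and IDs are invented for illustration and this is not how any real tokenizer is built (real ones typically learn their pieces from data), but the effect is similar: two words that are anagrams of each other reach the model as unrelated ID sequences, with the letters hidden inside the pieces.

```python
# Toy vocabulary: hypothetical sub-word pieces and IDs, invented for this example.
TOY_VOCAB = {"adm": 101, "irer": 102, "mar": 103, "ried": 104}

def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into toy vocabulary IDs."""
    word = word.lower()
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids

admirer_ids = toy_tokenize("Admirer")
married_ids = toy_tokenize("married")

# The two words share every letter, yet share no token ID at all.
shared_ids = set(admirer_ids) & set(married_ids)
```

In this toy setup the model would have to learn from examples alone that IDs 101 and 102 together contain the same letters as 103 and 104, which is the difficulty described above.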

So what you are saying is that I should check some of the new GPT-4 models?

I checked the previous suggestion and used 16 epochs. One thing it did well: for each prompt it was trained on, it now gives the correct, trained answer. But for new names or words it still does not produce anagrams.

Sounds like overfitting, maybe, i.e. repeating something verbatim from the fine-tuning dataset.
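One way to confirm that guess is to hold back some pairs from training and compare accuracy on seen versus unseen prompts. The sketch below is only an illustration: `generate` is a hypothetical callable standing in for a call to the fine-tuned model, and `memorizing_model` is a stand-in that simply repeats what it was trained on, which reproduces the symptom described above.

```python
import random
from collections import Counter

def is_anagram(source, candidate):
    """True when both strings use the same letters (case, spaces, punctuation ignored)."""
    def norm(text):
        return Counter(ch for ch in text.lower() if ch.isalpha())
    return norm(source) == norm(candidate)

def split_examples(examples, held_out_fraction=0.2, seed=0):
    """Shuffle the (prompt, completion) pairs and return (train, held_out)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_held = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held:], shuffled[:n_held]

def anagram_accuracy(prompts, generate):
    """Fraction of prompts for which generate(prompt) is a true anagram."""
    prompts = list(prompts)
    if not prompts:
        return 0.0
    return sum(is_anagram(p, generate(p)) for p in prompts) / len(prompts)

examples = [
    ("Admirer", "married"),
    ("A Gentleman", "elegant man"),
    ("A telescope", "to see place"),
    ("Adrian Albu", "a bad urinal"),
    ("A Decimal Point", "i'm a dot in place"),
]
train, held_out = split_examples(examples)

# Stand-in for an overfitted model: it only knows the training answers.
memorized = dict(train)
def memorizing_model(prompt):
    return memorized.get(prompt, "")

train_acc = anagram_accuracy([p for p, _ in train], memorizing_model)
held_out_acc = anagram_accuracy([p for p, _ in held_out], memorizing_model)
```

A large gap between `train_acc` and `held_out_acc` points to memorization rather than a learned skill.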

What are your settings for your output? What does your prompt look like?

How was your training data set up? (Can we see a line or two?)

If fine-tuning doesn’t work with the best settings, OpenAI recommends doubling the dataset size and trying again… so you might need more examples, assuming they’re formatted correctly and everything else seems okay.

Hope this is helpful!

It’s going to be very hard to fine-tune a model for Anagrams.

This eval reported a 16% accuracy using GPT-3.5 or 4.

You can probably use the dataset for your training though.

Thanks for the guess :)

I have 490 prompts similar to these:

{"prompt":"Adrian Albu\n\n===\n\n","completion":" a bad urinal \n"}
{"prompt":"A Decimal Point\n\n===\n\n","completion":" i'm a dot in place \n"}
{"prompt":"A domesticated animal\n\n===\n\n","completion":" docile, as a man tamed it \n"}
{"prompt":"A Gentleman\n\n===\n\n","completion":" elegant man \n"}
{"prompt":"A Rolling Stone Gathers No Moss\n\n===\n\n","completion":" stroller on go, amasses nothing \n"}
{"prompt":"A telephone girl\n\n===\n\n","completion":" repeating 'hello' \n"}
{"prompt":"A telescope\n\n===\n\n","completion":" to see place \n"}
{"prompt":"Admirer\n\n===\n\n","completion":" married \n"}
{"prompt":"Alec Guinness\n\n===\n\n","completion":" genuine class \n"}
{"prompt":"Animosity\n\n===\n\n","completion":" is no amity \n"}
{"prompt":"Astronomers\n\n===\n\n","completion":" moon starers \n"}
{"prompt":"Astronomers\n\n===\n\n","completion":" no more stars \n"}
{"prompt":"Barbie doll\n\n===\n\n","completion":" i'll bare bod \n"}
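Before training on lines like these (or merging in another dataset), it may be worth checking that every pair really is an anagram, since a few bad pairs would teach the model the wrong thing. This is a rough sketch: `find_invalid_pairs` is a hypothetical helper, the separator is taken from the sample lines above, and the third sample line is deliberately broken for the demonstration and is not from the real dataset.

```python
import json
from collections import Counter

SEPARATOR = "\n\n===\n\n"  # prompt suffix used in the sample lines above

def letters(text):
    """Count letters only, ignoring case, spaces, and punctuation."""
    return Counter(ch for ch in text.lower() if ch.isalpha())

def find_invalid_pairs(jsonl_text):
    """Return (line_number, prompt, completion) for pairs that are not anagrams."""
    bad = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        source = record["prompt"].split(SEPARATOR)[0]
        target = record["completion"].strip()
        if letters(source) != letters(target):
            bad.append((line_no, source, target))
    return bad

# Raw string so the \n escapes stay inside the JSON, as in the training file.
sample = r'''{"prompt":"Admirer\n\n===\n\n","completion":" married \n"}
{"prompt":"A Gentleman\n\n===\n\n","completion":" elegant man \n"}
{"prompt":"Astronomers\n\n===\n\n","completion":" moon stars \n"}'''

invalid = find_invalid_pairs(sample)
```

Here `invalid` contains only the third line, because "moon stars" drops two letters of "Astronomers".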

OOOO, many many thanks!!

Will combine the data!