Fine-tuning: very poor results

I have a JSONL file with 3,974 records containing 3,974 short stories from 212 different well-known English-speaking authors, anonymized (several records per author):

{"prompt": "Write a text in the style of author_207", "completion": "<a complete short story by this author>"}
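As a minimal sketch of how such a file can be produced (the stories and file name here are stand-ins, not the actual dataset), each line is one JSON object serialized with Python's json module:

```python
import json

# Hypothetical examples standing in for the real (anonymized) dataset.
stories = [
    ("author_207", "It was a dark and stormy night..."),
    ("author_012", "The sea was calm that morning..."),
]

# Each line of the JSONL file is one prompt/completion pair.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for author_id, story in stories:
        record = {
            "prompt": f"Write a text in the style of {author_id}",
            "completion": story,
        }
        f.write(json.dumps(record) + "\n")
```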

I submitted it to the fine-tuning process at OpenAI. I used two models as a basis: first Curie and then Ada. Then, using the new models generated, I asked questions like "Write a text in the style of author_207". With both, the results were terrible.

If someone wants the file to run their own experiments, just ask me.


This comes down to fine-tuning being a way to show the model how to do new things and recognize new patterns; it does not teach it new data. You could experiment with embeddings, which are designed for data retrieval (see the OpenAI Platform docs).

To my understanding of the current models, you should try fine-tuning the latest Davinci model (003), not Curie (an older model) or Ada, which is used for an entirely different purpose (generating embeddings).
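For context on the embeddings-for-retrieval approach suggested above, here is a minimal sketch (the toy vectors and helper names are assumptions; in a real system each vector would come from an embeddings model, one per short story):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-in vectors; a real index would store vectors returned by an
# embeddings model, keyed by story.
story_index = {
    "author_207/story_1": [0.9, 0.1, 0.0],
    "author_012/story_1": [0.1, 0.9, 0.2],
}

def most_similar(query_vec, index):
    """Return the story id whose stored vector is closest to the query."""
    return max(index, key=lambda k: cosine_similarity(query_vec, index[k]))
```

The retrieved story text would then be pasted into the prompt as context, rather than being baked into the model's weights.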


Those are the current options for fine-tuning: Curie, Ada, and Babbage.

Embeddings are not a fit for the application I want.
It should be fine-tuning. Or not OpenAI…

Fine-tuning will not help you with teaching the model new data, or even if it does learn it, the odds of it being able to apply it back are low. Fine-tuning is more akin to pattern matching, where the model learns how to respond with a specific type of output. As @Foxabilo has said previously, embeddings would be the best way to go about this task.


I haven’t fine-tuned in months, ever since I learned the difference between fine-tuning and embedding. However, to the question posed by @sergio.r.f.oliveira: isn’t the purpose of fine-tuning to teach the models “patterns” of responses? It seems that Sergio wants the model to respond to prompts in a certain manner, not respond with specific information. Isn’t this exactly what fine-tuning is supposed to achieve?


That is a fair point; I think I missed the OP’s intent somewhat. Looking at the numbers presented, ~4,000 examples from ~200 authors is around 20 examples per author, which seems on the low side. Perhaps this is the cause of the lower-than-expected performance.

Some authors have about 100 samples. Even for those, the results are poor.

Just to get some idea of the problem: why do the authors have to be anonymous? Is the fine-tuned model supposed to make the connection between the author and the text from its memory?

Just to guarantee that no previous knowledge from OpenAI’s standard knowledge base will be used to answer the prompts.

To teach it to emulate the style of a single author, you should typically have on the order of 500–1,000 samples for that specific author.

I would also suggest making your prompts much more specific. Instead of just asking for “a text,” write the prompt in such a way that a short story makes sense as the response. It could be as simple as “Write a short story about X in the style of author_207.”


In the ‘fine-tuning fine print’ (‘What models can be fine-tuned?’), OpenAI notes that the models currently available for fine-tuning are all pre-InstructGPT. So they don’t have the alignment with user intention that got trained into later models, which we have come to know and love.

I’ve experimented with this exact use case. Embeddings are definitely the way to go, and the more examples you have, the better. My data was 55,000 Slack messages over the course of 3 years. After converting each message into an embedding (having first added some context to each message: “[person] said [message] on [date]”), I was able to ask the model (gpt-3.5-turbo base) to respond to any prompt I gave in the style of [person].
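A rough sketch of the context-adding step described above (the message fields and the helper name are assumptions, not the poster’s actual code):

```python
# Sketch of preparing Slack messages for embedding.
# Each message gets a "[person] said [message] on [date]" wrapper so the
# embedded chunk is self-describing when retrieved later.
messages = [
    {"person": "alice", "text": "Shipping the release today!", "date": "2021-03-01"},
    {"person": "bob", "text": "LGTM, merging now.", "date": "2021-03-02"},
]

def format_for_embedding(msg):
    """Add context so each embedded chunk stands on its own."""
    return f"{msg['person']} said \"{msg['text']}\" on {msg['date']}"

corpus = [format_for_embedding(m) for m in messages]
# Each string in `corpus` would then be sent to an embeddings model and
# the resulting vector stored alongside the original message.
```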

TL;DR: Embeddings are not only for getting ChatGPT to learn a new knowledge base; they can also be used as high-quality examples of how ChatGPT should respond.

P.S. Massaging your data and anonymizing authors isn’t that necessary. Just add a premise, between the retrieved embeddings and your prompt, that tells ChatGPT to “answer the following only using the data above.”

Interesting, could you sketch out how the embeddings are used in your process?

My understanding is that embeddings are useful for finding text that is “close” to the prompt, in order to extract relevant information to be added to the prompt as context.

In this use case, though, don’t you already know which text is relevant at the point you make the prompt? So couldn’t you equally just generate a request of the form:

Here are past messages from [user]:
[Past messages]
Write a slack message in this style asking about [subject]
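The direct-prompt alternative above can be sketched in a few lines (the variable names and sample messages are illustrative, not from the thread):

```python
# Skip retrieval entirely: paste the user's known past messages straight
# into the prompt, following the template from the post above.
past_messages = [
    "Anyone know why CI is red?",
    "Pushing a fix for the flaky test now.",
]

def build_style_prompt(user, messages, subject):
    """Assemble a style-imitation prompt from known past messages."""
    examples = "\n".join(messages)
    return (
        f"Here are past messages from {user}:\n"
        f"{examples}\n"
        f"Write a slack message in this style asking about {subject}"
    )

prompt = build_style_prompt("[user]", past_messages, "the release schedule")
```

Retrieval via embeddings only becomes necessary when the relevant examples aren’t known ahead of time.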


ChatGPT is the result of fine-tuning, and it took OpenAI almost 2 years of fine-tuning to get a model good enough to become ChatGPT. You need to spend time evaluating the results and fixing its problems. To start with, I suggest adding more data to the prompt, like specifying the style, literary devices, or the author’s name.


Yes, you could! Putting it into embeddings simply automates the process.