I have a JSONL file with 3,974 records, each containing a short story by one of 212 well-known English-speaking authors, anonymized (several records per author):
…
{"prompt": "Write a text in the style of author_207", "completion": "<a complete short story by this author>"}
…
I submitted it to OpenAI's fine-tuning process, using two base models: first Curie and then Ada. Then, using the new fine-tuned models, I sent prompts like "Write a text in the style of author_207". With both models the results were terrible.
If anyone wants the file to run their own experiments, just ask me.
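For reference, the flow I used looked roughly like this. A minimal sketch with the legacy openai-python (< 1.0) SDK; the filename and key are placeholders:

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Upload the JSONL training file described above
upload = openai.File.create(
    file=open("stories.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tune against the Curie base model
job = openai.FineTune.create(
    training_file=upload.id,
    model="curie",
)
print(job.id, job.status)
```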
This comes down to fine-tuning being a way to show the model how to do new things and recognize new patterns; it does not teach it new data. You could experiment with embeddings, which are designed for data retrieval.
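To make that concrete, here is a rough sketch of embedding-based retrieval, assuming the legacy (pre-1.0) openai-python Embedding endpoint and a small in-memory corpus. The corpus texts are placeholders:

```python
import numpy as np
import openai

def embed(text: str) -> np.ndarray:
    # Legacy Embedding endpoint; ada-002 was the retrieval model of the time
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Embed each stored passage once, up front
corpus = ["first stored passage ...", "second stored passage ..."]  # placeholders
corpus_vecs = [embed(p) for p in corpus]

# At query time, rank the stored passages by cosine similarity to the prompt
query = embed("Write a text in the style of author_207")
scores = [
    float(np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v)))
    for v in corpus_vecs
]
best = corpus[int(np.argmax(scores))]  # the passage to prepend as context
```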
To my understanding of the current models, you should try fine-tuning the latest davinci model (text-davinci-003), not Curie (an older model) or Ada, which is used for an entirely different purpose (generating embeddings).
Fine-tuning will not help you teach the model new data, and even if it does learn it, the odds of it being able to apply that data back are low. Fine-tuning is more akin to pattern matching, where the model learns how to respond to a specific type of input. As @Foxalabs has said previously, embeddings would be the best way to go about this task.
I haven't fine-tuned in months, ever since I learned the difference between fine-tuning and embedding. However, to the question posed by @sergio.r.f.oliveira: isn't the purpose of fine-tuning to teach the models "patterns" of responses? It seems that sergio wants the model to respond to prompts in a certain manner, not respond with specific information. Isn't this exactly what fine-tuning is supposed to achieve?
That is a fair point; I think I missed the OP's intent somewhat. Looking at the numbers presented, ~4,000 examples from ~200 authors is around 20 examples per author, which seems on the low side. Perhaps this is the cause of the lower-than-expected performance.
Just to get some idea of the problem: why do the authors have to be anonymous? Is the fine-tuned model supposed to make the connection between the author and the text from its memory?
To teach it to emulate the style of a single author, you should typically have on the order of 500–1000 samples for that specific author.
I would also suggest making your prompts much more specific. Instead of just asking for "a text," write the prompt in such a way that a short story makes sense as the response. It could be as simple as "Write a short story about X in the style of author_207."
In the fine-tuning fine print ("What models can be fine-tuned?"), OpenAI notes that the models currently available for fine-tuning are all pre-InstructGPT. So they don't have the alignment with user intention that got trained into later models, which we have come to know and love.
I've experimented with this exact use case. Embeddings are definitely the way to go, and the more examples you have, the better. My data was 55,000 Slack messages spanning 3 years. After adding some context to each message ("[person] said [message] on [date]") and converting it into an embedding, I was able to ask the model (gpt-3.5-turbo base) to respond to any prompt I gave in the style of [person].
TL;DR: Embeddings are not only for getting ChatGPT to learn a new knowledge base; they can also be used as high-quality examples of how ChatGPT should respond.
P.S. Massaging your data and anonymizing authors isn't THAT necessary. Just add a premise between the retrieved passages and your prompt that tells ChatGPT to "answer the following only using the data above."
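Something along these lines; a sketch assuming the legacy ChatCompletion endpoint, with the helper name and example data invented for illustration, not my actual code:

```python
import openai

def build_messages(style_examples, user_prompt):
    # Retrieved examples in the form "[person] said [message] on [date]"
    context = "\n".join(style_examples)
    premise = "Answer the following only using the data above."
    return [
        {"role": "system", "content": context + "\n\n" + premise},
        {"role": "user", "content": user_prompt},
    ]

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=build_messages(
        ["alice said 'ship it today!' on 2021-06-03"],  # stand-in for retrieved rows
        "Write a slack message in alice's style announcing a release.",
    ),
)
print(resp["choices"][0]["message"]["content"])
```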
@zachary.suzuki Interesting, could you sketch out how the embeddings are used in your process?
My understanding is that embeddings are useful for finding text that is "close" to the prompt, in order to extract relevant information to be added to the prompt as context.
In this use case, though, don't you already know which text is relevant at the point you make the prompt? So couldn't you equally just generate a request of the form:
Here are past messages from [user]:
[Past messages]
Write a slack message in this style asking about [subject]
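i.e., skip the similarity search entirely and assemble the prompt directly, something like this sketch (the messages are made-up stand-ins for [Past messages]):

```python
# The relevant messages are already known, so the prompt is built directly
past_messages = [
    "Hey team, quick heads-up: the deploy moved to 3pm.",
    "Can someone take a look at the build before lunch?",
]  # stand-ins for [Past messages]

prompt = (
    "Here are past messages from [user]:\n"
    + "\n".join(past_messages)
    + "\n\nWrite a slack message in this style asking about [subject]"
)
print(prompt)
```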
ChatGPT is the result of fine-tuning, and it took OpenAI almost 2 years of fine-tuning to get the model good enough to be ChatGPT. You need to spend time evaluating the results and fixing their problems. To start with, I suggest adding more data to the prompt, like specifying style, literary devices, or the author's name.