Ada model fine-tuned for classification gets delusional when fed with thousands of records

maksymilianpiechota · May 16, 2023, 6:36pm

Hi,
I have a fine-tuned Ada model for classification with around 50 classes.

My prompt end pattern is “\n\n###\n\n” and the response end pattern is “$$$”

It works really well, but I noticed that it gets delusional after a large number of prompts in a row, around a few hundred. Setting the temperature to 0 helped at first, but when I ran it again on 9k records it started to give false results again.

By a false result I mean that the class returned for a given prompt is different (wrong) when it is run in a queue of several thousand records than the result I get when I send that same prompt on its own to the same model with the same parameters.

I feel like the model gets saturated in some way when it's fed that many examples? As if there were some leakage of tokens between requests?

I would appreciate any clue or direction as to what the reason could be and what steps I can take to improve the results.
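One way to put a number on the mismatch described above is to re-classify each record in isolation and compare against the label recorded during the long run. The following is only a sketch under assumptions: `classify` is a hypothetical callable that wraps whatever API request is actually used (with temperature 0 and the same stop sequence), and `find_mismatches` and `clean_label` are illustrative names, not part of any library. Only the two separator strings come from the post.

```python
from typing import Callable, List, Tuple

PROMPT_END = "\n\n###\n\n"  # prompt end pattern quoted in the post
STOP = "$$$"                # response end pattern quoted in the post


def build_prompt(job_title: str) -> str:
    """Append the separator the model was fine-tuned with."""
    return job_title.strip() + PROMPT_END


def clean_label(raw: str) -> str:
    """Drop anything after the stop sequence and trim whitespace."""
    return raw.split(STOP)[0].strip()


def find_mismatches(
    titles: List[str],
    queued_labels: List[str],
    classify: Callable[[str], str],  # hypothetical wrapper around the API call
) -> List[Tuple[str, str, str]]:
    """Return (title, queued_label, isolated_label) for every disagreement."""
    mismatches = []
    for title, queued in zip(titles, queued_labels):
        isolated = clean_label(classify(build_prompt(title)))
        if isolated != clean_label(queued):
            mismatches.append((title, clean_label(queued), isolated))
    return mismatches
```

If the list comes back empty when the isolated calls are made one by one, but not when they are made inside the long loop, the difference is more likely in the calling code (prompt construction, stop sequence, reused variables) than in the model itself.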

sps · May 16, 2023, 8:13pm

Welcome @maksymilianpiechota

Are you batching the prompts?

The docs mention not to exceed 2048 tokens for classification.

PS: embeddings are another cost-effective method for classification
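As a rough illustration of the embeddings idea, here is a minimal nearest-centroid sketch. It assumes the embedding vectors have already been obtained elsewhere; the two-dimensional vectors and the class names `engineering` and `sales` are made up for the example, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_centroids(
    examples: List[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Average the embedding vectors of each class."""
    grouped = defaultdict(list)
    for vector, label in examples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Pick the class whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Toy data: real embeddings would have hundreds of dimensions.
training = [([1.0, 0.0], "engineering"), ([0.9, 0.1], "engineering"), ([0.0, 1.0], "sales")]
centroids = class_centroids(training)
predicted = classify([0.8, 0.2], centroids)
```

Because each prediction is a pure function of the stored centroids, the same input always yields the same class, regardless of how many records were processed before it.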

maksymilianpiechota · May 17, 2023, 8:43am

I am not batching. I am sending one prompt after another, waiting for each request to return before sending the next.

I am classifying job titles, which are rarely more than 3 words, so I suppose I do not exceed the 2048-token limit.

Thanks for the embeddings tip, I will look into it, but for now I need to resolve the issue with my fine-tuned model.

maksymilianpiechota · May 19, 2023, 5:48am

Is there anyone who can help?

Is anyone from OpenAI monitoring the issues raised on this forum?

sps · May 19, 2023, 5:55pm

Can you confirm how it’s performing when you use batching @maksymilianpiechota ?

kevin6 · May 19, 2023, 6:41pm

My suggestion is:

Use Babbage with a dataset of 700-900 examples.
You need more context in your dataset.

If I had some examples of your dataset, I would probably have a clue what is wrong.

maksymilianpiechota · May 20, 2023, 9:20am

I have trained Ada with around 10k examples.

Now I have trained Babbage as you suggested, with 800 examples. With 37 classes, that gave me about 22 examples per class.

And I get much worse results with Babbage than with the previously trained Ada (even when testing separately, with just one classification).

P.S. I will check with my customer whether I can share the data set.
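An average of about 22 examples per class can hide classes with far fewer. A small sketch for checking the per-class counts before training, assuming the labels are available as a plain list; `under_represented` is a hypothetical helper name and the threshold is arbitrary:

```python
from collections import Counter
from typing import Dict, Iterable


def under_represented(labels: Iterable[str], minimum: int) -> Dict[str, int]:
    """Return the classes that have fewer than `minimum` training examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}
```

For example, `under_represented(labels, 20)` would list every class with fewer than 20 examples together with its count.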

kevin6 · May 23, 2023, 11:06pm

Here are some more tips:

Add more volume if 800 is not enough (more diverse data is better). Depending on the training data, adding more data will likely improve performance.
The best data is similar to what you’ll use the model for. Try to format it in a clear way that reads naturally, for instance:

Prompt:

Happy day → sounds positive

Completion:

True

Set the prompt loss weight to a smaller value like 0.1; this makes training less sensitive to the prompt tokens
For N_epochs: with more data you may need fewer epochs
Temperature 0
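To make the formatting tip concrete, here is a sketch of how training records could be written as JSONL using the separators mentioned earlier in the thread. The leading space in the completion is an assumption (a common convention for prompt/completion fine-tuning data, not something confirmed in this thread), and the job title and class name are invented for illustration.

```python
import json
from typing import Iterable, Tuple

PROMPT_END = "\n\n###\n\n"  # prompt end pattern from the original post
STOP = "$$$"                # response end pattern from the original post


def to_jsonl(examples: Iterable[Tuple[str, str]]) -> str:
    """Turn (job_title, class_label) pairs into one JSON record per line."""
    lines = []
    for title, label in examples:
        record = {
            "prompt": title.strip() + PROMPT_END,
            # Assumption: completion starts with a space and ends with the stop pattern.
            "completion": " " + label.strip() + STOP,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical example record.
sample = to_jsonl([("Senior Data Engineer", "engineering")])
```

Whatever format is chosen, the prompts sent at inference time need to end with exactly the same separator as the training prompts, and the stop sequence should match the one used in the completions.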
