How to Reduce Fine-Tuning Error by 37%

Hello everyone!

I spent some time experimenting with the OpenAI fine-tuning API and found that noisy training data still has a drastic effect, even on powerful LLMs like Davinci.

I wrote about how to use data-centric AI to improve your own models in this recently published article in KDnuggets. The results I found were quite eye-opening.

Let me know what you think!


I like it! Auto-detect and remove (or correct) outliers in your training data.

But why did you embed with davinci-001 rather than the newer ada-002? Was there a specific reason? I wonder whether you would get better results, since ada-002 is supposed to perform better and has far fewer dimensions than davinci-001 (1,536 vs. 12,288).
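The auto-detect idea can be illustrated with a simple, swapped-in heuristic: flag any training example whose label disagrees with the strict majority label of its nearest neighbours in embedding space. This is a minimal sketch, not the method from the KDnuggets article. The function names (`cosine_similarity`, `flag_suspect_labels`) are hypothetical, and the toy 2-D vectors merely stand in for real davinci-001 or ada-002 embeddings.

```python
import math
from collections import Counter


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_suspect_labels(embeddings: list[list[float]], labels: list[str], k: int = 3) -> list[int]:
    """Hypothetical helper: return indices whose label disagrees with a strict
    majority of their k nearest neighbours (by cosine similarity, excluding self)."""
    suspects = []
    for i, emb in enumerate(embeddings):
        # Rank every other example by similarity to example i (brute force, O(n^2) overall).
        neighbours = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine_similarity(emb, embeddings[j]),
            reverse=True,
        )[:k]
        if not neighbours:
            continue  # a single example has no neighbours to compare against
        majority_label, count = Counter(labels[j] for j in neighbours).most_common(1)[0]
        # Flag only when more than half of the neighbours agree on a different label.
        if majority_label != labels[i] and count > len(neighbours) / 2:
            suspects.append(i)
    return suspects


# Toy data: two clusters; index 3 sits in the "pos" cluster but is labeled "neg".
embeddings = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.95, 0.15],
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0], [0.15, 0.95],
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
print(flag_suspect_labels(embeddings, labels, k=3))  # → [3]
```

Flagged examples are candidates for review or relabeling rather than automatic deletion, and the choice of `k` and the strict-majority rule are arbitrary here. The brute-force neighbour search is only practical for small datasets.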

Welcome to the community!

Thanks for sharing your results with us.

Dataset cleaning will only become more important in the months and years ahead.
