Hello everyone!
I spent some time playing around with the OpenAI fine-tuning API and I discovered that noisy data still has drastic effects even on powerful LLMs like Davinci.
I took some time to write about how to use data-centric AI in this recently published article in KDnuggets, so that you can improve your models too. The results I found were quite eye-opening.
Let me know what you think!
I like it! Auto-detect and remove (or correct) outliers in your training data.
But why did you embed with `davinci-001` and not the newer `ada-002`? Any reasoning there? I'm wondering whether you would get better results, since `ada-002` is supposed to be better and has far fewer dimensions than `davinci-001`.
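For anyone curious what "auto-detect outliers" can look like once you have embeddings, here is a minimal sketch. It assumes you already have one embedding vector per training example (from `ada-002`, `davinci-001`, or anything else) and scores each example by its mean cosine distance to its k nearest neighbours; the highest scores are candidates to review or drop. This is a generic kNN outlier score, not necessarily the method used in the article, and the function names, `k`, `top_n`, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_outlier_scores(embeddings: List[Sequence[float]], k: int = 2) -> List[float]:
    """Mean cosine distance from each embedding to its k nearest neighbours."""
    scores = []
    for i, e in enumerate(embeddings):
        # Distances to every other example, closest first (O(n^2) overall).
        dists = sorted(
            cosine_distance(e, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def flag_outliers(embeddings: List[Sequence[float]], k: int = 2, top_n: int = 1) -> List[int]:
    """Indices of the top_n examples with the highest outlier score."""
    scores = knn_outlier_scores(embeddings, k)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical 3-d "embeddings"; real ones would have hundreds or thousands of dims.
    toy = [
        [1.0, 0.1, 0.0],
        [0.9, 0.2, 0.0],
        [1.0, 0.0, 0.1],
        [0.0, 0.1, 1.0],  # points in a very different direction from the rest
    ]
    print(flag_outliers(toy, k=2, top_n=1))  # prints [3]
```

The pairwise loop is only practical for small datasets; for real training sets you would want a vectorized or approximate nearest-neighbour search, and lower-dimensional embeddings make each distance cheaper to compute.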
Welcome to the community!
Thanks for sharing your results with us.
Cleaning datasets is going to be needed even more in the months/years ahead.