Hey OpenAI community! It’s been a while since I last posted so hi again!
I just launched Unsloth (GitHub - unslothai/unsloth: 80% faster, 50% less memory LLM finetuning), which lets you finetune LLMs 5x faster using 50% less memory, all on your local GPU. It currently works for Llama models, so it's not ChatGPT's finetuning, but I'm sharing everything I learnt! It could be useful for making OpenAI's finetuning faster as well!
You can now finetune:
- 5x faster (5 hours down to 1 hour)
- Using 50% less memory
- With 0% loss in accuracy
- All locally on NVIDIA GPUs (Tesla T4, RTX 20/30/40, Ampere, Hopper) for free!
How? By:
- Hand-deriving the backpropagation steps
- Optimizing matrix chain multiplication bracketing
- Writing all kernels in OpenAI’s Triton language
- Reducing data movement via in-place operations
- And doing other maths and coding trickery!
I wrote up a blog post covering all the manual hand-derived backprop: Introducing Unsloth.
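As a taste of what hand-deriving backprop looks like, here's a minimal sketch for a plain matmul Y = X @ W, with the analytic gradients checked against finite differences (arbitrary shapes, not Unsloth's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 5))
dY = rng.normal(size=(3, 5))  # upstream gradient flowing into Y = X @ W

# Hand-derived gradients for Y = X @ W:
dX = dY @ W.T
dW = X.T @ dY

# Sanity check dW numerically on the scalar loss L = sum(Y * dY),
# whose gradient w.r.t. W is exactly dW above.
eps = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        dW_num[i, j] = (np.sum((X @ Wp) * dY) - np.sum((X @ Wm) * dY)) / (2 * eps)

assert np.allclose(dW, dW_num, atol=1e-4)  # analytic matches numeric
```

Writing these gradients out by hand (instead of relying on autograd's generic graph) is what lets you fuse and simplify the steps afterwards.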
I also wrote a Google Colab notebook for the Tesla T4 that finetunes Alpaca 2x faster on a single GPU: Google Colab.
On Kaggle, via 2 Tesla T4s with DDP (Unsloth - LAION Chip2 Kaggle), you can finetune on LAION's OIG 5x faster and on Slim Orca 5x faster.
You can install Unsloth locally via pip (pick the line matching your CUDA version):
pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"  # CUDA 11.8
pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"  # CUDA 12.1
Thanks!
