To develop an understanding of the science and engineering behind Large Language Models (LLMs), I've trained a small transformer-based language model from scratch and then fine-tuned it.
Since I'm self-learning and experimenting, it's possible I've developed a flawed understanding of the science and engineering practices behind LLMs.
I'm sharing the detailed process I followed, along with some questions that came up along the way. I'd appreciate the community's thoughts.
Here’s a link to the report: mittalh.notion.site/Small-Language-Model-Report-Questions-43a86edbdd954cd0a7f4df2c9dcd8408
Looking forward to your feedback.