Trained a Small Language Model, Have Some Questions (Report)

To develop an understanding of the science and engineering behind Large Language Models (LLMs), I’ve trained a small transformer-based language model from scratch and then fine-tuned it.

As I’m self-taught and still experimenting, I may have developed a flawed understanding of the science and engineering practices behind LLMs.

I’m sharing the detailed process I followed, along with some questions that came up along the way.

I would appreciate the community’s thoughts :slight_smile:

Here’s a link to the report:

Looking forward to your feedback.