Pretraining a Model From Scratch. Help a dude

Hi Developers,

So, after being fascinated by ChatGPT, I decided to dig deep into how it works, which eventually led me down the path of developing a transformer model from scratch. It took me months, but it has been worth it, and the proof of concept seems functional (on a smaller dataset, all the stages involved work: data preprocessing, training, and inference).

I am nearing the point where I would like to pretrain. I have prepared 1 billion tokens as my dataset (with 300M- and 120M-token sets as backups in case I can’t afford the 1B run). I want this to be a smaller model, and my budget covers renting an RTX A5000 GPU for a week. My model is autoregressive, and I want it solely for text generation.

My worry is that I don’t feel ready to kick off the pretraining itself. I fear that my chosen learning rate (1e-5) might not be optimal for the dataset. I don’t want to pretrain for the whole week only to get inadequate performance, which would force me to pretrain again. I would love to make this my breakthrough, and I feel like I have done enough research to reach the pretraining stage.

So my request is for advice only:

  • How do I know that I am ready to pretrain?
  • How do I know that this is the right learning rate?
  • What about the learning rate scheduler? How should I configure it?

From my research, most huge models end up training for fewer than 10 epochs, so on one hand I feel like I don’t really need a scheduler.
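For context, the scheduler I keep running into in my research is linear warmup followed by cosine decay. Here is a rough sketch of my understanding; the peak LR, warmup steps, and minimum LR are placeholder values, not settings I have validated for any particular model size:

```python
import math

def lr_at_step(step, max_steps, peak_lr=3e-4, warmup_steps=2000, min_lr=3e-5):
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    All default values are illustrative placeholders, not
    recommendations for any particular model or dataset size.
    """
    if step < warmup_steps:
        # Ramp linearly from ~0 up to peak_lr over the warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```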

Please don’t blast me about pretraining from scratch; I believe I have good reasons for doing this the hard way. One of them is that this is a great adventure and a superb learning process.

Thank you for your help.


I always applaud a good learning project! But the price for renting an RTX A5000 GPU is about $0.26/hr, or about $43 for a week.

You generally need to experiment a bit with things like the learning rate and the number of epochs, so my best advice is to save up a bit more money, buy a used consumer GPU from the last generation, and start by creating a smaller model that you can train on your own hardware :laughing:


Hmm, that’s a good suggestion.
Like I mentioned in my post, I have been experimenting with tiny datasets up to the point where I outgrew most freely available GPUs, including my tiny RTX 2060.
I believe I have done a good amount of experimentation.
Also, the A5000 is actually less than $35, and that’s surely affordable for me.

Your graphics card is still bigger than mine, so no worries there :wink:

Have you done any benchmarks on the model you’ve created? If not that’s definitely something I highly recommend that you do.

There’s no use in creating a larger model if the performance isn’t better, so you definitely need to have tests and benchmarks ready in order to compare them.
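A simple first benchmark for a pure text generator is perplexity on a held-out set: average the per-token negative log-likelihood your model assigns and exponentiate. A minimal sketch, assuming you can already get per-token NLLs out of your own eval loop:

```python
import math

def perplexity(nll_per_token):
    """Perplexity from a list of per-token negative log-likelihoods (in nats).

    nll_per_token is assumed to come from your own eval loop; how you
    compute it depends on your model code.
    """
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Sanity check: a model that assigns probability 1/2 to every token
# has a perplexity of exactly 2.
print(perplexity([math.log(2)] * 100))
```

Lower is better, and tracking it across checkpoints gives you a cheap way to compare runs and spot regressions.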


Yeah, I did not think about benchmarks. I guess I must do that now.

1 Like

Question: for the benchmarks, do you compare the base model or the finetuned version?

1 Like

Sounds like a good idea!

You can also compare against models created by others, or against different checkpoints that you save during training. The only thing you need to remember is to make sure that the data in your benchmarks isn’t also present in your training data :laughing:
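One cheap way to check for that kind of leakage is an n-gram overlap scan between your benchmark text and your training tokens. A rough sketch; the choice of n=8 here is arbitrary, not a standard:

```python
def ngrams(tokens, n=8):
    """All contiguous n-grams of a token sequence, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(train_tokens, eval_tokens, n=8):
    """Fraction of eval n-grams that also appear in the training tokens.

    A rate well above zero suggests the eval data leaked into training.
    """
    eval_grams = ngrams(eval_tokens, n)
    if not eval_grams:
        return 0.0
    return len(eval_grams & ngrams(train_tokens, n)) / len(eval_grams)
```

For a real 1B-token corpus you would want to hash the n-grams and stream them rather than hold sets in memory, but the idea is the same.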

1 Like

Damn, no rest for the wicked :laughing:
I think I should also modify my code to save a checkpoint every epoch, thanks for pointing that out :smile:
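If I save a checkpoint every epoch, disk could fill up fast on a week-long run, so I might pair it with a small rotation helper like the sketch below. It assumes checkpoint filenames sort in training order (e.g. epoch001.pt, epoch002.pt, …), which is my own naming choice, not anything standard:

```python
import os

def rotate_checkpoints(ckpt_dir, keep_last=3, suffix=".pt"):
    """Delete all but the newest keep_last checkpoint files in ckpt_dir.

    Assumes filenames sort in training order (e.g. zero-padded epoch
    numbers); adjust the sort key if your naming scheme differs.
    """
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(suffix))
    for name in ckpts[:-keep_last]:
        os.remove(os.path.join(ckpt_dir, name))
```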


Always happy to help!

I hope you have fun and get some good results :heart:

1 Like

This seems like a cool project. I also have an architecture for my own models which I train on some basic data. I was wondering where you get your training data from.


I get the data from Huggingface

1 Like

Okay, I get most of my data from Huggingface too, I just was looking to see if you had a different source. Thanks!

1 Like

No problem! I find Huggingface more intuitive than other sources.

The hardest part of these projects is that I can’t fully test whether my script works until I pay for an expensive server. So sometimes I pay a bunch of money just to find out my script is faulty and doesn’t work.

Really? You can’t use Kaggle or Google Colab?

Sorry for the late response, it seems I missed your post:

It crashes because they aren’t powerful enough.

1 Like

No problem…
Have you tried Vast?
They have cheaper GPUs.
Use my link and I’ll get a commission if you sign up. I can guarantee that you will like it.

1 Like

What do you mean? Google can be a bit stingy with the free GPUs and will kick you for “inactivity” if you’re using them for training, but Kaggle will let you add an NVIDIA Tesla P100 to run a Jupyter notebook :laughing:

1 Like

Vast looks pretty interesting, thanks for sharing :laughing:

For anyone who’s curious, here’s their pricing page:

Kaggle won’t make a difference for him since he’s got so much data.