Could Someone Give Me Advice on Best Practices for Training Large Language Models?

Hello there,

I am currently working on a project that involves training a large language model similar to GPT-3. I have some experience with smaller models, but this is my first time working at this scale.

How did you approach preprocessing and cleaning large datasets for training?
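
To make this question concrete, here is the naive exact-deduplication and length-filtering pass I would start from (a toy sketch; the `min_chars` threshold and the SHA-256 hashing are just placeholder choices of mine). I'm curious whether this holds up at corpus scale or whether near-duplicate methods like MinHash become necessary:

```python
import hashlib

def clean_and_dedup(docs, min_chars=200):
    """Drop very short documents and exact duplicates.

    min_chars=200 is an arbitrary placeholder threshold.
    """
    seen = set()
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:
            continue  # likely boilerplate or a fragment
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already kept
        seen.add(digest)
        yield text

# Tiny demonstration with a placeholder corpus.
corpus = ["A long enough example document.", "A long enough example document.", "tiny"]
print(list(clean_and_dedup(corpus, min_chars=10)))
```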

Did you make any specific architectural modifications to handle the scale?
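
To clarify what I mean by architectural modifications, these are the kinds of knobs I have in mind. The defaults below are the published GPT-3 175B values from Brown et al. (2020), which I would obviously scale down for my compute budget:

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Defaults are the GPT-3 175B values reported in Brown et al. (2020);
    # I would scale all of these down to fit my budget.
    n_layers: int = 96
    d_model: int = 12288
    n_heads: int = 96
    context_length: int = 2048
    vocab_size: int = 50257  # the GPT-2/GPT-3 BPE vocabulary size

config = ModelConfig(n_layers=24, d_model=2048, n_heads=16)  # a smaller trial run
```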

What training strategies did you find most effective for large models?

How did you manage resources such as GPUs, memory, and storage during training?
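
For context on the resource question, my current plan for fitting training into limited GPU memory is gradient accumulation plus mixed precision, roughly as in this PyTorch sketch (the model, loss, micro-batch size, and accumulation steps are all placeholders for illustration). Is this a sensible starting point, or do people go straight to sharded approaches like ZeRO or FSDP at this scale?

```python
import torch

# Requires a CUDA GPU. The linear layer and squared-error loss are
# stand-ins for a real transformer LM and its cross-entropy loss.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 8  # placeholder: effective batch = micro-batch * accum_steps

batches = [torch.randn(4, 512, device="cuda") for _ in range(32)]

optimizer.zero_grad(set_to_none=True)
for step, x in enumerate(batches):
    with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
        loss = model(x).pow(2).mean() / accum_steps  # scale loss for accumulation
    scaler.scale(loss).backward()  # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # unscale gradients and apply the update
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```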

Any tips on evaluating and fine-tuning a large language model once it’s trained?
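
On evaluation, the only baseline I know is held-out perplexity, along the lines of the sketch below (Hugging Face's `gpt2` checkpoint is purely a stand-in for my own model). Is perplexity alone informative at this scale, or should I lean on benchmark suites?

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only a stand-in for the checkpoint I would actually evaluate.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

text = "Held-out text the model has never seen goes here."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels=ids makes the model return the mean cross-entropy loss.
    loss = model(ids, labels=ids).loss

print(f"held-out perplexity: {math.exp(loss.item()):.2f}")
```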

I am also curious to hear about any challenges you faced and how you overcame them.

Any advice, resources, or insights you can share would be greatly appreciated.

Thank you in advance for your help.