Share of training time taken by each module in GPT

I want to know how the pre-training time of a language model such as GPT breaks down by module, for example what fraction is spent in attention. How much does each module contribute to total training time? Is there any documentation, or are there links, that cover this?
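
To make the question concrete, here is a rough sketch of the kind of breakdown I mean. It counts forward-pass matrix-multiply FLOPs for one standard decoder layer (2·m·n·k per matmul), assuming a 4x MLP expansion and ignoring softmax, layer norm, dropout, embeddings, and any savings from causal masking. The function names and example sizes are my own illustration, not taken from any library. FLOP share is only a proxy: memory-bound operations can take a larger share of wall-clock time than their FLOP count suggests, so real proportions would have to come from a profiler.

```python
def layer_flops(seq_len, d_model, mlp_ratio=4):
    """Forward matmul FLOPs per sequence for one decoder layer (assumed layout)."""
    return {
        # Q, K, V projections: three (seq_len x d_model) @ (d_model x d_model) matmuls
        "qkv_proj": 2 * seq_len * d_model * 3 * d_model,
        # Q @ K^T, summed over heads: (seq_len x d_model) @ (d_model x seq_len)
        "attn_scores": 2 * seq_len * seq_len * d_model,
        # softmax(scores) @ V, summed over heads
        "attn_values": 2 * seq_len * seq_len * d_model,
        # attention output projection
        "out_proj": 2 * seq_len * d_model * d_model,
        # two MLP matmuls: d_model -> mlp_ratio * d_model -> d_model
        "mlp": 2 * 2 * seq_len * d_model * mlp_ratio * d_model,
    }


def flop_shares(seq_len, d_model, mlp_ratio=4):
    """Fraction of the layer's matmul FLOPs attributed to each part."""
    flops = layer_flops(seq_len, d_model, mlp_ratio)
    total = sum(flops.values())
    return {name: value / total for name, value in flops.items()}


if __name__ == "__main__":
    # Example sizes are arbitrary assumptions, chosen only to show the trend.
    for seq_len in (1024, 2048, 8192):
        shares = flop_shares(seq_len, d_model=2048)
        summary = ", ".join(f"{k}={v:.1%}" for k, v in shares.items())
        print(f"seq_len={seq_len}: {summary}")
```

In this estimate only the two attention matmuls grow quadratically with sequence length, so their share rises for long contexts while the projections and MLP dominate for short ones. What I am looking for is measured timing data that confirms or corrects this kind of estimate.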