About fine tuning - Your opinion?

My attention comment wasn’t about anything you said recently, @Foxalabs; the gears started turning when I read this comment here:

But I had more extensive comments over in this thread:

I see a lot of worries about attention dilution, which I admit are real, but unless you are using GPT-4-32k, you probably don’t have to worry about it.

Maybe the only types of domains where you would worry about attention are things like detailed coding syntax, or similar, where you may want to reduce the tokens.

I’m just thinking along the lines of “if it takes me more effort to concentrate on it, then reduce the tokens so the model can also concentrate on it”. I don’t have any proof this intuition is true or not, because I assume “rare things get less training than common things” in the training stages of these models.
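Just to make “reduce the tokens” concrete, here’s a minimal sketch of what I mean, assuming Python and the tiktoken library; the 2000-token budget is an arbitrary number for the example, not a recommendation:

```python
import tiktoken

# Pick the tokenizer that matches the model family you call.
enc = tiktoken.encoding_for_model("gpt-4")

def trim_to_budget(text: str, budget: int = 2000) -> str:
    """Count tokens and, if over budget, keep only the first `budget` tokens."""
    tokens = enc.encode(text)
    if len(tokens) <= budget:
        return text
    # Decode the truncated token list back into text.
    return enc.decode(tokens[:budget])

prompt = "...your long coding-syntax prompt here..."
print(f"{len(enc.encode(prompt))} tokens before trimming")
trimmed = trim_to_budget(prompt)
```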

Of course, OpenAI could have trained the hell out of coding in these models and skimmed over natural language. I don’t really know. But the question is: is there more natural language than code? I don’t know; there is a lot of code online in GitHub, maybe more than all the blogs combined.
