So if I were trying to save computing resources, the first thing I would do with LLMs is cascade a hierarchy of models.
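Something like this, I imagine: a minimal sketch of a two-tier cascade, with toy stand-ins for the models themselves (a real deployment would call actual LLM endpoints, and the confidence heuristic here is made up purely for illustration):

```python
# Hypothetical two-tier model cascade: try the cheap model first,
# escalate to the expensive one only when the cheap one is unsure.

from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0; how sure the model is of its own answer

def small_model(prompt: str) -> Answer:
    # Toy stand-in for a cheap, fast model: confident only on short prompts.
    if len(prompt.split()) <= 8:
        return Answer(f"small-model answer to: {prompt!r}", 0.95)
    return Answer(f"small-model guess at: {prompt!r}", 0.40)

def large_model(prompt: str) -> Answer:
    # Toy stand-in for a slower, more capable (and more expensive) model.
    return Answer(f"large-model answer to: {prompt!r}", 0.99)

def cascade(prompt: str, threshold: float = 0.8) -> Answer:
    """Route to the small model; fall back to the large one if unsure."""
    first = small_model(prompt)
    if first.confidence >= threshold:
        return first                # "basic question": fast, cheap path
    return large_model(prompt)      # "complex question": slower, costlier path

print(cascade("What is 2 + 2?").text)
print(cascade("Explain the trade-offs of distillation vs quantization for edge LLMs").text)
```

The appeal is that most traffic takes the cheap path, and the big model only spins up when the small one can't cope.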

Purely a question of curiosity: say I asked a “basic question” and it answered instantly, but then I asked a “complex question” and there was a significant speed difference…

I mean, what was the difference in processing?

Was it processing more criteria to autocomplete with?

I don't know how this works, I'm just curious.

If that were the case, I hope it wouldn't permanently switch into that mode, because that would mess up my research.