Feature Request: Fine-tune modalities and stop-token frequency toggles

Would be fun to play with models where we could slowly reduce the stop-token frequency and fine-tune gradually for tasks over longer and longer inferences. Yes, it would likely exacerbate the long-horizon memory problem, especially if we're pushing old text/images out of context, but hmmm.
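A minimal sketch of what a stop-token frequency toggle could look like at sampling time: subtract a bias from the stop token's logit before softmax, so it gets sampled less often and generations stretch out. The function names, toy logits, and EOS index here are all hypothetical, not any particular library's API; a real version might anneal the bias per fine-tuning step.

```python
import math

def apply_stop_token_bias(logits, eos_id, bias):
    """Return a copy of `logits` with `bias` subtracted from the
    stop-token logit; a positive bias lowers the chance of the
    model emitting the stop token on this step."""
    out = list(logits)
    out[eos_id] -= bias
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 4 tokens; index 3 plays the stop token.
logits = [1.0, 0.5, 0.2, 2.0]
p_before = softmax(logits)[3]
p_after = softmax(apply_stop_token_bias(logits, eos_id=3, bias=2.0))[3]
# The stop token's probability drops, so sampled sequences run longer.
```

Slowly ramping `bias` up during fine-tuning would be one way to "slowly reduce the stop token freq" without editing the training data itself.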
I’m imagining training fine-tunes on something like a store point-of-sale GUI, or playing a game by issuing pyautogui commands, rewarding/rating good responses while giving lower scores to wrong commands.
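One way the reward/rating loop could be sketched: collect each emitted GUI command with a score, keeping the pairs around as a preference dataset. Everything here is hypothetical (the `CommandRater` name, the whitelist of pyautogui-style action names, the ±1 scoring rule); pyautogui itself isn't actually invoked.

```python
from dataclasses import dataclass, field

@dataclass
class CommandRater:
    """Rate emitted GUI-command strings against a whitelist of
    pyautogui-style actions and log (command, score) pairs that
    could later feed a reward model or preference fine-tune."""
    valid_actions: set = field(
        default_factory=lambda: {"click", "moveTo", "typewrite", "press"}
    )
    records: list = field(default_factory=list)

    def rate(self, command: str) -> float:
        # Crude parse: the action name is everything before "(".
        action = command.split("(", 1)[0].strip()
        score = 1.0 if action in self.valid_actions else -1.0
        self.records.append((command, score))
        return score

rater = CommandRater()
rater.rate("click(120, 340)")   # recognized action, positive score
rater.rate("launch_missiles()") # unknown action, negative score
```

In practice the scores would come from a human or an automatic checker watching the GUI state, not a static whitelist, but the logged pairs are the useful artifact either way.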

A single inference in the now might be a better agent for some tasks, even if it’s stuck in the now/has severe amnesia and forgets quite quickly. Agents built on multiple inferences are quite cumbersome, at least until we get some kind of true memory to stitch inferences together or inform the next one. Hmm.