I was deep in thought last night considering possible ways to mitigate hallucinations in LLMs, and I was curious whether it would be possible to build a system that adjusts the model's temperature during generation. From the bit of work I've done, it seems you set the temperature to a desired level once and that's it for the whole prompt. During generation, though, I'm sure the model encounters plenty of instances where the probabilities for the next token are spread fairly evenly across a wide variety of tokens. In cases like that, I think bumping the temperature up would be beneficial. In other words, the temperature could be adjusted at runtime based on how spread out the probability distribution is, to account for uncertainty. If the model has a natural (well, artificial technically) inclination toward a particular next token, it makes sense to keep the temperature lower.
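Here's a minimal sketch of what I mean, assuming a Hugging Face causal LM (the "gpt2" model name, the entropy measure of spread, and the linear low-to-high temperature mapping are just placeholder choices, not a real library feature):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def entropy_scaled_temperature(logits, t_min=0.3, t_max=1.2):
    """Map the spread (entropy) of the next-token distribution to a temperature.

    A peaked distribution (low entropy) keeps the temperature near t_min;
    a flat distribution (high entropy) pushes it toward t_max.
    """
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(-1)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    spread = (entropy / max_entropy).item()  # normalized to [0, 1]
    return t_min + spread * (t_max - t_min)

@torch.no_grad()
def generate_dynamic(model, tokenizer, prompt, max_new_tokens=50):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[0, -1]            # next-token logits
        temp = entropy_scaled_temperature(logits)    # per-token temperature
        probs = F.softmax(logits / temp, dim=-1)
        next_id = torch.multinomial(probs, 1)
        ids = torch.cat([ids, next_id.unsqueeze(0)], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# model = AutoModelForCausalLM.from_pretrained("gpt2")
# tokenizer = AutoTokenizer.from_pretrained("gpt2")
# print(generate_dynamic(model, tokenizer, "The capital of France is"))
```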
Alternatively, could a companion model be trained to suggest a temperature for the next token? This model would understand the context: a creative prompt could allow higher temperatures, while a more deterministic prompt grounded in fact would call for lower ones.
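As a rough sketch of that companion-model idea, you could start with something as simple as a classifier over a prompt embedding that estimates how "creative" the prompt is and blends between a low and high temperature accordingly (the embedding model name, the two-label scheme, and the toy training examples here are all my own assumptions, not a real system):

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

# Hypothetical training data: prompts labeled 1 = creative, 0 = factual.
train_prompts = ["Write a poem about the sea", "What year did WW2 end?"]
train_labels = [1, 0]
clf = LogisticRegression().fit(encoder.encode(train_prompts), train_labels)

def suggest_temperature(prompt, t_factual=0.2, t_creative=1.0):
    """Blend between a low and high temperature by the predicted 'creative' probability."""
    p_creative = clf.predict_proba(encoder.encode([prompt]))[0, 1]
    return t_factual + p_creative * (t_creative - t_factual)
```

A more serious version would presumably condition on the full generation context token by token rather than just the prompt, but the interface would look about the same: context in, suggested temperature out.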
I don’t know, just a couple thoughts I had.