I believe today’s LLMs lack what I’d call an abstraction-crystallization step. Solutions are often dictated by the expected form of the outcome, which leads the model to the “obvious solution” without considering novel alternatives. That solution is the top-probability continuation, and the gap between it and the alternatives is large, so an LLM isn’t going to default to the ‘outside-of-the-box’ solution.
I mention this because if you ‘poll’ an LLM, reseeding it with the same question numerous times and choosing the best output, you usually arrive at a better conclusion. This works because of how temperature is implemented: with a temperature above zero, the model samples from its probability distribution instead of always taking the top choice, so each run can wander down a different, less likely path. It would be even better if the AI learned from these runs, asked questions, and refined the possibility space on its own.
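To make the polling idea concrete, here is a minimal sketch of best-of-n sampling with temperature. Everything in it is a hypothetical toy: the three candidate answers, their logits, and the `quality` scores standing in for whatever judge picks the “best output.” The names `softmax_with_temperature` and `poll` are illustrative, not any library’s API.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def poll(candidates, logits, score, n, temperature, rng):
    """Sample n answers independently (like reseeding), then keep the best-scoring one."""
    probs = softmax_with_temperature(logits, temperature)
    samples = rng.choices(candidates, weights=probs, k=n)
    return max(samples, key=score)

# Hypothetical toy setup: the "obvious" answer has by far the highest logit,
# but a less likely answer scores better with an external judge.
candidates = ["obvious", "decent", "novel"]
logits = [5.0, 2.0, 1.5]
quality = {"obvious": 0.6, "decent": 0.5, "novel": 0.9}  # made-up judge scores

greedy = candidates[logits.index(max(logits))]  # what a single greedy run returns
best = poll(candidates, logits, quality.get, n=50, temperature=1.5, rng=random.Random(0))
print("greedy:", greedy, "| best of 50 samples:", best)
```

Greedy decoding always returns the top-logit answer. Raising the temperature and drawing more samples makes it more likely that a low-probability but higher-quality answer appears at least once, which is what lets polling beat a single run.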
Rather than spreading this over many inference passes, the process could be generalized, similar to a human mind. We don’t have top probabilities in our minds to choose tokens with. We have many regions of activation converging to form an understanding, from which we engage concepts and see which possibilities emerge, given the constraints we know.
What I’m suggesting is a step that takes the next-phrase probabilities (at the phrase level, not the token level) and creates additional associations from those phrases to other phrases through a high-dimensional vector space. You can picture a simpler version of this by placing phrases as points on a two-dimensional plane.
In this scenario, an association between phrases shows up as nearby coordinates on the plane. With two dimensions, this is quite simplistic and doesn’t capture the nuance of relationships. For instance, tomato and apple are probably going to be close because both are food. On the other hand, they differ when discussing “fruit salads.” We could add a separate dimension that measures fruit salad-ness and call it a day. Suddenly the space is 3D, and there is an additional association.
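A small sketch of the tomato/apple example, using made-up coordinates: on a plane with hypothetical axes for food-ness and roundness, tomato’s nearest neighbor is apple, but adding a hypothetical fruit salad-ness axis makes cucumber (also added purely for illustration) the closer match.

```python
import math

def distance(a, b):
    """Euclidean distance between two points with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(name, space):
    """Return the other item whose coordinates are closest to `name`."""
    others = [other for other in space if other != name]
    return min(others, key=lambda other: distance(space[name], space[other]))

# Made-up 2D coordinates: (food-ness, roundness).
plane = {
    "tomato": (0.9, 0.9),
    "apple": (0.9, 0.8),
    "cucumber": (0.9, 0.4),
}

# Same points plus a made-up third axis: fruit salad-ness.
space = {
    "tomato": (0.9, 0.9, 0.1),
    "apple": (0.9, 0.8, 0.9),
    "cucumber": (0.9, 0.4, 0.1),
}

print(nearest("tomato", plane))  # apple
print(nearest("tomato", space))  # cucumber
```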
Now consider n dimensions, where the n dimensions are sampled from the distribution of the most likely concepts. In other words, the dimensions are chosen by frequency, and each dimension corresponds to a phrase, such as “fruit salad,” along which associations are mapped. In effect, the dimensions encode semantics.
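One hypothetical way to build such axes is sketched below. It assumes a toy corpus of documents and measures association as simple co-occurrence: the n phrases that appear in the most documents become the dimensions, and a phrase’s coordinate on each axis is the share of its documents that also mention that axis phrase. The corpus, the co-occurrence measure, and the function names are all assumptions for illustration.

```python
from collections import Counter

def choose_dimensions(corpus, n):
    """Use the n phrases that appear in the most documents as the axes."""
    counts = Counter(phrase for doc in corpus for phrase in set(doc))
    return [phrase for phrase, _ in counts.most_common(n)]

def embed(phrase, corpus, dimensions):
    """Coordinate on each axis = share of the phrase's documents that also mention that axis."""
    docs = [set(doc) for doc in corpus if phrase in doc]
    if not docs:
        return tuple(0.0 for _ in dimensions)
    return tuple(sum(dim in doc for doc in docs) / len(docs) for dim in dimensions)

# Hypothetical toy corpus: each document is a list of phrases that co-occur.
corpus = [
    ["apple", "fruit salad", "food"],
    ["banana", "fruit salad", "food"],
    ["apple", "fruit salad", "food"],
    ["tomato", "pasta sauce", "food"],
    ["tomato", "food"],
    ["hammer", "tool"],
]

dims = choose_dimensions(corpus, 2)  # ["food", "fruit salad"]
for phrase in ("apple", "tomato", "hammer"):
    print(phrase, embed(phrase, corpus, dims))
```

Here “food” alone would put apple and tomato at the same spot; the frequency-chosen “fruit salad” axis is what pulls them apart.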
Now we can evaluate the outcomes of decisions directly through vector similarity to the prompt and rank them. We could score thousands of phrases by their relevance to our prompt and choose solution states by chaining together phrases based on top-P selection from many different starting phrases, then identifying where the chains converge. Specifically, we look for chains that differ from the top-P chain but reach the same ending point. You will find that most chains end up in the ‘same places,’ meaning their endpoints are semantically similar (as judged by a simpler transformer), but there will be outliers. These outliers are arguably the most interesting, because they are things you wouldn’t think of yourself, which is the whole point of a solutions-oriented chatbot.
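The chaining step can be sketched as follows, under several assumptions: next-phrase probabilities come from a small hand-written table, a chain follows the single most probable next phrase at each step (one reading of “top-P” here), starting phrases are listed from most to least likely, and endpoint embeddings are made-up 2D vectors standing in for the simpler transformer that judges similarity. The 0.5 similarity threshold for calling a chain an outlier is arbitrary, and all names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_prompt(phrases, embeddings, prompt_vec):
    """Order phrases from most to least similar to the prompt vector."""
    return sorted(phrases, key=lambda p: cosine(embeddings[p], prompt_vec), reverse=True)

def greedy_chain(start, transitions, max_steps=10):
    """Follow the single most probable next phrase until a phrase has no successors."""
    chain = [start]
    while chain[-1] in transitions and len(chain) <= max_steps:
        next_phrase, _ = max(transitions[chain[-1]], key=lambda pair: pair[1])
        chain.append(next_phrase)
    return chain

def analyze(starts, transitions, embeddings, threshold=0.5):
    """Build one chain per start; starts are assumed ordered from most to least likely."""
    chains = [greedy_chain(s, transitions) for s in starts]
    reference = chains[0]  # the chain from the most likely start
    # Chains whose intermediate phrases differ from the reference but that end in the same place.
    convergent = [c for c in chains[1:]
                  if c[-1] == reference[-1] and c[1:-1] != reference[1:-1]]
    # Most chains converge on one endpoint; chains ending far from it (by cosine) are outliers.
    majority, _ = Counter(c[-1] for c in chains).most_common(1)[0]
    outliers = [c for c in chains
                if cosine(embeddings[c[-1]], embeddings[majority]) < threshold]
    return reference, convergent, outliers

# Hypothetical toy data for a prompt like "how do I keep bread fresh?"
starts = ["store it", "wrap it", "freeze it"]
transitions = {
    "store it": [("in a bag", 0.7), ("in the fridge", 0.3)],
    "wrap it": [("in cloth", 0.6), ("in a bag", 0.4)],
    "freeze it": [("in slices", 0.9), ("whole", 0.1)],
    "in a bag": [("stays fresh at room temperature", 1.0)],
    "in cloth": [("stays fresh at room temperature", 1.0)],
    "in slices": [("toast straight from frozen", 1.0)],
}
embeddings = {
    "stays fresh at room temperature": (1.0, 0.0),
    "toast straight from frozen": (0.2, 1.0),
}

reference, convergent, outliers = analyze(starts, transitions, embeddings)
print("reference:", reference)
print("same destination, different route:", convergent)
print("outliers:", outliers)
print("endpoints ranked:", rank_by_prompt(list(embeddings), embeddings, (0.9, 0.3)))
```

In this toy table, “wrap it” reaches the same endpoint as the reference by a different route, while “freeze it” lands somewhere semantically distant and is flagged as the interesting outlier.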
Notice how, going from A to B, Red passes through all of the points that Blue does, but Red also includes more novel information. If I wanted a highly cohesive output, I’d most likely want Blue. If I wanted new ideas, I’d ask for Red.
I know this is very theoretical, but how cool would it be to have a slider that trades off ‘most likely’ versus ‘more intriguing’ solutions to problems?
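A minimal sketch of such a slider, assuming each candidate path comes with a log-probability and that novelty is measured as the fraction of a path’s points that the most likely (Blue) path never visits. The point labels, log-probabilities, and the linear blend controlled by `alpha` are hypothetical choices, not a worked-out method.

```python
import math

def novelty(path, reference):
    """Fraction of points on `path` that the reference path never visits."""
    ref = set(reference)
    return sum(p not in ref for p in path) / len(path)

def pick(candidates, reference, alpha):
    """alpha = 0 means 'most likely'; alpha = 1 means 'more intriguing'."""
    # Turn log-probabilities into likelihoods that sum to 1 across candidates.
    peak = max(c["log_prob"] for c in candidates.values())
    weights = {name: math.exp(c["log_prob"] - peak) for name, c in candidates.items()}
    total = sum(weights.values())

    def score(name):
        likelihood = weights[name] / total
        return (1 - alpha) * likelihood + alpha * novelty(candidates[name]["path"], reference)

    return max(candidates, key=score)

# Hypothetical paths from A to B: Red visits every point Blue does, plus x1 and x2.
candidates = {
    "blue": {"path": ["A", "p1", "p2", "B"], "log_prob": -2.0},
    "red": {"path": ["A", "p1", "x1", "p2", "x2", "B"], "log_prob": -3.5},
}
reference = candidates["blue"]["path"]  # Blue is the most likely path, so it is the baseline

for alpha in (0.0, 0.5, 1.0):
    print(alpha, pick(candidates, reference, alpha))
```

With `alpha` at 0 the slider returns Blue, the most likely path; at 1 it returns Red, which covers Blue’s points plus new ones. A linear blend is simply the easiest weighting to reason about for a sketch.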