Thank you all for your replies!
Apologies, I may be a bit confused, but here is how I read _j’s post.
Now, let’s consider the following sentences related to baseball and cricket.
Here are six sentences regarding a baseball bat:
The baseball bat, an iconic piece of sports equipment, embodies the essence of America’s pastime. Crafted with precision and care, it serves as a conduit between player and ball. These bats come in various sizes, typically ranging from 28 to 34 inches in length, tailored to suit individual players’ preferences and hitting styles. The size of the baseball bat plays a crucial role in determining a player’s swing mechanics and power potential. A longer bat offers extended reach, allowing hitters to cover more of the plate, while a shorter bat provides greater control. The selection of the right bat size is a meticulous process, reflecting the fusion of technology and tradition that defines the sport and empowers players to excel on the diamond.
Here are two sentences regarding a cricket bat:
The cricket bat, symbolizing tradition and finesse in the gentleman’s game, follows strict size parameters set by cricket laws, with a blade length not surpassing 38 inches and width of 4.25 inches. This ensures a delicate equilibrium between power and control, accommodating diverse playing styles and pitch conditions.
Suppose we calculate embeddings sentence by sentence, as shown in the following image. Once each sentence has its own embedding, it becomes apparent that the baseball sentences (six of the eight) dominate the outputs generated by the LLM.
Note also that not every baseball sentence contains the word “baseball,” and the same is true of “cricket” in the cricket sentences.
If we used the embeddings as they are, the result would likely be heavily skewed towards baseball-related content.
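As a rough illustration of that skew, here is a minimal sketch using made-up 2-D vectors in place of real embeddings. The vectors, the axis meanings, and the helper names `mean_vector` and `cosine` are all my own assumptions for the example, not anything a real embedding model returns.

```python
import math

def mean_vector(vectors):
    """Plain (unweighted) average of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins (assumption): axis 0 means "baseball", axis 1 means "cricket".
baseball_sentences = [[1.0, 0.0] for _ in range(6)]  # six baseball sentences
cricket_sentences = [[0.0, 1.0] for _ in range(2)]   # two cricket sentences

# Averaging all eight sentence vectors with equal weight.
paragraph = mean_vector(baseball_sentences + cricket_sentences)

to_baseball = cosine(paragraph, [1.0, 0.0])
to_cricket = cosine(paragraph, [0.0, 1.0])
print(paragraph, to_baseball, to_cricket)
```

With these toy numbers the averaged vector sits much closer to the baseball axis than to the cricket axis, simply because there are three times as many baseball sentences.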
However, a weighted average could normalize the entire paragraph while also spreading additional contextual information about baseball (or cricket) to each individual sentence.
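To make my reading concrete, here is a minimal sketch of what I imagine the weighted average doing. Everything in it is an assumption on my part: the toy 2-D vectors, the choice of weights, the mixing factor `alpha`, and the helper names `weighted_average` and `blend_with_context`. It is only my interpretation, not necessarily what _j meant.

```python
def weighted_average(vectors, weights):
    """Weighted average of equal-length vectors; weights need not sum to 1."""
    total = sum(weights)
    dims = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dims)
    ]

def blend_with_context(sentence_vec, context_vec, alpha):
    """Mix a sentence vector with a paragraph-level context vector.

    alpha = 0 keeps the sentence as-is; alpha = 1 replaces it with the context.
    """
    return [(1 - alpha) * s + alpha * c for s, c in zip(sentence_vec, context_vec)]

# Toy stand-ins (assumption): axis 0 means "baseball", axis 1 means "cricket".
sentence_vectors = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical weights, e.g. relative chunk lengths.
weights = [3, 1]

context = weighted_average(sentence_vectors, weights)

# Push some paragraph-level context into the cricket sentence.
cricket_with_context = blend_with_context([0.0, 1.0], context, alpha=0.5)
print(context, cricket_with_context)
```

In this sketch the cricket sentence picks up some of the paragraph-level signal while still pointing mostly along its own axis. Whether that is the intended effect of the weighted average is exactly the part I am unsure about.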

I may be entertaining an unconventional thought here, but I’m still struggling to understand why a weighted average is the expected approach for long texts.