I have a question about the example in the following openai-cookbook notebook:
Embedding texts that are longer than the model’s maximum context length
I am curious about the rationale for combining each chunk's embedding with a weighted average. I noticed there is a flag that controls this behavior, and it defaults to True, so the weighted average appears to be the intended default.
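For context, here is a minimal sketch of how I understand the combining step (the function name and signature are my own, not the notebook's; what I am asking about is the weighting by token count and the re-normalization):

```python
import numpy as np

def combine_chunk_embeddings(chunk_embeddings, chunk_lens, average=True):
    """Combine per-chunk embeddings into a single vector.

    chunk_embeddings: list of embedding vectors, one per chunk
    chunk_lens: token count of each chunk, used as its weight
    """
    if not average:
        # Without averaging, just keep one embedding per chunk.
        return [np.asarray(e) for e in chunk_embeddings]
    # Weighted average: each chunk contributes in proportion to its token count.
    combined = np.average(np.asarray(chunk_embeddings), axis=0, weights=chunk_lens)
    # Re-normalize to unit length so cosine similarity behaves as expected.
    return combined / np.linalg.norm(combined)
```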
I tried to find an explanation of why this weighted-average approach is necessary, but could not locate one. It may be obvious to others, but I am struggling to understand the reasoning behind it.
For instance, when the text splits into two chunks of 8191 and 8191 tokens, the weights are equal and the result is simply the plain average of the two vectors. Similarly, with chunks of 8191 and 4095 tokens, a weighted average produces a combined vector that plausibly captures the overall meaning. My confusion arises with cases like 8191 tokens versus a much smaller chunk, say 100 tokens: there, as the small calculation below shows, the short chunk is almost entirely drowned out, and I am not sure the weighted average retains its significance.
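To make that last case concrete, these are the relative weights the token counts would imply (just my own arithmetic, not code from the notebook):

```python
# Relative weights implied by the token counts in the examples above.
for lens in ([8191, 8191], [8191, 4095], [8191, 100]):
    total = sum(lens)
    weights = [round(n / total, 3) for n in lens]
    print(lens, "->", weights)
# [8191, 8191] -> [0.5, 0.5]
# [8191, 4095] -> [0.667, 0.333]
# [8191, 100] -> [0.988, 0.012]
```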
More fundamentally, I am not sure whether token count alone is a valid basis for weighting the embedding vectors in the first place.
Thank you.