A chunk with no meaning (`---`) gets the highest similarity score

I have the following case. We have two English documents containing technical data about engines. These are chunked and embedded using the LangChain framework and OpenAIEmbeddings, with the model text-embedding-ada-002. Due to some processing of the data, there is a chunk that contains only `---` (a markdown horizontal rule).

When I provide a question in English and retrieve the relevant chunks using the same model and cosine similarity, this chunk is ranked 22nd with a score of 0.732305… The top result has a score of 0.780513…
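For context, this is roughly how the scores are computed. The vectors below are hypothetical toy values, not real ada-002 embeddings (those have 1536 dimensions); the point is that even the `---` chunk is embedded as some dense, nonzero vector, so its cosine similarity with any query is a nonzero number that can land anywhere in the ranking:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-ins for real embedding vectors:
query_vec    = [0.2, 0.9, 0.1]    # the embedded question
relevant_vec = [0.25, 0.85, 0.05] # a topically related chunk
rule_vec     = [0.5, 0.5, 0.5]    # stands in for the "---" chunk

print(cosine_similarity(query_vec, relevant_vec))
print(cosine_similarity(query_vec, rule_vec))
```

Note that the `---` chunk never scores 0: the embedding model assigns it a vector like any other input, so its rank relative to the other chunks depends entirely on where that vector happens to sit relative to the query vector.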

However, when I provide the same question in another language, this irrelevant chunk becomes the top match, with a score of 0.737243… Meanwhile, the top result for the English version drops to 4th position (score 0.727817…).

Now, I do understand that the matches can vary between languages, but I do not expect 21 somewhat relevant English matches to become less relevant than `---` when the same query is asked in a different language. Can someone shed some light on this? Why isn't `---` always at the bottom of the list, given that it carries no meaning?