Thanks for that link too - very useful. It’s a shame they don’t use the same colour for distinct tokens either… but they do have a nice hover feature to check, so it’s an improvement on the OpenAI online tokenizer.
I was referring to Jake Elmstedt (@anon22939549)
I don’t think you need to re-encode all your documents. With no leading space in the query, it will tend to look for documents starting with your query. With a leading space, it will tend to look for documents containing the query. So, depending on your app’s needs, you can either expose this as a search option, or create two query embeddings (one with the leading space, one without) and combine the top-ranking results.
[UPDATE: I think you’ll need to run some experiments. See below.]
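For what it’s worth, here is a rough sketch of the two-embedding idea in Python. It makes a few assumptions: `embed` is a hypothetical callable standing in for whatever embedding API you use, `doc_vectors` holds your already-computed document embeddings, and "combine" is taken to mean keeping the better of the two scores per document, which is only one of several reasonable choices.

```python
import math
from typing import Callable, List, Mapping, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search_both_variants(
    query: str,
    doc_vectors: Mapping[str, Vector],
    embed: Callable[[str], Vector],  # hypothetical: wraps your embedding API
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Embed the query with and without a leading space, then merge rankings.

    Assumption: each document keeps the higher of its two scores.
    """
    best = {}
    for variant in (query, " " + query):
        query_vector = embed(variant)
        for doc_id, doc_vector in doc_vectors.items():
            score = cosine_similarity(query_vector, doc_vector)
            if score > best.get(doc_id, -math.inf):
                best[doc_id] = score
    # Highest similarity first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

The documents are embedded once and reused for both query variants, so the only extra cost is one additional query embedding per search.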
I compared cosine similarities for queries and documents with and without leading spaces. Adding a space to either the query or the document gave worse results (i.e. lower cosine similarity values, a larger angle of separation).
Bit perplexed.
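If anyone wants to repeat the comparison, something along these lines should do it. This is only a sketch: `embed` is again a hypothetical callable wrapping your embedding API, and the function and key names are mine. It scores all four query/document combinations and reports both the cosine similarity and the corresponding angle, so a lower similarity shows up as a larger angle.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_leading_space(
    query: str,
    document: str,
    embed: Callable[[str], Vector],  # hypothetical: wraps your embedding API
) -> Dict[Tuple[bool, bool], Tuple[float, float]]:
    """Score every combination of leading space on the query and document.

    Keys are (query_has_space, document_has_space); values are
    (cosine similarity, angle in degrees).
    """
    results = {}
    for query_space in (False, True):
        for doc_space in (False, True):
            query_vector = embed((" " if query_space else "") + query)
            doc_vector = embed((" " if doc_space else "") + document)
            similarity = cosine_similarity(query_vector, doc_vector)
            # Clamp to guard against rounding pushing the value outside [-1, 1].
            clamped = max(-1.0, min(1.0, similarity))
            angle = math.degrees(math.acos(clamped))
            results[(query_space, doc_space)] = (similarity, angle)
    return results
```

Running this over a batch of query/document pairs rather than a single pair would show whether the drop in similarity is consistent or just noise.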