Embeddings Text Prep

Is there a comprehensive list of text preprocessing steps that affect the embeddings? I know about line feeds because the documentation mentions them. But I've run into other things that seem to make too much of a difference to ignore. (For example, the cosine similarity between "I have a dream" and "I have a dream." with davinci is 0.9780036.) It would also be good to know whether stop words should be removed or kept. Basically, what should I consider to get the best-quality embeddings?
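For reference, here is a minimal sketch of the comparison described above. It assumes the two embedding vectors have already been fetched from the API. The toy vectors in the usage note are made up for illustration and are not real davinci output. `prepare_text` is a hypothetical helper that only normalizes whitespace, including line feeds. It deliberately leaves punctuation and stop words alone, since whether to strip those is exactly the open question.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def prepare_text(text):
    """Hypothetical helper: collapse all whitespace, line feeds included.

    str.split() with no arguments splits on any whitespace run,
    so newlines become single spaces. Punctuation is kept as-is.
    """
    return " ".join(text.split())
```

Usage: `prepare_text("I have\na dream.")` returns `"I have a dream."`, so the trailing period survives and would still be embedded. `cosine_similarity([1.0, 0.0], [0.0, 1.0])` returns `0.0` for orthogonal toy vectors.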
