I am converting PDFs to text and then passing the text to the embeddings API.
I am using CharacterTextSplitter to split the document into sections. One possible problem is that it often cuts sentences in the middle. I am not very familiar with embeddings, so I wanted to ask whether this could cause problems, for example by making the model misunderstand a sentence or section, or miss part of its context.
An example could be:
Section 1: “For example, bacteria will spoil milk in two or three hours if the milk is left out on the kitchen counter at”
Section 2: “the kitchen counter at room temperature. However, by reducing the temperature of the milk,”
EDIT: Just to clarify, my sections are substantially longer than the examples. I am using a chunk size of 1000 with a chunk overlap of 200.
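
To make the behaviour concrete, here is a minimal sketch of fixed-size splitting with overlap. It is plain Python and only an approximation of the idea: `split_fixed` is a hypothetical helper written for illustration, not LangChain's CharacterTextSplitter, and the small sizes (40 and 10 instead of 1000 and 200) are chosen only so the mid-sentence cuts are visible on a short sample.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring windows share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = (
    "For example, bacteria will spoil milk in two or three hours if the milk "
    "is left out on the kitchen counter at room temperature. However, by "
    "reducing the temperature of the milk,"
)

# Small sizes so the effect is easy to see; my real settings are 1000 and 200.
chunks = split_fixed(sample, chunk_size=40, chunk_overlap=10)

for i, chunk in enumerate(chunks, start=1):
    print(f"Section {i}: {chunk!r}")
```

Because the cut positions depend only on character counts, a section can start or end in the middle of a sentence, or even in the middle of a word. The overlap repeats the end of one section at the start of the next, but it does not move the cut to a sentence boundary.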