Hi folks.
I've decided to start learning about embeddings. My plan is to begin with GPT4All and the nomic-embed-text model that comes with it.
Everywhere I read that “you can feed PDF, XLS, DOC, your mother… whatever” to an embedding model. But should I?
For example, does it help the embedding model if I arrange my data as simple XML?
<xml>
<data>
<date>12345</date>
<text>foo barr is funny thing</text>
</data>
<data>
<date>54321</date>
...
..
Or maybe in an even simpler form:
START DATA:
date: 12345
text: foo barr is funny thing
END DATA:
START DATA:
date:
...
...
Or does it really not matter?
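To make the question concrete, here is a minimal sketch of how I imagine testing it myself: render the same record in both layouts, embed each one, and compare how close each lands to a query. The helper names (`as_xml`, `as_labeled`, `compare_formats`) are hypothetical ones I made up, and `embed` stands for any function that turns a string into a vector of floats. Nothing here is specific to GPT4All, and I am not claiming this is the right way to evaluate it.

```python
import math
from typing import Callable, Dict, Sequence
from xml.sax.saxutils import escape

# Assumed record shape: {"date": "...", "text": "..."} with string values.
Record = Dict[str, str]


def as_xml(record: Record) -> str:
    """Render one record in the simple XML layout."""
    return (
        "<data>\n"
        f"<date>{escape(record['date'])}</date>\n"
        f"<text>{escape(record['text'])}</text>\n"
        "</data>"
    )


def as_labeled(record: Record) -> str:
    """Render one record in the START DATA / END DATA layout."""
    return (
        "START DATA:\n"
        f"date: {record['date']}\n"
        f"text: {record['text']}\n"
        "END DATA:"
    )


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare_formats(
    embed: Callable[[str], Sequence[float]], record: Record, query: str
) -> Dict[str, float]:
    """Similarity of the query to the same record in three renderings."""
    q = embed(query)
    return {
        "xml": cosine(q, embed(as_xml(record))),
        "labeled": cosine(q, embed(as_labeled(record))),
        "plain": cosine(q, embed(record["text"])),
    }
```

With GPT4All I assume `embed` could be something like `Embed4All().embed` from the `gpt4all` Python package, but I have not tried that yet. If the three scores come out nearly the same for realistic queries, that would suggest the layout does not matter much for my data.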