Hi, some context: I am using a MongoDB database where I have large texts as fields in my documents, and I want to generate embeddings for them. The issue is that the texts are longer than the maximum input tokens for text-embedding-3-large, so I am looking for a solution.
My first thought was to split the text into multiple fields when I store it in the database, but I don't know whether an embedding can span multiple fields, or if that is even a good idea.
Another idea would be to chunk the texts when I build my dataframes for embedding, but that would mean generating multiple embeddings per document again (something like the sketch below).
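To make that second idea concrete, here is roughly what I have in mind: split on token counts with tiktoken, embed each chunk, and average the chunk vectors into one vector per document. This is just a minimal sketch, not tested; the chunk size, the averaging step, and the re-normalization are my own assumptions, so I'd be happy to hear if there is a better approach.

```python
import numpy as np
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "text-embedding-3-large"
MAX_TOKENS = 8000  # a bit under the model's input limit, to be safe

def chunk_text(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Split text into chunks of at most max_tokens tokens."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by the embedding models
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def embed_long_text(text: str) -> np.ndarray:
    """Embed each chunk, then average the chunk vectors into one vector."""
    chunks = chunk_text(text)
    response = client.embeddings.create(model=MODEL, input=chunks)
    vectors = np.array([item.embedding for item in response.data])
    # Averaging is one common way to collapse chunks into a single
    # document vector; keeping one vector per chunk is the alternative.
    mean = vectors.mean(axis=0)
    return mean / np.linalg.norm(mean)  # re-normalize after averaging
```

My worry is whether averaging loses too much, and whether storing one vector per chunk in MongoDB instead would be worth the extra storage and API calls.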
Lastly, I'd like to know if there is another embedding model I haven't considered that can handle large texts.
I am using this for my undergrad thesis and can't really afford to spend much money on the API, but this embedding model is the only one that works well for my language, so I wanted to ask before I burn through all my API credits searching for the best method.
Thank you!