I’m currently working with a set of 1000 documents, which I’ve split into chunks of size 800 with an overlap of 50. I need to run the chunks through an embedding function, but I hit the API’s token limit when I try to pass all of them in a single request.
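For context, here is a minimal sketch of the sliding-window chunking described above, assuming the size 800 and overlap 50 are measured in characters (if they are tokens, the same window logic applies to a token list instead). The function name `chunk_text` is just illustrative:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks with a sliding window.

    Each chunk starts `chunk_size - overlap` characters after the
    previous one, so consecutive chunks share `overlap` characters.
    Assumes overlap < chunk_size.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already covers the end of the text
    return chunks
```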
To work around this, I plan to call the API endpoint repeatedly until all of the chunks are embedded. My challenge now is handling the rate limit of 3 requests per minute that applies to free users like myself.
My idea is to process the chunks iteratively, counting the tokens in each one; once the per-request token limit is reached, I would sleep for 1 minute before resuming with the remaining chunks.
I’m looking for suggestions on how to design an algorithm that handles this situation effectively. I’m still relatively new to this field and exploring different methods, so I would greatly appreciate any help or pointers.
Thank you in advance for your assistance!