Pre-tokenization with Python?

Hi,
I have about 1000 PDF files, and I want to split them into batches of 3000 tokens each to send to GPT-3 from a Python script. Is it possible to pre-tokenize the files, split them into batches of 3000 tokens each, and then run a Python loop that applies the same prompt to each batch?
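
To make the question concrete, here is a rough sketch of the loop I have in mind. It assumes the text has already been extracted from each PDF into a plain string. `whitespace_tokenize` is only a stand-in for the real GPT-3 tokenizer, so its counts will not match the model's, and `send` is a hypothetical placeholder for whatever function actually calls the API.

```python
from typing import Callable, Iterator, List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): splits on whitespace only.
    # A real GPT-3 tokenizer would produce different token counts.
    return text.split()


def split_into_batches(tokens: List[str], batch_size: int = 3000) -> Iterator[List[str]]:
    # Yield consecutive slices of at most batch_size tokens.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(tokens), batch_size):
        yield tokens[start:start + batch_size]


def apply_prompt_to_batches(
    text: str,
    prompt: str,
    send: Callable[[str], str],
    batch_size: int = 3000,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> List[str]:
    # Tokenize once, then send the same prompt with each batch in turn.
    # `send` is hypothetical: it stands for the function that calls the model.
    results = []
    for batch in split_into_batches(tokenize(text), batch_size):
        chunk = " ".join(batch)  # rebuild text from the stand-in tokens
        results.append(send(f"{prompt}\n\n{chunk}"))
    return results


if __name__ == "__main__":
    # Dummy `send` that just reports the length of what it received.
    demo = apply_prompt_to_batches(
        "one two three four five", "Summarize:", lambda s: str(len(s)), batch_size=2
    )
    print(demo)
```

The outer loop over the 1000 files would then just call `apply_prompt_to_batches` once per extracted text.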
