Thanks. I also found this older post that was interesting.
In the end I went with a simple character count for the samples and left the choice of tokenizer to the library's client code.
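A minimal sketch of that split, under one possible reading: the library measures a sample by character count by default, and the caller can pass in whatever tokenizer they prefer. The names `sample_size` and `Tokenizer` are hypothetical, made up for illustration, and not taken from any actual library.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type alias: any callable that maps text to a sequence of tokens.
Tokenizer = Callable[[str], Sequence[str]]


def sample_size(text: str, tokenizer: Optional[Tokenizer] = None) -> int:
    """Return the size of a sample (hypothetical helper).

    With no tokenizer, the size is a plain character count.
    If the caller supplies a tokenizer, the size is the number of
    tokens that tokenizer produces for the text.
    """
    if tokenizer is None:
        return len(text)
    return len(tokenizer(text))


# Library default: character count.
print(sample_size("hello world"))             # 11

# Client code picks the tokenizer; here, plain whitespace splitting.
print(sample_size("hello world", str.split))  # 2
```

The point of the design is that the library never has to depend on a particular tokenizer; client code that cares about token counts brings its own.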