My total token count is 2342 (both input and output). Even though gpt-3.5-turbo has a limit of 4,096 tokens per request, I’m getting this error:
Token indices sequence length is longer than the specified maximum sequence length for this model (2342 > 1024). Running this sequence through the model will result in indexing errors
Does anyone know why this happens?
This is more than likely because you have max_tokens set to 1024 in your chat completion params.
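For illustration, this is roughly where that parameter sits (a minimal sketch using the pre-1.0 openai Python library; the key, messages and model are placeholders):

```python
import openai

openai.api_key = "sk-..."  # placeholder

# max_tokens caps the completion; if it is set to 1024 in your params,
# that would line up with the 1024 limit mentioned in the warning.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "..."}],
    max_tokens=1024,
)
print(response["usage"])  # prompt_tokens, completion_tokens, total_tokens
```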

Appendix
Here, I set max_tokens to 3000, and this is the resulting error:
However, if I change max_tokens to 1024, then it works:
Testing is good and reveals all 
I found out what it was: a dumb mistake on my part.
I had a content moderation endpoint checking all the outputs, and it is limited to 1024 tokens.
All good now!
Interesting.
There is no max_tokens parameter in the moderation endpoint.
My fault, @ruby_coder. Actually, your previous comment made me investigate further, and it appears it was the GPT-2 tokenizer limit.
I just removed it from the code, as it has been obsolete for a long time since OpenAI introduced a better method for getting token usage info.
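For anyone hitting the same thing, here is a small sketch of both halves (this assumes the warning comes from the Hugging Face GPT-2 tokenizer, and uses tiktoken plus the response usage field as the newer way to get the count; the sample text is a placeholder):

```python
from transformers import GPT2Tokenizer
import tiktoken

# Old approach: counting with the GPT-2 tokenizer. Its model_max_length is
# 1024, so encoding anything longer logs the warning above, but the full
# list of ids is still returned and nothing is truncated.
gpt2 = GPT2Tokenizer.from_pretrained("gpt2")
long_text = "hello world " * 1000  # encodes to well over 1024 tokens
print(len(gpt2.encode(long_text)))

# Newer approach: count with tiktoken's encoding for the actual model,
# which has no 1024-token ceiling...
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(enc.encode(long_text)))

# ...or simply read the exact spend from the API response itself:
# response["usage"] contains prompt_tokens, completion_tokens and total_tokens.
```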
Thanks!
Thanks for the tip, @karaburmication
I encountered the same warning when using llama-index, and came to the same conclusion that it's actually just a tokenizer warning rather than the text actually being truncated.
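If the warning is just noise in your logs, it can presumably be silenced through the transformers logging verbosity (assuming the default tokenizer here is the Hugging Face GPT-2 one that emits it):

```python
from transformers.utils import logging

# Hide transformers warnings, including the 1024-token one;
# the text is still encoded in full.
logging.set_verbosity_error()
```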
Have you suggested to the developers that they upgrade this?
@prashanth.bhat - to be honest, I haven't. I'm kinda focused on the super-important stuff; this is low priority at the moment.