Token indices sequence length is longer

My total token count is 2342 (both input and output). Even though gpt-3.5-turbo has a limit of 4,096 tokens per request, I’m getting this error:

Token indices sequence length is longer than the specified maximum sequence length for this model (2342 > 1024). Running this sequence through the model will result in indexing errors

Does anyone know why this happens?

This is more than likely because you have max_tokens set to 1024 in your chat completion params.



Here, I set max_tokens to 3000, and this is the resulting error:

However, if I change max_tokens to 1024, then it works:

Testing is good and reveals all :slight_smile:
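For anyone who wants to reason about a test like this without calling the API, here is a minimal sketch of the arithmetic: the prompt tokens plus the requested max_tokens have to fit inside the model's context window (4,096 for gpt-3.5-turbo, as mentioned in the question). The function name `fits_context` and the prompt size of 1500 tokens are made up for illustration, not taken from the thread.

```python
def fits_context(prompt_tokens: int, max_tokens: int, context_window: int = 4096) -> bool:
    """Return True when the prompt plus the requested completion fits in the window."""
    return prompt_tokens + max_tokens <= context_window


prompt_tokens = 1500  # hypothetical prompt size, chosen only for illustration

print(fits_context(prompt_tokens, 3000))  # 1500 + 3000 = 4500 > 4096, does not fit
print(fits_context(prompt_tokens, 1024))  # 1500 + 1024 = 2524 <= 4096, fits
```

This only covers the API's context limit. It says nothing about where a "2342 > 1024" message comes from.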

I found out what it was, a dumb mistake :slight_smile:

I had a content moderation endpoint checking all the outputs, and it is limited to 1024 tokens :slight_smile:

All good now!


There is no max_tokens parameter in the moderation endpoint.

My fault, @ruby_coder. Actually, your previous comment made me investigate further, and it appears it was the GPT-2 tokenizer limit.

I just removed it from the code, as it has been obsolete for a long time, since OpenAI introduced a better way to get the token usage info.
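In case it helps someone, here is a rough sketch of reading the token counts straight from the response instead of re-tokenizing locally. It assumes the response has already been parsed into a plain dict with the `usage` object that chat completions return (`prompt_tokens`, `completion_tokens`, `total_tokens`). The helper name `token_usage` and the sample numbers are made up for illustration. Only the 2342 total comes from the question.

```python
from typing import Any, Dict


def token_usage(response: Dict[str, Any]) -> Dict[str, int]:
    """Pull the token counts out of a parsed chat completion response."""
    usage = response.get("usage", {})
    prompt = int(usage.get("prompt_tokens", 0))
    completion = int(usage.get("completion_tokens", 0))
    # Fall back to the sum if the total is missing from the payload.
    total = int(usage.get("total_tokens", prompt + completion))
    return {"prompt": prompt, "completion": completion, "total": total}


# Hypothetical response payload; the prompt/completion split is invented.
sample_response = {
    "usage": {"prompt_tokens": 2000, "completion_tokens": 342, "total_tokens": 2342}
}

print(token_usage(sample_response))
```

With this there is no need for a local GPT-2 tokenizer just to track spend.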


Thanks for the tip, @karaburmication
I encountered the same warning when using llama-index and came to the same conclusion: it’s just a tokenizer warning, and the text is not actually truncated.
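To illustrate what I mean by "warning, not truncation", here is a toy stand-in. This is not the Hugging Face code, just a sketch of the behaviour as I understand it: the tokenizer logs a message when the sequence is longer than its configured maximum (1024 for GPT-2) but still returns every token. `ToyTokenizer` and its whitespace splitting are hypothetical.

```python
import logging
from typing import List

logger = logging.getLogger("toy_tokenizer")


class ToyTokenizer:
    """Toy whitespace tokenizer that warns on long input but never truncates."""

    def __init__(self, model_max_length: int = 1024) -> None:
        self.model_max_length = model_max_length

    def encode(self, text: str) -> List[str]:
        tokens = text.split()
        if len(tokens) > self.model_max_length:
            # Only a log message; the full token list is still returned below.
            logger.warning(
                "Token indices sequence length is longer than the specified "
                "maximum sequence length for this model (%d > %d).",
                len(tokens),
                self.model_max_length,
            )
        return tokens


tokens = ToyTokenizer().encode("word " * 2342)
print(len(tokens))  # 2342: nothing was cut off
```

So the message only matters if you actually feed those ids into a model with a 1024-token limit. If the tokenizer is only used for counting, it is harmless.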

Have you offered a suggestion to the developers to upgrade this?

@prashanth.bhat - to be honest, I haven’t. Kinda focusing on super-important stuff, this is low priority at the moment :slight_smile: