How to auto-truncate search files?

Hi, is it possible to have the API automatically truncate search files to the token limit instead of throwing an error such as:

InvalidRequestError: The document at index 54 is 203 tokens over the length limit of 2027.

If not, what length should I truncate my files to in advance?
I guess I need to estimate the maximum query length I will get and then truncate the docs to 2034 - max_query_len?

Auto-truncating might introduce unintended errors by excluding relevant information. You can implement truncation yourself using a tokenizer library (python, javascript). It's generally good practice to keep indexed docs as small as possible (e.g., smaller docs allow more parallelization in other search APIs) and to add metadata that tracks how docs relate to each other (e.g., paragraphs 1 and 2 of the same page).
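
Here is a rough sketch of what that could look like in Python. It is not an official recipe. The 2034 total is the figure guessed in the question, so please verify it against the current API docs. All function names are made up for illustration, and the tokenizer is passed in as a pair of encode/decode callables. The toy whitespace tokenizer is only there so the snippet runs on its own; in practice you would pass in the encode/decode methods of a GPT-2 compatible tokenizer (for example Hugging Face's GPT2TokenizerFast), which is an assumption on my part about what matches the API's counts.

```python
from typing import Callable, Dict, List, Sequence

# Assumption: combined query + document limit, taken from the question above.
TOTAL_LIMIT = 2034

Encode = Callable[[str], Sequence]
Decode = Callable[[Sequence], str]


def doc_budget(max_query_tokens: int, total_limit: int = TOTAL_LIMIT) -> int:
    """Tokens left for a document after reserving room for the longest query."""
    budget = total_limit - max_query_tokens
    if budget <= 0:
        raise ValueError("the query reservation leaves no room for documents")
    return budget


def truncate(text: str, budget: int, encode: Encode, decode: Decode) -> str:
    """Return text unchanged if it fits, else only its first `budget` tokens."""
    tokens = encode(text)
    if len(tokens) <= budget:
        return text
    return decode(tokens[:budget])


def chunk(text: str, budget: int, encode: Encode, decode: Decode,
          source: str) -> List[Dict]:
    """Split text into pieces of at most `budget` tokens, keeping metadata
    (source name and 1-based part number) so pieces can be related later."""
    tokens = encode(text)
    return [
        {
            "source": source,
            "part": start // budget + 1,
            "text": decode(tokens[start:start + budget]),
        }
        for start in range(0, len(tokens), budget)
    ]


# Stand-in tokenizer: one token per whitespace-separated word.
# It does NOT reproduce the API's token counts; swap in a real tokenizer.
def toy_encode(text: str) -> List[str]:
    return text.split()


def toy_decode(tokens: Sequence) -> str:
    return " ".join(tokens)
```

For example, doc_budget(7) gives 2027, which matches the limit in the error message if the query was 7 tokens long. Chunking instead of truncating keeps the tail of each document searchable, which addresses the concern about dropping relevant information. With a real subword tokenizer, decoding a cut-off token list and re-encoding it may not give exactly the same count, so leaving a few tokens of margin is probably wise.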
