Same for me. I have a small .txt file, well under 1k tokens, and it has always worked in my assistant. But right now I can't use it in the Playground; it says "Failed to index file: Error extracting text from file file-kR8adPtDRd7EecD39EJHqEaN detail_str=', detail: File contains too may tokens. Max allowed tokens per file is 2000000.' self.error_code=<FileParsingErrorCode.TOO_MAY…". Looks like you have a bug, OpenAI…
Update: actually, it doesn't matter what file I upload; I get the same error every time. Even if I create a new vector store and attach a file to it, it always fails with the same error once I try to switch on File Search for the assistant.
There is parsing done on any uploaded file to try to determine its type, regardless of the extension you've used. This can fail if your file was uploaded as binary and contains data that is not valid UTF-8.
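If you want to rule that out before uploading, here's a minimal local check (the file path is just a placeholder for your own file): try decoding the raw bytes strictly as UTF-8 and see where it breaks.

```python
# Quick local check: does the file decode cleanly as UTF-8?
# "knowledge.txt" is a placeholder path for your own file.
with open("knowledge.txt", "rb") as f:
    raw = f.read()

try:
    raw.decode("utf-8")
    print("File is valid UTF-8")
except UnicodeDecodeError as e:
    # Report the first offending byte offset and a few bytes of context.
    print(f"Non-UTF-8 byte at offset {e.start}: {raw[e.start:e.start + 8]!r}")
```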
It is also possible to text-bomb OpenAI's token encoder and leave it looping; if processing takes too long, the error message you get back may be inaccurate.
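You can also count tokens locally before uploading, which both confirms the file is nowhere near the 2,000,000-token limit and will reproduce any tokenizer slowdown on your own machine. A minimal sketch with the tiktoken library (cl100k_base is an assumption; the error doesn't say which encoding the file indexer actually uses):

```python
import tiktoken

# cl100k_base is an assumed encoding, not necessarily what
# OpenAI's file indexer uses internally.
enc = tiktoken.get_encoding("cl100k_base")

with open("knowledge.txt", encoding="utf-8") as f:
    text = f.read()

tokens = enc.encode(text)
print(f"{len(tokens)} tokens (limit reported by the error: 2,000,000)")
```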
A first test is to add some placeholder text to the text file so that it cannot be read as another type. Prepend a few lines, such as "Here's the start of a new document, which contains new knowledge for the AI", and you can append something similar at the end. Doing some preprocessing or sanitation of your .txt files yourself can make them more acceptable to this odd inspection, as in the sketch below.
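A minimal sketch of that kind of preprocessing (the file names and marker text are just examples): read the file as bytes, replace anything that won't decode, and wrap it in plain-text markers so the type sniffer sees ordinary prose first.

```python
HEADER = "Here's the start of a new document, which contains new knowledge for the AI.\n\n"
FOOTER = "\n\nEnd of document.\n"

# Decode with errors="replace" so stray non-UTF-8 bytes become
# U+FFFD replacement characters instead of breaking the parse.
with open("knowledge.txt", "rb") as f:
    text = f.read().decode("utf-8", errors="replace")

# Write a cleaned copy wrapped in plain-text markers for upload.
with open("knowledge_clean.txt", "w", encoding="utf-8") as f:
    f.write(HEADER + text + FOOTER)
```

Then upload knowledge_clean.txt instead of the original and see whether the indexing error goes away.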