Failed to index file: File contains too many tokens. Max allowed tokens per file is 2000000

Hi, what am I missing? The docs mention a 512 MB file size limit but say nothing about a token limit. What is wrong?

I was just trying to create an assistant with the Retrieval tool. I deliberately kept the file under 512 MB, tried it, and got this error.


Hi and welcome to the Developer Forum!

I haven’t seen that message before. 2M tokens would equate to approximately 8 megabytes of pure text.
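A rough sketch of that arithmetic, assuming about 4 bytes of plain English text per token. That ratio is only a rule of thumb, not something the error message or the docs state; the real count depends on the tokenizer and on the content of the file.

```python
# Rough estimate only. BYTES_PER_TOKEN is an assumed rule of thumb,
# not an official figure; real token counts depend on the tokenizer.
BYTES_PER_TOKEN = 4
MAX_TOKENS_PER_FILE = 2_000_000  # taken from the error message


def estimate_tokens(size_bytes: int) -> int:
    """Estimate the token count of a plain-text file from its size in bytes."""
    return size_bytes // BYTES_PER_TOKEN


def max_file_bytes(max_tokens: int = MAX_TOKENS_PER_FILE) -> int:
    """Estimate the largest plain-text file that stays under the token limit."""
    return max_tokens * BYTES_PER_TOKEN


print(max_file_bytes())                    # 8000000, i.e. about 8 MB
print(estimate_tokens(512 * 1024 * 1024))  # 134217728 tokens for a 512 MB file
```

Under that assumption, a plain-text file at the documented 512 MB limit would be far above the 2,000,000-token limit from the error.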

I’ve seen that message in messages before.

Hi guys, thanks for fast response.

@_j hmm, thanks, but what does it mean though? I need to upload huge files for retrieval; all 20 together will be 500 MB for sure. Is it a hard limit of the backend engine, or what?

It means that the documentation does not match the capabilities seen in practice.

If intentional, I expect that someone scratched their head hard about the backend costs of chunking and embedding half a gigabyte of data just so someone can ask some questions only informed by 0.001% of that upload.

text-embedding-ada-002 is $0.10 per megatoken without overlaps. Then they say “maximum 10 GB per assistant”. Upload a DVD binary rip of “Office Space” as text files. Say hi to a chatbot. Delete. $500 of backend.

Yeah, same feeling. I don’t get this monetization strategy, and it also seems like they are quite weak at writing the API layer…


Same error with a 45 MB file. I got around it by breaking it up into 12 MB pieces.
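In case it helps anyone else, here is a minimal sketch of that kind of split for a plain-text file. The function name `split_text_file` and the `.partNNN` naming are made up for illustration, and the 12 MB default is just the size that worked here; the size that stays under the token limit will depend on your content.

```python
from pathlib import Path


def split_text_file(path, max_bytes=12 * 1024 * 1024):
    """Split a text file into parts of at most max_bytes, cutting on line breaks.

    Hypothetical helper, not part of any SDK. Returns the list of part paths.
    A single line longer than max_bytes is written as its own oversized part.
    """
    src = Path(path)
    parts = []   # paths of the parts written so far
    buf = []     # lines (as bytes) collected for the current part
    size = 0     # total bytes currently in buf

    def flush():
        nonlocal buf, size
        if not buf:
            return
        # e.g. big.txt -> big.part001.txt, big.part002.txt, ...
        part = src.with_name(f"{src.stem}.part{len(parts) + 1:03d}{src.suffix}")
        part.write_bytes(b"".join(buf))
        parts.append(part)
        buf, size = [], 0

    # Read as bytes so sizes are exact; cutting after b"\n" is safe for UTF-8.
    with src.open("rb") as f:
        for line in f:
            if buf and size + len(line) > max_bytes:
                flush()
            buf.append(line)
            size += len(line)
    flush()
    return parts
```

Calling `split_text_file("big.txt")` would write `big.part001.txt`, `big.part002.txt`, and so on next to the original, and each part can then be uploaded as a separate file.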