I have my own framework that works well: it splits material into logical topics and feeds them in separately. The point was to end up with fewer chunks.
Also, this time they separated the input limit from the output limit, so we have to refactor existing systems to account for, say, 128K of input and only 4K of output. It's no longer "this is the 8K model and this is the 32K model"; now it's a "128/4 model", where you hardly need to worry about input tokens at all.
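A minimal sketch of what that refactor might look like, assuming hypothetical model names and a `plan_request` helper of my own invention; the limit values are illustrative, not tied to any specific API:

```python
# Sketch: track input and output token budgets separately instead of
# one combined context size. All names and numbers are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class ModelLimits:
    max_input_tokens: int   # e.g. 128K budget for the prompt
    max_output_tokens: int  # e.g. 4K cap on the completion

# Old style: one number describes the whole model.
LEGACY = {"model-8k": 8_192, "model-32k": 32_768}

# New style: "128/4" models with independent caps.
MODELS = {"model-128k": ModelLimits(128_000, 4_096)}

def plan_request(model: str, prompt_tokens: int, wanted_output: int) -> int:
    """Validate each budget on its own and return the max_tokens to request."""
    limits = MODELS[model]
    if prompt_tokens > limits.max_input_tokens:
        raise ValueError("prompt exceeds input budget")
    # Output is capped independently; prompt size no longer eats into it.
    return min(wanted_output, limits.max_output_tokens)
```

With a combined limit, a 30K prompt on a 32K model left almost no room for the answer; here the two budgets never compete.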
I’ll stick to the old model.