Same problem. I want to translate books, but I had to split the book into 80 parts first; it was around 260,000 tokens. I would love to split it into only two parts. The 128k context window is misleading, because, as I understand it, that is not the input or output limit.
Currently I am limited to 4096 tokens max output using the gpt-4-1106-preview API model.
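In case it helps anyone else, here is a minimal sketch of the workaround I use: split the source text into chunks small enough that the translated output stays under the 4096-token cap, then translate chunk by chunk. This assumes the pre-1.0 openai SDK (matching the openai.error traceback below) plus tiktoken for token counting; the chunk size, prompt wording, and function names are my own choices, not anything official.

```python
# Sketch: translate a long text in chunks that fit the 4096-token output cap.
# Assumes the pre-1.0 openai SDK and tiktoken; chunk size is a rough guess.
import openai
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by gpt-4 models

def split_by_tokens(text, max_tokens=3000):
    """Split text into pieces of at most max_tokens tokens.

    3000 leaves headroom, since a translation can come out somewhat
    longer than its source and the output is capped at 4096 tokens.
    (A real splitter would break at paragraph boundaries, not mid-sentence.)
    """
    tokens = enc.encode(text)
    return [
        enc.decode(tokens[i : i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

def translate(chunk, target_language="German"):
    response = openai.ChatCompletion.create(
        model="gpt-4-1106-preview",
        messages=[
            {"role": "system",
             "content": f"Translate the user's text into {target_language}."},
            {"role": "user", "content": chunk},
        ],
        max_tokens=4096,  # the hard cap; anything higher raises InvalidRequestError
    )
    return response.choices[0].message.content

book = open("book.txt", encoding="utf-8").read()
translation = "".join(translate(c) for c in split_by_tokens(book))
```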
Yes, this is lame, but it is also exactly the part that costs actual money, which is why they have curtailed it: the cost of quality attention heads grows quadratically with sequence length.
openai.error.InvalidRequestError: max_tokens is too large: 75000. This model supports at most 4096 completion tokens, whereas you provided 75000.
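The fix on my side was simply to clamp whatever I request to the cap the error itself reports (4096), instead of passing the full document length; the helper name here is just mine:

```python
# Clamp the requested completion length to the output cap reported by the
# error above (4096) so the request never errors out.
MAX_COMPLETION_TOKENS = 4096

def safe_max_tokens(requested: int) -> int:
    return min(requested, MAX_COMPLETION_TOKENS)

print(safe_max_tokens(75000))  # -> 4096, instead of an InvalidRequestError
```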
So for tasks where the output is roughly as long as the input, like checking spelling, this model gains you nothing except lower quality.