Transformer language models will likely go “off the rails” into nonsense if allowed to generate without limit, all the way to the maximum context length. You can see this in base completion models, which were not trained toward any particular output length and simply keep writing.
OpenAI models, by contrast, have chat post-training. One symptom of this is a tendency to wrap up outputs within roughly 1,000–2,000 tokens, depending on the model, no matter how large the creative exercise asked of it.
Therefore, you have the right approach: provide an outline (or have the AI generate one), then supply plenty of supporting context each time you ask for the next part to be written. You'll have to size each requested section to what the model can produce at high quality, without triggering its habit of compressing the whole topic down to its trained output length.
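
Here is a minimal sketch of that loop, assuming the OpenAI Python SDK (v1.x). The outline, prompts, model name, and token cap are placeholders to illustrate the pattern, not a prescription:

```python
# Sketch: outline-driven, section-by-section generation.
# Assumptions: OpenAI Python SDK v1.x; outline, prompts, and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

outline = [
    "1. Introduction: the stormy harbor at dusk",
    "2. The captain's dilemma",
    "3. The crossing",
    "4. Landfall and resolution",
]

story_so_far = ""  # accumulate finished sections as context for the next request

for section in outline:
    messages = [
        {
            "role": "system",
            "content": (
                "You are a novelist writing one section at a time. "
                "Write only the section requested, at full length, "
                "and do not conclude the story early."
            ),
        },
        {
            "role": "user",
            "content": (
                "Full outline:\n" + "\n".join(outline) + "\n\n"
                f"Story so far:\n{story_so_far or '(nothing yet)'}\n\n"
                f"Now write only this section: {section}"
            ),
        },
    ]
    response = client.chat.completions.create(
        model="gpt-4o",    # placeholder; use whichever model you prefer
        messages=messages,
        max_tokens=1500,   # keep each section within the model's comfortable length
    )
    story_so_far += "\n\n" + response.choices[0].message.content

print(story_so_far)
```

Each call stays inside the length the model is happy to produce, while the full outline plus the accumulated “story so far” keeps the sections coherent with one another.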