How do I get GPT to output more tokens through the API?

Hi, so how does the token system in GPT really work?
I am using gpt-4 and I feed it an input prompt of maybe 1k tokens, and then I get an output of about 1k tokens.

The limit of GPT-4 is 8k tokens. How can I make it give me an output of 7k tokens? I am trying to get reports from it, and I have to loop to get them page by page, feeding it the whole previous conversation for each page. That is a bit costly, and I would like to know how I can maximize the output.

Is there a memory limitation that is separate from max tokens? I have tried various prompts, but it just stops after a certain number of tokens each time.

The output generation of GPT-4 has been strictly curtailed. They’ve made it simply bad.

Ask it to correct the errors in a 3000-token story and rewrite it, and you will get a 1000-token answer. Brain damage. It seems that making the AI useless is a perfectly fine policy if it cuts down on the tokens generated by everybody and cuts the compute load.

You can try to put multiple individual instructions in one prompt, as step-by-step actions to be performed, each of which would produce a 1000-token output. But even this previously successful technique has been defeated.

The current incarnation of the model is trained on a many-in, less-out pattern, because the majority of use cases pair a larger prompt of instructions and context with a smaller answer.
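
To separate that trained tendency from the hard limit: the 8k figure is the context window, which the prompt and the completion share, so a 1k-token prompt leaves roughly 7k tokens of room in principle even if the model rarely uses it. A minimal sketch of that arithmetic, where the function name and the `reserve` margin (for chat formatting overhead) are my own assumptions:

```python
GPT4_8K_CONTEXT = 8192  # context window of the 8k GPT-4 model, in tokens


def max_completion_tokens(
    prompt_tokens: int,
    context_window: int = GPT4_8K_CONTEXT,
    reserve: int = 0,
) -> int:
    """Upper bound on completion length for a given prompt size.

    `reserve` is a hypothetical safety margin for per-message formatting
    tokens; this helper is illustrative and not part of any library.
    """
    remaining = context_window - prompt_tokens - reserve
    # A prompt that already fills the window leaves no room for output.
    return max(remaining, 0)
```

This is only the ceiling you could request; the model is still free to stop well short of it.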

You can, however, use basic prompt engineering to create output as long as you wish.

Ask the model for a “framework” or an “overview” of the larger task. The model will generate this very easily.
Next, ask the model to generate a completion for the first stage of the larger task. In the case of a large book, this would be the introduction page. It can be as granular as you wish; you define that at the start, when you build the framework.
Repeat this step until all of your framework elements have been completed, then concatenate the results into the now very large whole.

Overarching plot devices and common themes should be included in the framework section that is passed to the model at every stage. This allows for cohesion across time and topic.
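
The loop described above can be sketched roughly like this. It is a minimal illustration, not a definitive implementation: `generate` is a hypothetical callable that you would write yourself to send one prompt to the model and return the text of the reply, and the prompt wording and the one-title-per-line framework format are assumptions of mine.

```python
from typing import Callable, List


def generate_long_document(task: str, generate: Callable[[str], str]) -> str:
    """Build a long document stage by stage from a framework.

    `generate` is a hypothetical wrapper around whatever API call you use:
    it takes a prompt string and returns the model's reply as a string.
    """
    # Step 1: ask for the framework, one section title per line (assumed format).
    framework = generate(
        "Write a framework for the following task as a plain list, "
        "one section title per line, with no other text.\n\n"
        f"Task: {task}"
    )
    sections = [line.strip() for line in framework.splitlines() if line.strip()]

    # Step 2: generate each section separately. Only the framework is passed
    # at every stage, not the previously generated pages.
    parts: List[str] = []
    for title in sections:
        prompt = (
            f"Task: {task}\n\n"
            f"Framework:\n{framework}\n\n"
            f"Write only the section titled: {title}"
        )
        parts.append(generate(prompt))

    # Step 3: concatenate the sections into the full document.
    return "\n\n".join(parts)
```

Because each request carries only the task and the framework rather than the whole previous conversation, the input cost per page stays roughly constant instead of growing with every page.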