Max output token explanation

What is the point of the max output tokens limit?
If the context window is 128K, why is the max output so small, e.g. 4K?
How can the massive input token limit be useful if the output limit is this small?
What strategies can we use to get far more output from a large input? (Imagine a large corpus of data where the analysis is based on that large context and needs far more output than the limit allows.)

thank you

Most demands on LLMs are asymmetric in nature.

Long input context is incredibly important.

  • Context: the longer the input context, the more potentially relevant tokens arrive at the other end.

  • Tools: it is especially useful when using functions (tools), both for feeding the LLM with function definitions and, crucially, for providing answers, e.g. from RAG.

  • Summarisation: it is critical in applications where you wish to summarise a large body of text into a short result.

The list goes on …
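To make the asymmetry concrete, here is a minimal sketch (the `retrieved` passages and the word counts are illustrative stand-ins, not real data): a prompt assembled from many retrieved passages can run to thousands of input tokens while the expected completion is a single sentence.

```python
# Sketch of the input/output asymmetry in a typical RAG-style prompt.
# `retrieved` stands in for passages returned by a retrieval step.
retrieved = [f"Passage {i}: ... lots of corpus text here ..." for i in range(100)]

# Large input: all the passages, plus a question that wants a tiny answer.
prompt = "\n".join(retrieved) + "\nQuestion: What is the key finding? Answer in one sentence."

# The prompt dwarfs the expected completion: hundreds of input words
# versus a one-sentence answer well inside any output limit.
print(len(prompt.split()), "input words for a ~20-word answer")
```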

If you need longer output, just call the LLM again …
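That continuation pattern can be sketched as a loop: generate until the model stops because it hit the token limit, then resend the conversation with the partial output appended and ask it to continue. The `call_llm` function below is a stand-in for a real chat-completion call (real APIs report this via a finish/stop reason such as "length"); here it fakes a model that emits a fixed answer a few tokens at a time.

```python
# Sketch: obtaining output longer than the per-call limit by calling again.
# `call_llm` is a hypothetical stand-in for a real chat-completion request;
# it returns the next chunk of text and whether the limit truncated it.

def call_llm(prompt, generated_so_far, max_tokens=4):
    # Fake model: "generates" the words of a fixed answer, max_tokens at a time,
    # continuing from wherever the previous call left off.
    answer = "alpha beta gamma delta epsilon zeta".split()
    start = len(generated_so_far.split())
    chunk = answer[start:start + max_tokens]
    truncated = start + max_tokens < len(answer)
    return " ".join(chunk), truncated

def generate_long(prompt):
    out = ""
    truncated = True
    while truncated:
        # With a real API you would resend the conversation with the partial
        # output as an assistant message and instruct the model to continue.
        chunk, truncated = call_llm(prompt, out)
        out = (out + " " + chunk).strip()
    return out

print(generate_long("Summarise the corpus"))  # → alpha beta gamma delta epsilon zeta
```

The loop stitches the chunks back together, so the per-call output limit bounds only each round trip, not the total answer length.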