Hi there!
Summarization comes up frequently on this forum, and there are a few tried and tested methods for summarizing longer inputs. See the following post by @Diet for an illustrative overview of how to approach it.
In general, the approach depends on the level of granularity you are looking for in the summary. Technically, if your input text is only 10,000 tokens, you can generate a summary with a single API call. However, the maximum length of that summary is bounded by the completion token limit (see below). If you are looking for a high level of detail, then the approach shared in the post above is to first chunk your original text and then summarize the individual chunks.
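As a rough illustration, here is a minimal sketch of that chunk-then-summarize step, assuming the openai v1 Python SDK and tiktoken; the model name, chunk size, prompts, and file name are just placeholders:

```python
from openai import OpenAI
import tiktoken

client = OpenAI()  # reads OPENAI_API_KEY from the environment
enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-4-turbo series

def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    """Split text into chunks of at most max_tokens tokens each."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens])
            for i in range(0, len(tokens), max_tokens)]

def summarize_chunk(chunk: str) -> str:
    """One API call per chunk; the per-chunk summaries get combined later."""
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "Summarize the text, keeping key details."},
            {"role": "user", "content": chunk},
        ],
    )
    return resp.choices[0].message.content

original_text = open("source.txt").read()  # your long input document
chunk_summaries = [summarize_chunk(c) for c in chunk_text(original_text)]
```

Note that splitting on raw token boundaries can cut mid-sentence; in practice you would usually split on paragraph breaks and perhaps overlap chunks a little so no context is lost at the seams.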
On your related question:
Essentially, the context limit, i.e. 128,000 tokens in the case of the GPT-4-turbo series, is the upper bound on the total tokens in a given request: input tokens and completion tokens combined cannot exceed it. The 4,096 limit applies only to the completion tokens, not to the input tokens, which include your instructions along with any additional context you provide, such as the original text to be summarized in your case.
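To make that arithmetic concrete, here is a small sketch using tiktoken (the numbers are illustrative, and `original_text` carries over from the sketch above):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

CONTEXT_WINDOW = 128_000  # total budget for the GPT-4-turbo series
MAX_COMPLETION = 4_096    # separate cap on completion (output) tokens

prompt = "Summarize the following text:\n\n" + original_text
input_tokens = len(enc.encode(prompt))

# Input and output together must fit in the context window,
# and the output can never exceed the completion cap.
max_output = min(MAX_COMPLETION, CONTEXT_WINDOW - input_tokens)
print(f"{input_tokens:,} input tokens -> up to {max_output:,} completion tokens")
```

(The chat format adds a handful of per-message overhead tokens on top of this, so treat the result as an approximation.)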
In practice, though, the 4,096 output token limit is rarely reached; you are more likely to get outputs in the range of 800-2,000 tokens, depending on how you design your prompt. Linking this back to the point above: if you want more detail in your summary, you need an approach that splits/chunks the input text and runs multiple API calls, whose outputs you then combine into the full, detailed summary.
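Continuing the earlier sketch, that final combine step can itself be a single API call over the concatenated chunk summaries (same hypothetical client, model, and `chunk_summaries` as above):

```python
def combine_summaries(summaries: list[str]) -> str:
    """Merge the per-chunk summaries into one coherent summary."""
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system",
             "content": "Merge these partial summaries into one coherent, detailed summary."},
            {"role": "user", "content": "\n\n".join(summaries)},
        ],
    )
    return resp.choices[0].message.content

final_summary = combine_summaries(chunk_summaries)
```

If you want to preserve maximum detail, you can also skip this final call and simply concatenate the chunk summaries under headings.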
I hope that makes sense and helps a bit.