How can I get output longer than 10,000 tokens?

I’m trying to generate large text outputs exceeding 10,000 tokens, but I’ve run into a problem: despite setting the maximum output to 16,000 tokens, the model returns fewer than 3,000 tokens.

Additionally, when I include multiple topics in a single request, the model seems to summarize each one, whereas focusing on a single topic produces a more detailed response. Unfortunately, the system I use doesn’t allow me to split the topics into individual requests, so I have to include all of them in one submission.

How can I solve this issue? I would appreciate any insights or experiences you could share.

[This is a template of the input data]

  • each "result" field should be generated by GPT
    [
        {
            "title": "",
            "description": "",
            "result": "",
            "sub_subjects": [
                {
                    "title": "",
                    "description": "",
                    "result": ""
                }
            ]
        },
        {
            "title": "",
            "description": "",
            "result": "",
            "sub_subjects": [
                {
                    "title": "",
                    "description": "",
                    "result": ""
                }
            ]
        },
        {
            "title": "",
            "description": "",
            "result": "",
            "sub_subjects": [
                {
                    "title": "",
                    "description": "",
                    "result": ""
                }
            ]
        },
        {
            "title": "",
            "description": "",
            "result": "",
            "sub_subjects": [
                {
                    "title": "",
                    "description": "",
                    "result": ""
                }
            ]
        }
    ]
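
For reference, here is roughly how the request is being made. This is only a minimal sketch: it assumes the OpenAI Python SDK, and the model name, token limit, and "topics.json" file are placeholders for what my system actually uses.

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # All topics go into a single request, since my system cannot split them.
    # "topics.json" is a placeholder for the template above.
    with open("topics.json") as f:
        topics = json.load(f)

    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder model name
        max_tokens=16000,     # requested output cap; actual output stays under ~3,000 tokens
        messages=[
            {
                "role": "user",
                "content": "Fill in every \"result\" field in this JSON:\n" + json.dumps(topics),
            }
        ],
    )
    print(response.choices[0].message.content)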

Hi and welcome to the Forum!

The nature and level of detail of a response, including the number of output tokens, is significantly shaped by how you phrase your prompt. There are techniques you can apply to obtain more detailed responses. In practice, though, it is difficult to get very close to the maximum number of possible output tokens.

I do not have enough information about your actual prompt, but the more specific you are about the structure of the output and the details it should include, the higher the likelihood that you will get a more detailed response.

For example, if you ask the model to write a chapter on topic A, you are likely to get a shorter response than if you also provide details on how to structure the chapter, such as by specifying that it should have X sub-sections, each comprising X detailed paragraphs.
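
As a rough illustration (the exact counts and the model name below are just placeholders, assuming the OpenAI Python SDK), a prompt that spells out the structure explicitly tends to produce noticeably more output than a bare request:

    from openai import OpenAI

    client = OpenAI()

    # Spell out the expected structure instead of just naming the topic.
    prompt = (
        "Write a chapter on topic A.\n"
        "Structure it as 6 sub-sections.\n"
        "Each sub-section must contain 4 detailed paragraphs of at least 150 words each.\n"
        "Do not summarize; write out every paragraph in full."
    )

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)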


Thank you for your reply. I will try again following your advice.


To reach the model’s output capacity, you must interleave messages from the user and from the assistant.

Input prompts can usually be large (but there is a limit!), and assistant responses can reach about 4k tokens in the standard models.
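
A minimal sketch of what that interleaving can look like, assuming the OpenAI Python SDK (the model name, the fixed number of rounds, and the "continue" wording are just example choices):

    from openai import OpenAI

    client = OpenAI()
    model = "gpt-4"  # placeholder model name

    messages = [{"role": "user", "content": "Write a very detailed report on topic A."}]

    # Each call can only return up to the per-response output limit,
    # so feed the assistant's answer back and ask it to keep going.
    full_text = ""
    for _ in range(4):
        response = client.chat.completions.create(model=model, messages=messages)
        part = response.choices[0].message.content
        full_text += part
        messages.append({"role": "assistant", "content": part})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})

    print(full_text)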

I’ve been trying to work out the “rules for controlling output length”, and while I don’t have a solution that lets you perfectly control the length of outputs, I have some insights I can share that might be helpful.

Insight 1
The first is that the models can’t count, so asking them for a response that’s 100 words long or 10,000 words long doesn’t work. You need to relate your length request to a pattern they’ve seen. For example, “all your responses should be the length of a tweet” works really well for getting short answers back; they’ve seen a lot of tweets. Similarly, “all your responses should be the length of a book” works really well for getting longer answers out of the model (around 1,500 to 2,000 tokens, I’ve found).

You might think that “all responses should be the length of a trilogy” or “all responses should be the length of a book series” would result in even longer responses, but they’re actually shorter (around 900 tokens for the same topic). Why? Who knows… My best guess is that the longest individual file the model has seen is a book, so that’s the longest sequence it can conceptualize.
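
For example, this is the kind of system message I mean; a rough sketch, with the model name and topic as placeholders:

    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            # Anchor the length to a pattern the model has seen a lot of.
            {"role": "system", "content": "All your responses should be the length of a book."},
            {"role": "user", "content": "Write about topic A."},
        ],
    )
    print(response.choices[0].message.content)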

Insight 2
You can actually get the model to blow out the context window, but it’s tied to the task you ask it to do (this is what @jr.2509 is referring to). For example, if you give the model 100k tokens’ worth of web pages with a bunch of links and you ask for all the links back, you will reliably get every link back until you run out of tokens.

Similarly, if you feed in a large web page and ask for a change to the page, you will get back all of the input tokens plus the tokens related to the changes. This can also blow out your context window.
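
A rough sketch of the link-extraction version; the input file, model name, and prompt wording are placeholders, and the pages need to fit in the model’s context window:

    from openai import OpenAI

    client = OpenAI()

    # "scraped_pages.html" is a placeholder for ~100k tokens' worth of web pages.
    with open("scraped_pages.html") as f:
        pages = f.read()

    response = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder; a large-context model is assumed
        messages=[
            {
                "role": "user",
                "content": "List every link (href) that appears in these pages:\n\n" + pages,
            }
        ],
    )
    print(response.choices[0].message.content)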