You CAN specify the length of the response

I wanted to post a quick tip

I can see many USE CASE’s where you want to limit the response you get from Chat GPT to a certain length and make the response meaningful - ie Write a script for a 60 second commercial

In short - I want to limit the length of a response - here are my findings

Here is a bad way to do it
If you just specify the max_tokens in the API call and try to limit to a certain number of tokens, you may get truncated summaries since the response will just cut off the response at the max-tokens

So here is the way to do it properly…

Put the following in your prompt - here is an example

“Create a very short summary that uses 30 completion_tokens or less…”

You can put whatever number of completion_tokens desired in the prompt within a reasonable amount - don’t put in 1 completion_token…

You will get a meaningful and very short summary - and save a few pennies

Use this approach and try different numbers of completion_tokens and “tune” for what works best in terms of concise summary and meaningful summary

Respond if this does not work for you or if this is helpful and does work


“limit your response to 300 words” or whatever word count you want also works.

The AI model uses Tokens, not Words, I had good results specifying completion_tokens

Not disagreeing with trying X Words in the prompt but I suspect Tokens would be a better approach

Interested in others experience with Words versus Tokens…

Thats pretty wild if the LLM itself can count tokens.

It works ok. I have been getting it to summarize text documents into chunks with a specific token count i have noticed if the response is short it will repeat itself to reach the token count. # Define a function to read the text document.

def read_text_document(file_path):
with open(file_path, ‘r’) as f:
chunks =“\n===”)
text_document = " ".join(chunks)
return text_document

def write_to_text_document(file_path, data, max_tokens_per_chunk):
with open(file_path, ‘a’) as f:
token_count = len(data.split())
if token_count > max_tokens_per_chunk:
chunks = [data[i:i+max_tokens_per_chunk] for i in range(0, len(data), max_tokens_per_chunk)]
for chunk in chunks:
f.write(chunk + “\n===\n”)
f.write(data + “\n===\n”)

It looks CharGPT CAN count Tokens based upon the completion_token counts I am seeing in the responses when I use this approach with the API

As an update - I have noticed a SLIGHT amount of truncation so a bit of “tuning” may be required by playing around with the number of completion_tokens or playing with the response you get and cleaning it up

This approach does seem to work a lot better than just specifying max_tokens in the call to the API

That said - interested to see what others are seeing…

I had my first session with ChatGPT today. As a developer watching the global buzz I wanted to see how capable it is in generating code, specifically JavaScript, ReactJS, React Hook Forms. I approached it slowly in a single session, attempting to collaboratively build a small project. I progressed through convert to TypeScript, add validation with Zod, add MUI and revise all UI components, and then : refactor components out to separate files.

It did amazingly well through al of this, with validation details for specific fields, etc. It started to falter when a single file got larger. I tried to refactor out functionality with this query: “In all files, refactor out all interfaces, types, and schema declarations. Put them all into a single file named Types.ts. Only display Types.ts when done.” The goal here was to reduce the size of the main form component file. It did this but then started to get confused with multiple files.

Here’s to the point of this thread: It seems to regenerate the entire project on each query, and as the project gets more complex, yes, it seems to go through more tokens, and it simply doesn’t complete the display of some files. When asked to only render the last file that it failed to display, it did so with tender apology, but it regenerated the code which was now inconsistent with other files. Then it swapped out the Zod library for Yup. This eventually resulted in this request and response:

Please regenerate this project with the current specs. Refactor code into files that have no more than 80 lines. Return one file at a time followed by a 5 second pause between each one.

! Something went wrong. If this issue persists please contact us through our help center at

From what I’m reading here it seems like the issue is with tokens, and I have no problem buying some if we know that the experience described here is the result of token limits and not capability of the platform.

I know this is all new, free, with known limitations, and I intentionally asked for a rigorous test without going so far as to expect a full project for replacement of human talent. I just want to know what the limits are so that I/we can make better choices about what we ask within specific budgets and within reasonable expectations.


This simply isn’t working for me. In fact, nothing I’ve tried to limit the response has worked for me!

My app is a Q&A bot and after many failed attempts to use fine-tuning, I’ve switched to embeddings to “teach” my bot. I based my method on the one used in this article: Question answering using embeddings-based search | OpenAI Cookbook which means that once I’ve found the data that most closely matches the question, I include that data in the prompt. So my prompt is fairly heavy!

But no matter what I try, I either get a very long completion or one that chops off the middle of a sentence. Can anyone shine any light on what I’m doing wrong please?

TextContent refers to the text between |||contextStart||| and |||contextEnd|||.

Please answer the questions from Bruce using the information in TextContent. Create a response that uses 30 completion_tokens or less. If the answer is not contained within TextContent, apologise to Bruce that you do not know the answer to that question.

The 2020 Summer Olympics[3], officially the Games of the XXXII Olympiad[4] and also known as Tokyo 2020[5], was an international multi-sport event held from 23 July to 8 August 2021 in Tokyo, Japan, with some preliminary events that began on 21 July 2021. Tokyo was selected as the host city during the 125th IOC Session in Buenos Aires, Argentina, on 7 September 2013.[6]

Originally scheduled to take place from 24 July to 9 August 2020, the event was postponed to 2021 on 24 March 2020 due to the global COVID-19 pandemic, the first such instance in the history of the Olympic Games (previous games had been cancelled but not rescheduled).[7] However, the event retained the Tokyo 2020 branding for marketing purposes.[8] It was largely held behind closed doors with no public spectators permitted due to the declaration of a state of emergency in the Greater Tokyo Area in response to the pandemic, the first and only Olympic Games to be held without official spectators.[c] The Games were the most expensive ever, with total spending of over $20 billion.[10]

The Games were the fourth Olympic Games to be held in Japan, following the 1964 Summer Olympics (Tokyo), 1972 Winter Olympics (Sapporo), and 1998 Winter Olympics (Nagano). Tokyo became the first city in Asia to hold the Summer Olympic Games twice.[d] The 2020 Games were the second of three consecutive Olympics to be held in East Asia, following the 2018 Winter Olympics in Pyeongchang, South Korea and preceding the 2022 Winter Olympics in Beijing, China. Due to the one-year postponement, Tokyo 2020 was the first and only Olympic Games to have been held in an odd-numbered year[12] and the first Summer Olympics since 1900 to be held in a non-leap year.

Your name is Penny and you are a sports assistant. You are helpful, creative, clever, and very friendly.

Bruce: Hello, who are you?

Penny: I am Penny, your assistant.

Bruce: Hi Penny, my name is Bruce.

Penny: It’s great to meet you, Bruce, how can I help you today?

Bruce: describe the 2020 olympics

Thanks so much

If you want a short response, try to add this to your system prompt:

Your responses should be a sentence or two, unless the user’s request requires reasoning or long-form outputs.

1 Like

I think this is a YMMV case. For example, I cannot get this prompt to reply using about 1000 characters:

Give me the definition of the word or phrase below.

Word or phrase: """

Write the response in a Funny style for a Professional.

No matter what additional instruction I append to this prompt, the response is still over 1000 characters long. If you succeed, please share what prompt you used.

The AI cannot count or see “characters”. We can do words better though.

Here this crappy AI decided to put it’s “interlocutor” watermarking in (which goes along with “ah!”) as a sign that its answer that is “funny” is a roleplay.

sys = [{“role”: “system”, “content”: “You are ChatMagic, an AI assistant.”}]
user = [{“role”: “user”, “content”: “”"Give me the definition of the word or phrase below.

Word or phrase: “distribution”

Write the response in a funny style for a professional. No output other than the definition text. Target response length 200 words.“”"}]

response, with my automated word counting and streaming “chunks” being close to tokens:

Distribution, my dear interlocutor, is the art of spreading things around like a squirrel hoarding acorns. It is the process of taking a product, service, or even information, and ensuring it reaches its intended recipients in a timely and efficient manner. Picture yourself as a modern-day Santa Claus, but instead of delivering presents down chimneys, you’re delivering goods to eager customers.

In this whimsical dance of distribution, various channels come into play. From the traditional brick-and-mortar stores to the enchanting realm of e-commerce, the goal remains the same: to get the goods from point A to point B, and sometimes even to point C, D, and beyond! It involves logistics, transportation, and a sprinkle of magic to ensure that the right product finds its way into the hands of the right person.

Think of distribution as the conductor of a symphony, orchestrating a harmonious flow of goods through warehouses, trucks, planes, and even drones. It’s a delicate balance of supply and demand, where inventory is managed with the precision of a tightrope walker, ensuring that shelves are stocked and customers are satisfied.

So, my friend, embrace the whimsy of distribution, for it is the backbone of commerce, the invisible force that brings joy to consumers and keeps the wheels of the economy turning.
Done. 213 words/268 chunks in 7.3 seconds.

So my estimation got 1333 characters. Let’s redo and ask for 150 words instead.

Distribution, the art of dispersing goods or services to various destinations, is like a whimsical dance of logistics. Picture a troupe of acrobatic packages, gracefully leaping from warehouses to trucks, and pirouetting their way into the hands of eager customers. This enchanting process involves coordinating the movement, storage, and delivery of products, ensuring they reach their intended recipients with impeccable timing. It’s a symphony of efficiency, where supply chains harmonize with demand, and inventory management becomes a finely tuned melody. From the bustling hub of distribution centers to the rhythmic hum of delivery vehicles, this intricate ballet ensures that goods are available where and when they are needed. So, next time you receive a package at your doorstep, take a moment to appreciate the enchanting choreography behind the scenes, as distribution weaves its magic to bring joy and convenience to our lives.
Done. 142 words/178 chunks in 6.3 seconds.

Counting? 941 characters.

So you can probably live with a 7:1 word specification to character output guess in English, which will stay on task for a few hundred words anyway.

1 Like

Thanks for the revised prompt. I learned two things:

  1. Better to use “words” than “characters”
  2. Better to overshoot, so if you want to reduce the length of a response by 50%, count the total number of words and ask it to rewrite the response using at least 50% fewer words.
1 Like