Is there a way to set a character limit (not a token limit) to OpenAI's Responses API?

I know about the max_output_tokens field, but that’s for tokens. I want to set a character limit. How do I go about doing that?

Short answer: you can’t.

You can, however, ask for it in your prompt. Note, though, that if your goal is to save tokens, the reasoning required to honor the constraint might undermine the attempt, spending extra tokens to deduce an answer that fits the request. Something like:
answer with a text output containing at most 100 characters.
Also, a non-reasoning model might completely ignore your request.
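
A minimal sketch of that prompt-level request (the model name and wording are just placeholders):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1-mini",
    instructions="Answer with a text output containing at most 100 characters.",
    input="a random fruit name",
)
# Verify the length yourself; the model may still overshoot.
print(response.output_text, len(response.output_text))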

As an alternative, you can use a structured output with a clear definition of constraints:

from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class JsonResponse(BaseModel):
    # length constraints expressed in characters
    response: str = Field(min_length=10, max_length=19)

response = client.responses.parse(
    model="gpt-4.1-mini",
    input="a random fruit name",
    temperature=1.5,
    text_format=JsonResponse,
)
print(response.usage)
print(response.output_parsed.response)

The closest you will get to some control, a “limit” via API parameters:

Use reasoning: {"effort": "minimal"}.

Add more developer prompting to discourage gpt-5’s internal reasoning methods.

Allow an unseen 10 tokens or so for internal sending to channels.

Then the max_output_tokens you specify will be closer to the cutoff point you want, in tokens, and won’t terminate the output while the model is still generating pre-reasoning before you ever receive anything.
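
Putting those together, a rough sketch (the model name and the 10-token allowance are assumptions, not exact figures):

from openai import OpenAI

client = OpenAI()

target_tokens = 1024   # the cutoff point you actually want
overhead = 10          # rough allowance for unseen channel framing tokens

response = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    instructions="Answer directly. Do not deliberate or plan before answering.",
    input="a random fruit name",
    max_output_tokens=target_tokens + overhead,
)
print(response.output_text)
print(response.usage)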

If you absolutely want to truncate the output by characters, you can do that on what you receive, and close a streaming connection so you don’t pay for much more than that.
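
A rough sketch of that client-side truncation, assuming the response.output_text.delta streaming event and the close() method from the current Python SDK (worth checking against your SDK version):

from openai import OpenAI

client = OpenAI()
MAX_CHARS = 4096  # whatever character ceiling you need

pieces = []
count = 0

stream = client.responses.create(
    model="gpt-4.1-mini",
    input="Write a short note about citrus fruit.",
    stream=True,
)

for event in stream:
    if event.type == "response.output_text.delta":
        pieces.append(event.delta)
        count += len(event.delta)
        if count >= MAX_CHARS:
            stream.close()  # drop the connection so you don't keep paying for more generation
            break

text = "".join(pieces)[:MAX_CHARS]
print(text)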


Otherwise, you are left talking to the AI model and relying on its poor planning when it turns your requested character count into a response length.


My goal isn’t to save tokens; rather, I’m using the WhatsApp messaging API, which has a hard limit of 4096 characters.

I’ve heard the idea that you can set an approximate limit by setting the token limit to 4096 / 4 = 1024, since a token is roughly equal to four characters, but I’m honestly not sure how well that works. Any insight on whether such a fix would be appropriate?

For reasoning models like gpt-5, the max_output_tokens parameter doesn’t map cleanly to visible text, because reasoning tokens are counted as part of the output tokens. Also, the math is not as simple as a fixed characters-per-token ratio.
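
You can see that in the usage object; a quick sketch (the usage field names are as I recall them from the Responses API and are worth double-checking):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="a random fruit name",
    max_output_tokens=100,
)
# Reasoning tokens are billed and counted inside the same output budget.
print(response.usage.output_tokens)
print(response.usage.output_tokens_details.reasoning_tokens)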

But since it can reason, you might set a developer-role message that states you want a constrained output, like:
Your final answer must not exceed 50 characters.
(this one won’t work well on non-reasoning models, though)
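
A minimal sketch of that developer message (the model name and wording are just assumptions):

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input=[
        {"role": "developer", "content": "Your final answer must not exceed 50 characters."},
        {"role": "user", "content": "a random fruit name"},
    ],
)
print(response.output_text, len(response.output_text))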

Also, there is the structured output I already mentioned.

Have you tried these?


I’m using gpt-4.1, mostly because I’ve heard that gpt-5 was disappointing. I take it that, since 4.1 is not a reasoning model like the o-series models, it won’t be able to properly enforce that limit?

I’m not sure I understand your structured output approach. Can I be sure that it won’t truncate the output at an awkward point? I don’t want to put a hard limit in such a way that it doesn’t complete its output.

Well, the model will try its best to fit the constraints defined.

Or, you can dynamically adjust max_output_tokens and retry whenever you exceed the desired character count, but you will end up with either outputs that are too short or too many retries, as sketched below.
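
For completeness, a rough sketch of such a retry loop (the shrink heuristic, retry count, and model name are just illustrative assumptions):

from openai import OpenAI

client = OpenAI()
MAX_CHARS = 4096

def generate_within_limit(prompt: str, max_tokens: int = 1024, retries: int = 3) -> str:
    text = ""
    for _ in range(retries):
        response = client.responses.create(
            model="gpt-4.1-mini",
            input=prompt,
            max_output_tokens=max_tokens,
        )
        text = response.output_text
        if len(text) <= MAX_CHARS:
            return text
        # Too long: shrink the token budget roughly in proportion and try again.
        max_tokens = max(64, int(max_tokens * MAX_CHARS / len(text)))
    return text[:MAX_CHARS]  # last resort: hard truncation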

Anyway, it is a matter of trial and experimenting. Let me know if you have any problems running the example I showed.

The AI doesn’t receive your max_output_tokens budget as something it can use to alter the kind of response it produces. It only receives your language.

Additionally, because it is post-trained on a particular style of responses and generates token by token, the AI cannot plan out how to arrive at a particular length, pacing or limiting its output as it goes. It must try, and then be corrected, because it cannot even count by words or characters due to tokenization.

For a maximum of 4k characters, only longer productions such as articles or other artifacts are going to hit that on GPT-4.1. It doesn’t like to output much more than 1500-1800 tokens, especially if what it is writing is supposed to be placed within JSON because it is sending structured output to an API.


You can “take over” the chat and apply iterative correction if the output is not useful. To demonstrate, I tell the AI it is communicating over WhatsApp, but don’t give it any initial guidance about length rules.

[Screenshots: Try 1 and Try 2 hit the token cutoff; I correct the AI with an “automatic” message; Try 3 comes in under the “maximum characters” with 250 characters delivered; after that, the instructed AI keeps the learned length.]

If your limit were not as high as 1000 tokens, then beyond simply “keep your answers shorter than normal, under 500 words. No yapping.” you could also put a few examples of response length in a system message for the AI to understand.
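
A rough sketch of that kind of system message with length examples (all of the wording here is an illustrative assumption):

from openai import OpenAI

client = OpenAI()

instructions = (
    "You reply over WhatsApp. Keep your answers shorter than normal, under 500 words.\n"
    "Examples of acceptable reply lengths:\n"
    "Q: What's the weather like?\nA: Sunny, around 24 C, with a light breeze this afternoon.\n"
    "Q: Can we move the meeting?\nA: Sure, does 3 pm tomorrow work for you?\n"
)

response = client.responses.create(
    model="gpt-4.1-mini",
    instructions=instructions,
    input="Tell me about your return policy.",
)
print(response.output_text)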