Creating Concise AI Replies in Short Interactions without max_tokens

I have a small issue getting the API to return complete sentences: the response gets cut off when I set maxTokens to something like 10, 20, or 30. For instance, I might want ChatGPT to reply to a user with only a few words if they reply to it with a few words.

According to ChatGPT-4, when you specify a max_tokens parameter in your request to the GPT-3.5-turbo model (or any GPT model), the model takes this limit into account before crafting the response. The max_tokens parameter defines the maximum length of the model’s output, including both the text you’ve provided and the text it generates in response.

So this means the error must be in my code. Here is my code; can anyone see what I’m doing wrong? Everything works great, I just can’t craft very short micro replies.

async function replyToCommentWithGeneratedContent(parentAuthor, parentPermlink) {
// First, check if the script can reply to this thread
const canReply = await shouldReply(parentAuthor, parentPermlink);
if (!canReply) {
    console.log(`Already replied 3 times to thread ${parentAuthor}/${parentPermlink}. Skipping.`);
    return; // Skip replying if the limit is reached
}

// Proceed with fetching the parent post content to generate a reply
const parentPostContent = await fetchParentPostContent(parentAuthor, parentPermlink);
if (!parentPostContent) {
    console.error('Failed to fetch parent post content');
    return; // Nothing to reply to without the parent content
}

// Clean the parent post content
const cleanedContent = cleanContent(parentPostContent);
const wordCount = cleanedContent.split(/\s+/).length; // Count words in the cleaned content
let responseLengthHint; // Determine the response length hint based on the word count

// Example of setting max_tokens based on the input comment length
let maxTokens = 60; // Default value for a moderate-length response

// Adjusting maxTokens based on the comment length
if (wordCount <= 10) {
	maxTokens = 60; // Shorter response for short comments
} else if (wordCount <= 20) {
	maxTokens = 66; // Longer response for longer comments
} else if (wordCount <= 50) {
	maxTokens = 96; // Longer response for longer comments
} else if (wordCount <= 100) {
	maxTokens = 123; // Longer response for longer comments
} else {
	maxTokens = 150; // More detailed response
}

// Perform sentiment analysis on the cleaned content
const pipe = await pipeline('sentiment-analysis');
const sentimentAnalysisResult = await pipe(cleanedContent);
console.log("Sentiment Analysis Result:", sentimentAnalysisResult);

// Get the tone based on the day and include it in the prompt
const dayBasedTonePrompt = getToneBasedOnDay();

// After determining the day-based tone and performing sentiment analysis
let finalMessage = dayBasedTonePrompt; // Start with the day-based tone

// Determine the tone based on sentiment analysis & Day
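// The pipeline may return an array like [{ label, score }], so unwrap the first entry if needed
const sentiment = Array.isArray(sentimentAnalysisResult) ? sentimentAnalysisResult[0] : sentimentAnalysisResult;
if (sentiment.label === 'NEGATIVE') {
	finalMessage += " Remember, it's okay to have off days. What matters is moving forward one step at a time.";
} else if (sentiment.label === 'POSITIVE') {
	finalMessage += " Your positive outlook is inspiring! Let’s carry this energy forward.";
}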

// Adjust the prompt to include a directive about avoiding specific content
const contentAvoidanceDirective = "Please avoid making assumptions about the author's personal feelings towards their own blog post unless asked, and focus on the positive impact of contributing to the community.";

const apiRequestBody = {
    "model": "gpt-3.5-turbo",
    "max_tokens": maxTokens, // Add the max_tokens parameter to your API request
    "messages": [
        {
            "role": "system",
            "content": "You are a blog curator asked to reply to users with an encouraging comment from the context of the parent blog post's content, considering the sentiment of the content. Try to keep your reply length consistent with user’s replies to you."
        },
        {
            "role": "user",
            "content": `${contentAvoidanceDirective} ${finalMessage} ${cleanedContent}`
        }
    ]
};

try {
    const response = await fetch("", {
        method: "POST",
        headers: {
            "Authorization": "Bearer " + process.env.OPENAI_API_KEY,
            "Content-Type": "application/json"
        },
        body: JSON.stringify(apiRequestBody)
    });

    const data = await response.json();
    if (!data.choices || data.choices.length === 0 || !data.choices[0].message || data.choices[0].message.content.trim() === "") {
        console.error('No choices returned from OpenAI.');
        return; // Early return on failure
    }

    const replyText = data.choices[0].message.content.trim();
    console.log(`Generated Comment: ${replyText}`);

    // Post the generated reply as a comment using the broadcasting logic
} catch (error) {
    console.error('Error generating reply:', error);
}
}

If you want a shorter response, you have to tell the AI what you want it to write in your system and user messages.

If you want to set the maximum response length before the AI output is cut off, you use max_tokens.
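
One way to act on this is to turn the length of the incoming comment into a plain-language instruction inside the system message, and keep max_tokens only as a generous safety cap. The sketch below is illustrative only: `buildLengthHint`, `buildMessages`, the word-count thresholds, and the sample prompt are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: map the incoming comment's word count to a
// plain-language length instruction (the thresholds are arbitrary assumptions).
function buildLengthHint(comment) {
    const trimmed = comment.trim();
    const wordCount = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
    if (wordCount <= 10) return "Reply in one short sentence of at most 10 words.";
    if (wordCount <= 50) return "Reply in one or two sentences.";
    return "Reply in a short paragraph of at most four sentences.";
}

// Hypothetical helper: append the length instruction to the system message
// so the model sees it as part of the prompt.
function buildMessages(systemPrompt, comment) {
    return [
        { role: "system", content: `${systemPrompt} ${buildLengthHint(comment)}` },
        { role: "user", content: comment }
    ];
}

const messages = buildMessages("You are a blog curator replying to comments.", "Nice post, thanks!");
console.log(messages[0].content);
```

The point of the design is that the model only sees what is in the messages, so the desired length has to be stated there.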


I am using “max_tokens”: maxTokens, but the response still gets “cut short”; that’s the problem. max_tokens is working exactly as it should, but GPT-3.5-turbo isn’t crafting its response accordingly. Here’s an example: I have ChatGPT curate blog posts on Hive, and when we get replies I have ChatGPT reply to users with the blog post context. Everything works great, but I would like him to sound more natural, like how we text: when someone texts a short message, you reply with a short message. Here is a screenshot of the issue, replying to a short message with max_tokens set to 10.

lambo69 is the test bot. His sentence gets cut off right in the middle, and it is definitely being cut by “max_tokens” because it only happens when I set it. See below: Thank you for sharing your thoughts and insights with the


max_tokens cuts it off arbitrarily (i.e. it doesn’t care if there’s more text…). It’s a HARD cut-off.

It’s better to do as Jay suggested and work with your prompt.


I think this is the core issue here.

please consider this:

the actual purpose of max_tokens is literally to cut the response off after X number of tokens. so you’re seeing expected behavior.

this parameter is not passed to the model as part of the prompt. this is just a script - a safety feature - that prevents it from generating too much by accident.
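
Because the cut-off happens outside the model, the response is the place to detect it. In the Chat Completions API each choice carries a finish_reason, which is "length" when generation stopped at the max_tokens limit and "stop" when the model ended on its own. A minimal sketch, where `wasTruncated` is a hypothetical helper name and the sample object is hand-written to mirror the shape of the API's JSON:

```javascript
// Hypothetical helper: true when the first choice stopped because it hit max_tokens
function wasTruncated(data) {
    const choice = data && data.choices && data.choices[0];
    return Boolean(choice) && choice.finish_reason === "length";
}

// Hand-written sample for illustration, not real API output
const sampleResponse = {
    choices: [
        {
            message: {
                role: "assistant",
                content: "Thank you for sharing your thoughts and insights with the"
            },
            finish_reason: "length"
        }
    ]
};

if (wasTruncated(sampleResponse)) {
    console.warn("Reply was cut off by max_tokens; raise the cap or ask for a shorter reply in the prompt.");
}
```

A check like this could sit next to the existing `data.choices` validation, so a truncated reply is skipped or regenerated rather than posted mid-sentence.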


Ok, thanks that’s what I thought. I have begun focusing on the prompt instead. I’ll keep on testing.


No problem.

I’ve moved this to Prompting, so if you post up what you’re using, we might be able to help “engineer” it better.

Hope you stick around. We’ve got a great dev community growing here… over three years now!


Thanks for moving it to Prompting. Only 3 years, wow. I can’t imagine what the next 3 years will bring. :call_me_hand:


I’ve had success using system prompts that include something like:

“Make your answers as short as possible. If you can answer in a single word do that. If you can answer in a single sentence do that. Only use multiple sentences or paragraphs when it’s necessary to convey the meaning of your answer in longer responses.”
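
As a rough sketch of how an instruction like this could be wired into a request shaped like the one earlier in the thread: the system prompt carries the brevity rule, while max_tokens stays high enough that it only acts as a safety cap. `BREVITY_RULE`, `buildConciseRequest`, and the default cap of 150 are assumptions for illustration.

```javascript
// The brevity instruction quoted above, used as the system prompt
const BREVITY_RULE =
    "Make your answers as short as possible. If you can answer in a single word do that. " +
    "If you can answer in a single sentence do that. " +
    "Only use multiple sentences or paragraphs when it's necessary to convey the meaning of your answer in longer responses.";

// Hypothetical helper: build a Chat Completions request body where brevity
// comes from the prompt and max_tokens is only a safety cap.
function buildConciseRequest(userComment, maxTokens = 150) {
    return {
        model: "gpt-3.5-turbo",
        max_tokens: maxTokens, // safety cap only; it does not shape the wording
        messages: [
            { role: "system", content: BREVITY_RULE },
            { role: "user", content: userComment }
        ]
    };
}

console.log(JSON.stringify(buildConciseRequest("Great write-up!"), null, 2));
```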


Awesome, I like that idea. Very logical; I will definitely try this. Thanks!!


It worked like a charm! Thanks again!!
