Max tokens is 4,096 for gpt-3.5-turbo — does it have to fit both the messages sent and the answer generated?

First question: Does the max tokens limit have to cover both the messages sent and the answer generated?

Second question: Does each message in the array sent to the "chat" endpoint consume a part of the tokens?

Third question: If each message in the array consumes a part of the tokens and I send an array of messages that exceeds the max tokens, what will happen?

Fourth question: How can I make the model automatically cut off the part at the beginning of the conversation that exceeds the max tokens?

  1. The model’s token limit (documented here) applies to everything, prompt and response. But the max_tokens API parameter only applies to the response.
  2. Yes.
  3. You’ll get an error back from the API saying you exceeded the token limit.
  4. By specifying max_tokens in the API call you can limit the maximum length of the response; it will abruptly cut off the response at the specified limit. See the docs and the sketch below.
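
A minimal sketch of such a request (assuming axios and a placeholder API key; the value 150 is just an illustration). The cap applies only to the generated reply, while the model’s overall context limit still has to fit prompt plus completion:

// Sketch: max_tokens caps only the completion, not the prompt.
const axios = require('axios');
const apiKey = 'your-openai-api-key';

async function askWithCap() {
    const response = await axios.post(
        'https://api.openai.com/v1/chat/completions',
        {
            model: 'gpt-3.5-turbo',
            messages: [{ role: 'user', content: 'Explain tokens in one short paragraph.' }],
            max_tokens: 150 // the reply is cut off after 150 tokens; the prompt is unaffected
        },
        { headers: { 'Authorization': `Bearer ${apiKey}` } }
    );
    console.log(response.data.choices[0].message.content);
}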

For the fourth question, I don’t want to limit the maximum length of the response and cut it off at a specified limit. Instead, I want to cut off the array of messages from its beginning: I want the model to take only the messages that fit within the max tokens and drop the rest. Is this possible? Should I do some math, estimate the expected response tokens, and calculate how many messages I can send per call to avoid getting an error back?
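
Something like this is what I mean, just as a sketch (using a very rough 4-characters-per-token estimate rather than a real tokenizer):

// Rough idea: drop the oldest messages until the estimated prompt size,
// plus some room reserved for the answer, fits under the model limit.
// The 4-characters-per-token estimate is only an approximation.
const MODEL_LIMIT = 4096;     // gpt-3.5-turbo context size
const RESPONSE_BUDGET = 1000; // tokens kept free for the answer

const estimateTokens = (text) => Math.ceil(text.length / 4);

function trimHistory(messages) {
    const trimmed = [...messages];
    let total = trimmed.reduce((sum, m) => sum + estimateTokens(m.content), 0);
    while (trimmed.length > 1 && total + RESPONSE_BUDGET > MODEL_LIMIT) {
        total -= estimateTokens(trimmed[0].content);
        trimmed.shift(); // drop the oldest message first
    }
    return trimmed;
}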

The Node.js script below saves every request and completion to a JSON file. It then sends this data as conversation history with each new prompt. The script also records the number of tokens used. However, the code to calculate how much history can be sent is still in development.

// Import required modules
const fs = require('fs');
const axios = require('axios');

// Your OpenAI API key
const apiKey = 'your-openai-api-key';

// Function to interact with OpenAI API
async function interactWithAI(userPrompt) {
    try {
        // Define the message data structure
        let messageData = { 'messages': [] };

        // If requests.json exists, read and parse the file
        if (fs.existsSync('requests.json')) {
            let raw = fs.readFileSync('requests.json');
            messageData = JSON.parse(raw);
        }

        // Format the conversation history and the new user request
        let systemMessage = "Conversation history:\n" + messageData['messages'].map(m => `${m.role} [${m.timestamp}]: ${m.content}`).join("\n");
        let userMessage = "New request: " + userPrompt;

        // Make a POST request to OpenAI's chat API
        let response = await axios({
            method: 'post',
            url: 'https://api.openai.com/v1/chat/completions',
            headers: { 'Authorization': `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
            data: { 'model': 'gpt-4', 'messages': [ { "role": "system", "content": systemMessage }, { "role": "user", "content": userMessage } ] }
        });

        // Log the AI's response
        console.log(response.data['choices'][0]['message']['content']);

        // Get the current timestamp
        let timestamp = new Date().toISOString();

        // Add the new user request and the AI's response to the message history
        messageData['messages'].push({ 
            "role": "user", 
            "content": userPrompt, 
            "timestamp": timestamp, 
            "tokens": response.data['usage']['prompt_tokens'] // Include prompt tokens
        });

        messageData['messages'].push({ 
            "role": "assistant", 
            "content": response.data['choices'][0]['message']['content'], 
            "timestamp": timestamp, 
            "tokens": response.data['usage']['completion_tokens'] // Include completion tokens
        });

        // Write the updated message history to requests.json
        fs.writeFileSync('requests.json', JSON.stringify(messageData, null, 2));

        // Return the AI's response
        return response.data['choices'][0]['message']['content'];
    } catch (e) {
        // If an error occurred, log it to the console and return an error message
        console.error('An error occurred:', e);
        return 'An error occurred while interacting with the OpenAI API. Please check the console for more details.';
    }
}
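
The part that is still in development could look roughly like this sketch, which uses the per-message "tokens" values the script already records to pick how much history fits under a budget (the 3,000-token figure is an arbitrary assumption that leaves room for the reply):

// Sketch of the in-development history selection: walk backwards from the
// newest message and keep adding messages while the recorded token counts
// stay under the budget.
function selectHistory(messages, budget = 3000) {
    const selected = [];
    let total = 0;
    for (let i = messages.length - 1; i >= 0; i--) {
        total += messages[i].tokens || 0;
        if (total > budget) break;
        selected.unshift(messages[i]); // keep chronological order
    }
    return selected;
}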


Ah, yes, this is the way to do it. Drop messages as needed to stay within token limits. Alternatively, you can make a separate request to ask GPT to summarize the history and use that summary in place of the actual messages.
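
A rough sketch of the summarization approach (assuming axios and apiKey are set up as in the script above; the summarization instruction is just an illustration):

// Sketch: compress older messages into a short summary with a separate
// request, then send that summary instead of the full history.
async function summarizeHistory(messages) {
    const transcript = messages.map(m => `${m.role}: ${m.content}`).join('\n');
    const response = await axios.post(
        'https://api.openai.com/v1/chat/completions',
        {
            model: 'gpt-3.5-turbo',
            messages: [
                { role: 'system', content: 'Summarize the following conversation in a few sentences, keeping any facts the assistant may need later.' },
                { role: 'user', content: transcript }
            ]
        },
        { headers: { 'Authorization': `Bearer ${apiKey}` } }
    );
    // The returned summary can be sent as a single system message in place of the old messages.
    return response.data.choices[0].message.content;
}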


Ok, now if I set the max_tokens parameter, will the model try to hit that max in every response it generates? Or will its responses range from one token up to the max?
I mean, how do I estimate the number of tokens the response will need?

Actually I use bubble.io, which is a no-code platform, so now I am trying to figure out how to predict the tokens needed for the response, to be able to do the math to determine how much history I can include per call.


The max_tokens parameter does not inform the AI how long its output should be; it is only a hard cap on how many tokens can be returned.

If you don’t want to limit the response output at all, want to potentially use all the available space to generate a response without a premature cutoff, and will simply manage the input size so there is enough context length remaining for that response, you can omit the optional max_tokens parameter when using chat completions. (The AI can unexpectedly use all of that space if it gets caught in a loop, repeating words.)

You can tell the model what size of text you would like, and it will be loosely taken into account: if you ask for 50 words you might get anywhere from 25 to 100, and if you ask for 100 words you might get from 50 to 200. This assumes there is a valid response of approximately that length; asking “is the red ball red?” will clearly not usually yield a long answer.
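
For example, the length hint goes in the prompt itself (the exact wording and the 50-word target are just an illustration):

// Sketch: request an approximate length in the prompt; the model follows
// it only loosely, so expect the reply to vary around the target.
const messages = [
    { role: 'system', content: 'Answer in roughly 50 words.' },
    { role: 'user', content: 'Why does the sky look blue?' }
];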