[Python] Need coding help: NPC conversation history can't get past ~800 tokens

Hello, I'm fairly new to the ChatGPT API. I have been working with ChatGPT to create authentic NPC conversations in my game. The code is in Python and is supposed to remember what we talked about before.

It works… until the total token count hits ~800. Then, if I don't remove the previous messages, the AI resets fully and doesn't remember anything. That's only about 2~3 messages.

The function block looks like this. Around the last lines, if I don't set "max_tokens_limit" to 800, the conversation keeps resetting.
"gpt-3.5-turbo-16k" is supposed to have a 16,385-token limit, yet the conversation resets every 2~3 prompts, right around when it reaches ~800 tokens used.

Does anyone know what I am doing wrong??

from openai import OpenAI

client = OpenAI(api_key="sk--------------")

messages = [
    {"role": "system", "content": systemRoleMssg},
    {"role": "user", "content": userInitilizingMssg}
]

max_tokens_limit = 16385

def sendMssgToChatGPT(text_MSG):  # insert prompt here
    # Initialize messages list if it doesn't exist
    if "messages" not in globals():
        global messages
        messages = []
    
    # Append user message
    messages.append({"role": "user", "content": text_MSG})

    # Generate completion
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=messages,
        max_tokens=300
    )
    
    # Get model response
    model_response = completion.choices[0].message.content
    
    # NPC or chatGPT is saying this: 
    print(model_response)

    # Append assistant response
    messages.append({"role": "assistant", "content": model_response})

    # Calculate total tokens
    total_tokens = sum(len(message["content"].split()) if "content" in message else 0 for message in messages)

    # Remove older messages if the token limit is reached
    while total_tokens > max_tokens_limit:  # If I don't change max_tokens_limit to 800, chatGPT refreshes after hitting 800, although tokens keep increasing
        removed_message = messages.pop(1)
        total_tokens -= len(removed_message["content"].split()) if "content" in removed_message else 0 

Yes, the problem is clear 🙂

You’re counting total tokens via len, but a word can have more than one token. In general, len is just a rough, inaccurate way of counting tokens: great for quick napkin math, bad for precision.

Try this:

import tiktoken

# Encoding used by the gpt-3.5-turbo family of models
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def count_tokens(text):
    return len(encoding.encode(text))

def sendMssgToChatGPT(text_MSG):
    # Initialize messages list if it doesn't exist
    if "messages" not in globals():
        global messages
        messages = []
    
    # Append user message
    messages.append({"role": "user", "content": text_MSG})

    # Generate completion
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=messages,
        max_tokens=300
    )
    
    # Get model response
    model_response = completion.choices[0].message.content
    
    # NPC or chatGPT is saying this: 
    print(model_response)

    # Append assistant response
    messages.append({"role": "assistant", "content": model_response})

    # Calculate total tokens
    total_tokens = sum(count_tokens(message["content"]) if "content" in message else 0 for message in messages)

    # Remove older messages if the token limit is reached
    while total_tokens > max_tokens_limit:
        removed_message = messages.pop(0)
        total_tokens -= count_tokens(removed_message["content"]) if "content" in removed_message else 0

Then you may need to play around again with the max tokens to get the outputs and token consumption that you prefer. Also, I love this idea.

That seems to work. I'm not convinced the "slightly" wrong token count was the problem, but so far it is working.
Thanks a bunch!


Np, and also I should have done two things:

  1. explained further that counting via len split is indeed very inaccurate:

“Claire’s dog ran down the street towards the intersection, bit the mailman’s butt, stole his mail, and returned to Claire’s house”

len split of this = 21 words

tokenizer = roughly 30 tokens

So just this simple example shows how far off len split is compared to the tokenizer (which is not perfectly precise, but consistently very accurate). The tokenizer counts about 50% more tokens than len split in that example. So it is a bigger difference than you might realize, but you know your project better than I do. There's a quick check you can run yourself right after this list.

  2. noticed that my Claude chatbot did create a material difference I didn’t see. If the token count is exceeded, your script pops the second message from the list (index 1), while mine pops the first message (index 0).

Personally, I deal with this by removing the first PAIR of messages if the desired token count is exceeded (there's a sketch of that after the code below). But you're using smaller context windows than I do in my apps, so maybe that doesn't make sense for you. It's really something you'd have to test out, but if you say it works as is, then that's great.

You could also loop messages through a 3.5 summarizing function to "truncate" the history: start from the first user message and loop straight through the rest as needed, until you're back under the proper token window. 3.5 is fast and cheap, so it may be a good idea, so long as (1) you prime the model by stating it's a literal summary, since without that primer the model may take stylistic writing cues from the summaries, and (2) your game can handle the time delay. You could also batch any cut messages together and summarize them in one call, which would probably be more efficient (a rough sketch of that is after the code below, too).

  3. As an added third fuckup I'm just seeing: my script didn't add the system message or the first user primer userInitilizingMssg (which is probably why Claude popped index 0 instead of index 1).

So your script was actually popping the first user init message if it exists, and if it doesn't, then (I believe) it was popping the first assistant message. (FYI: in a list of ["cat", "dog", "hotdog"], index 1 is "dog", not "cat".)
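On point 1, here's the quick check I mentioned. It assumes tiktoken is installed and uses the encoding tiktoken maps to gpt-3.5-turbo; the exact token count can vary slightly between encoding versions:

import tiktoken

sentence = ("Claire's dog ran down the street towards the intersection, "
            "bit the mailman's butt, stole his mail, and returned to Claire's house")

# Encoding tiktoken associates with the gpt-3.5-turbo family
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

words = len(sentence.split())            # naive len-split count: 21
tokens = len(encoding.encode(sentence))  # real token count: noticeably higher
print(f"{words} words vs {tokens} tokens")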

Try this and see if you get any further improvement:

import tiktoken

# Encoding used by the gpt-3.5-turbo family of models
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

messages = [
    {"role": "system", "content": systemRoleMssg},
    {"role": "user", "content": userInitilizingMssg}
]

max_tokens_limit = 16385  # same limit as in your original script

def count_tokens(text):
    return len(encoding.encode(text))

def sendMssgToChatGPT(text_MSG):
    global messages
    
    # Append user message
    messages.append({"role": "user", "content": text_MSG})

    # Generate completion
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=messages,
        max_tokens=300
    )
    
    # Get model response
    model_response = completion.choices[0].message.content
    
    # NPC or chatGPT is saying this: 
    print(model_response)

    # Append assistant response
    messages.append({"role": "assistant", "content": model_response})

    # Calculate total tokens
    total_tokens = sum(count_tokens(message["content"]) if "content" in message else 0 for message in messages)

    # Remove older messages if the token limit is reached
    while total_tokens > max_tokens_limit:
        if len(messages) <= 2:
            break
        removed_message = messages.pop(2)
        total_tokens -= count_tokens(removed_message["content"]) if "content" in removed_message else 0
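For the pair-removal idea from point 2, a minimal sketch against this script's layout (system message at index 0, init user message at index 1) might look like this. To be clear, trim_oldest_pair is a name I'm making up for illustration, not something from your code:

def trim_oldest_pair(messages, total_tokens):
    # Drop the oldest user/assistant pair in one go, keeping the
    # system message and the init user message at indices 0 and 1.
    for _ in range(2):
        if len(messages) <= 2:
            break
        removed = messages.pop(2)
        total_tokens -= count_tokens(removed.get("content", ""))
    return total_tokens

You'd call it in place of the single pop inside the while loop, repeating until total_tokens is back under max_tokens_limit.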
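And for the batch-summarizing idea, a rough sketch might look like the following. Again, summarize_cut_messages and its prompt wording are my own invention here; you'd tune the model, prompt, and max_tokens to your game:

def summarize_cut_messages(cut_messages):
    # Turn the messages we're about to drop into one short literal
    # summary, so the NPC keeps the gist of older events.
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in cut_messages if "content" in m
    )
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            # Prime the model that this is a literal summary, so later
            # replies don't take stylistic cues from the summary text
            {"role": "system", "content": "Write a short, literal, factual summary of the conversation below. Do not imitate its style."},
            {"role": "user", "content": transcript},
        ],
        max_tokens=150,
    )
    return completion.choices[0].message.content

Instead of discarding the popped messages, you'd collect them in a list, summarize the batch once, and insert the summary back at index 2 as a system message.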

Wow, I didn’t realize the difference was that high. 50% is huge when I'm constantly hitting 300+ token messages.

That is actually genius. I never would have thought of that. I haven't used summarizing to "truncate" before, so I will have to check that out first. But making the AI summarize what happened, so it doesn't get overwhelmed by small details, is really smart. Although I can see the AI eventually starting to miss the point from non-stop summarizing. Will have to test and see.

Yeah, I updated my script multiple times after we talked. pop(2) is what I am using too; pop(0) and pop(1) actually remove important information, not the chat history I want to trim. I didn't know how pop() worked that well before, but now I do. I also keep a "history text file" so NPCs remember events between saves, plus a button to remove and reroll the last prompt/response.

Can’t wait to see what else I can do with the ChatGPT API; it is very cool. I'm messing around with image generation and text to speech at the moment. I follow the official documentation as a guide and ask ChatGPT itself how to code these. And now, here on the forum, I am starting to ask and read.

Thanks again
