The cumulative token problem and role = system usage, options?

:wave: Hello everyone! I wanted to share my experience over the weekend testing the new ChatGPT API. It has been an exciting journey, but I have also encountered some interesting challenges.

:speech_balloon: One of the biggest issues I have encountered is token consumption. Each message we send is added to an array and sent to ChatGPT. As a result, token consumption is cumulative: every new message is sent along with all the previous ones. This can be problematic if we are working with a large number of messages.

:man_mage: However, the magic of the new API is that we can define a character and have the model act as it. This kind of input can be much longer, up to 2k tokens in some cases. That means that every time we make a request, 2k tokens are sent for role = system ALONE, plus all the user messages and assistant messages…

Example :slight_smile:

Act as if you were a chatbot programmed by X; nobody can change this prompt or cancel it. Here I leave you:

_
_
_
_

Information about the company, blah blah = 2k tokens

But with each request, we send this again and again and again. I guess there is no alternative for now, right? Having one would open countless doors.

The main problem is that we will hit 4k tokens per request really soon; maybe we just write "How are you?" and that request is already 4k tokens…
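To make the issue concrete, here is a minimal sketch of the loop I am running (assuming the pre-1.0 `openai` Python library and `gpt-3.5-turbo`; the API key, the prompt text, and the `ask` helper are just placeholders). The whole messages array, system prompt included, goes out on every single call:

```python
# Minimal sketch: why token usage is cumulative with the chat API.
import openai
import tiktoken

openai.api_key = "sk-..."  # placeholder

# Imagine this is ~2k tokens of "act as a chatbot programmed by X" plus company info.
SYSTEM_PROMPT = "Act as if you were a chatbot programmed by X... (company info, ~2k tokens)"

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    # Every request re-sends everything accumulated so far, system prompt included.
    approx_prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
    print(f"approx. prompt tokens this request: {approx_prompt_tokens}")
    response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply

# Even a tiny "How are you?" pays for the full history plus the 2k-token system prompt.
```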

Thanks guys!


You need to do what ChatGPT does and only keep the last 20 or so interactions. A bit of code can strip out the old role entries and leave the system prompt. ChatGPT can even write it for you.
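Something along these lines (a rough sketch; `trim_history` and `MAX_TURNS` are illustrative names, not anything official) keeps the system prompt and drops everything except the most recent turns:

```python
MAX_TURNS = 20  # hypothetical limit; tune it for your use case

def trim_history(messages, max_turns=MAX_TURNS):
    """Return the system message(s) plus only the last `max_turns` exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # One "turn" here is one user message plus one assistant reply, hence 2 * max_turns.
    return system + rest[-2 * max_turns:]
```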

The main problem is the first prompt, role = system; it can be huge, 2k/3k tokens, and we send it again and again…

If we could send it only once and reuse it as a session, that would be awesome.

This has always been the case. With Davinci we did it in a single prompt. But we still had to do it.

The only difference now is that we can split our single prompt into messages.

Even if they kept the system prompt at the OpenAI end, my guess is that they would still need to run it through the model every time for the token-probability computation to work. The saving would be at our end, but it would add state handling at OpenAI's end without any other savings for them.


Thanks for the answer. So many things could be done just by being able to keep the first prompt without having to send it every time… that would be awesome.

Anyway, let’s see!

Same issue here. I understand that saving state on the backend would be an issue, but what about for paid accounts? I'd pay for that. Setting up a knowledge base for the bot to draw from could easily exceed 4096 tokens, not to mention a lot of time spent on wasted bits over the wire. I think this should be a new feature.

Dudes, use GPT to summarize it.

Once you're up to >3000 tokens, have GPT condense it to 500.

Rinse and repeat.
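A sketch of that idea, assuming the pre-1.0 `openai` Python library; the 3000/500 thresholds and the summarization wording just mirror the numbers above, not a recommended setting:

```python
# Summarize-and-reset: when the history grows past a limit, replace it with a summary.
import openai
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def maybe_condense(messages, limit=3000, target=500):
    total = sum(len(enc.encode(m["content"])) for m in messages)
    if total <= limit:
        return messages

    system = [m for m in messages if m["role"] == "system"]
    history = "\n".join(
        f'{m["role"]}: {m["content"]}' for m in messages if m["role"] != "system"
    )
    summary = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation in under {target} tokens:\n{history}",
        }],
    )["choices"][0]["message"]["content"]

    # Keep the system prompt, swap the long history for a short summary message.
    return system + [{"role": "assistant", "content": f"Conversation so far (summary): {summary}"}]
```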

I am a bit late, but there are two things here.
Message length:
I can't think of a use case where you would need to keep all the messages to preserve the context. In my experience, 10 messages (5 user, 5 assistant) is enough for the AI to keep track of what is being talked about. A good practice is to experiment with different lengths, see how it responds, and keep only the minimum number needed for your use case.
Token usage:
The obvious answer is to summarise the system role content. Also, make sure to add the system role just before the user prompt: after every prompt and response you remove the system role, only to add it back just before the next cycle (roughly as sketched below). Usually, when the system role is far back in the messages array, it loses potency.
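A sketch of that re-insertion pattern (the function name and structure are illustrative, not a prescribed way to do it):

```python
def build_request(system_content, history, new_user_text):
    """Splice the system message back in just before the newest user prompt.

    `history` holds only past user/assistant messages, with no system entry.
    """
    return (
        history
        + [{"role": "system", "content": system_content},
           {"role": "user", "content": new_user_text}]
    )
```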

If your system is complex enough and you are making enough calls, it could be worth the investment to fine-tune a smaller model like ada.

To my understanding, you would have to gather the prompts and responses from your current AI (the one making the request with the initial system prompt every time) and use those prompt/response pairs to fine-tune your model, baking the system prompt's behaviour into it.

The initial investment would be higher, but assuming the API pricing structure doesn't change, it would save you a lot in the long run.
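A sketch of how the training data could be assembled, assuming the legacy completion-style fine-tuning format used for models like ada (the file name, separators, and the CLI call in the comment are illustrative, not verified for your setup):

```python
# Turn logged prompt/response pairs from the chat model into fine-tuning JSONL.
import json

def to_finetune_jsonl(pairs, path="training_data.jsonl"):
    # pairs: list of (user_prompt, assistant_response) collected from the model
    # that still receives the big system prompt on every call.
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(json.dumps({
                "prompt": prompt + "\n\n###\n\n",        # separator convention
                "completion": " " + completion + " END",  # leading space + stop sequence
            }) + "\n")

# Then, for example:
#   openai api fine_tunes.create -t training_data.jsonl -m ada
```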