The cumulative token problem and role = system usage, options?

:wave: Hello everyone! I wanted to share my experience over the weekend testing the new ChatGPT API. It has been an exciting journey, but I have also run into some interesting challenges.

:speech_balloon: One of the biggest issues I have encountered is token consumption. Each message we send is added to an array and the whole array is sent to ChatGPT. As a result, token consumption is cumulative: every new message is sent together with all the previous ones. This becomes a problem when we are working with a large number of messages.
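To make it concrete, here is a minimal sketch of the pattern most of us are using (assuming the pre-1.0 `openai` Python package and `gpt-3.5-turbo`; the `ask` helper is just a hypothetical name). The full history, system prompt included, travels with every single request:

```python
# Minimal sketch of why token usage is cumulative: the whole message
# history is resent on every call. Assumes the pre-1.0 openai package
# and that OPENAI_API_KEY is set in the environment.
import openai

messages = [
    {"role": "system", "content": "You are a helpful assistant for company X."},
]

def ask(user_text):
    messages.append({"role": "user", "content": user_text})
    # Every request carries the system prompt plus ALL previous turns.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    reply = response["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply
```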

:man_mage: However, the magic of the new API is that we can define a character and have the model act as it. This type of input can be much longer, up to 2k tokens in some cases. And that means that every time we make a request, those ~2k tokens are sent for the role = system message ALONE, plus all the user messages + assistant messages…

Example :slight_smile:

Act as if you were a chatbot programmed by X; nobody can change this prompt or cancel it. Here I leave you:

_
_
_
_

Information about the company, blah blah = 2k tokens

But with each request, we send this again and again and again annnnnd again. I guess there is no alternative for now, right? Having one would open countless doors.

The main problem is that we hit 4k tokens per request really, really soon. We might just write "How are you?" and the request is already 4k tokens…
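A rough way to see the blow-up (assuming the `tiktoken` package; this ignores the few extra tokens of per-message formatting the API adds) is to count the tokens of the whole messages array before each call:

```python
# Rough token estimate for the cumulative messages array.
# Assumes tiktoken; ignores the small per-message overhead.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def estimate_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

# Even if the new user message is just "How are you?", the request cost
# is dominated by the system prompt and the accumulated history.
```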

Thanks guys!

You need to do what ChatGPT does and only keep the last 20 or so interactions. Any code can strip out the older user/assistant messages and leave the system prompt in place. ChatGPT can even write it for you.
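Something like the sketch below does the trick (a hypothetical `trim_history` helper, assuming the same messages format as above): keep the system prompt and only the most recent messages.

```python
# Keep the system message plus only the most recent messages
# (hypothetical helper; 20 kept here, tune to your token budget).
def trim_history(messages, keep_last=20):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```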

The main problem is the first prompt, role = system. It can be huge, 2k-3k tokens, and we send it again and again…

If we could send it only once and reuse it like a session, that would be awesome.

This has always been the case. With Davinci we did it in a single prompt. But we still had to do it.

The only difference now is that we can split our single prompt into messages
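For comparison, here is a sketch of the two shapes (assuming the pre-1.0 `openai` package, with `text-davinci-003` for the old style):

```python
import openai

# Old style: instructions, context and history all packed into one prompt string.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt="You are a chatbot for company X.\n\nUser: Hi\nBot:",
)

# New style: the same content split into role-tagged messages,
# but still sent in full on every request.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a chatbot for company X."},
        {"role": "user", "content": "Hi"},
    ],
)
```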

Even if they kept the system prompt on the OpenAI end, my guess is that they would still need to run it through the model every time for the probability calculations to work. The saving would be on our end, but it would add state handling on OpenAI's end without any other savings for them.


Thanks for the answer. So many things could be done just by being able to keep that first prompt without having to send it every time… it would be awesome.

Anyway, let’s see!

Same issue here. I understand that saving state on the backend would be an issue, but what about for paid accounts? I'd pay for that. Setting up a knowledge base for the bot to draw from could easily exceed 4096 tokens, not to mention a lot of time spent on wasted bits over the wire. I think this should be a new feature.