How to reduce token usage? It's wasteful to resend the prompt every time!

I’m using the transit API to process text, but every time a new text comes in to be processed, the prompt is sent to the API again, which makes the input token count large. Is there any way to reduce this unnecessary token consumption while still using the latest model? Can anyone help me with this? Thanks!

OpenAI has hinted that they’ll offer “memory” of sorts eventually, but for now, you need to send the context every time.

Have you tried simplifying your prompt and / or using a smaller model?

    {"role": "system",
    "content": "You are a language model trained by an official phone reviewer. \
    Please write a review about the iPhone from a neutral point of view. Your review should include the following aspects, but not limited to: \
    1. A brief introduction to the overall appearance and feel of the phone, including size, weight, and material texture. \
    2. Evaluation of the screen's display effect, such as brightness, color accuracy, and resolution. \
    3. Evaluation of the phone's performance, including processing speed, gaming experience, and multitasking capabilities. \
    4. Evaluation of the phone's camera system, including photo quality, video quality, and special features (e.g., night mode, 5x zoom). \
    5. Evaluation of the phone's battery life, including charging speed, power consumption under heavy usage, and average battery life. \
    6. Evaluation of the phone's additional features, such as operating system, user interface, and specific features related to the phone's operating system."},
    {"role": "user",
    "content": "The content is as follows: {text}"}

A prompt this size isn’t very big. You could over-engineer a solution, but I don’t think it would be worth it. Plus, LLM pricing is racing towards zero, so better and cheaper models will most likely be the trend. We’re also seeing larger context windows, which will probably change how we solve problems in the future.

Another option could be fine-tuning, but I don’t know whether that is cost-effective.

Won’t that be a feature for ChatGPT only? Also, unless they want to fine-tune models for every model, the “memorized” information will still have to go through the model, so I would be surprised if we were not charged for that.

Prices in general are coming down, though, so we might get lucky. Who knows, though. :wink:

Hahahahaha, I know what you mean. But I have tens of thousands of comments that need to be analyzed. Each comment is very short, maybe a third of the length of the prompt, so most of the tokens I pay for are the prompt itself. That feels wasteful.
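
To put a rough number on that, here is a minimal sketch of the arithmetic. The token counts are illustrative placeholders, not measurements, and `prompt_share` is a hypothetical helper. With a comment a third the size of the prompt, the prompt accounts for 75% of the input tokens per request; sending several comments with one copy of the prompt lowers that share.

```python
def prompt_share(prompt_tokens: int, comment_tokens: int, batch_size: int = 1) -> float:
    """Fraction of input tokens spent on the repeated prompt.

    batch_size is the number of comments sent with one copy of the prompt.
    """
    total = prompt_tokens + batch_size * comment_tokens
    return prompt_tokens / total


# Hypothetical sizes: a 300-token prompt and 100-token comments (1/3 ratio).
for n in (1, 5, 10):
    print(f"{n} comment(s) per request: prompt share = {prompt_share(300, 100, n):.3f}")
```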

Just a thought: what about categorizing the comments first, then running only the ones that make sense through the more expensive process? Essentially, apply the 80/20 rule.
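
One way this could look is a cheap local pre-filter that runs before any API call. This is only a sketch: the keyword list and the `split_comments` helper are assumptions for illustration, and a real filter would need terms (and a language) that match the actual comments.

```python
import re

# Hypothetical aspect keywords; replace with terms that matter for your analysis.
ASPECT_KEYWORDS = ("camera", "battery", "screen", "price", "appearance")
_PATTERN = re.compile("|".join(map(re.escape, ASPECT_KEYWORDS)), re.IGNORECASE)


def split_comments(comments):
    """Return (worth_sending, skipped) using a cheap keyword check."""
    worth_sending, skipped = [], []
    for comment in comments:
        if _PATTERN.search(comment):
            worth_sending.append(comment)
        else:
            skipped.append(comment)
    return worth_sending, skipped
```

Only the `worth_sending` list would then go through the model; the skipped comments can be ignored or reviewed separately.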

If context is not very important, you can truncate your messages by setting a maximum number of tokens or limiting how many previous messages are passed. You could also write a custom function to manage your context tokens and get more control over them.
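
A custom function along those lines might look like the sketch below. It assumes messages are dicts with `role` and `content` keys, and it uses a crude "about 4 characters per token" estimate in place of a real tokenizer, so the budget is approximate. Both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens):
    """Keep system messages plus the most recent other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept = []
    for message in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break  # stop at the first message that does not fit
        kept.append(message)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

Dropping whole messages from the oldest end keeps the remaining conversation coherent, at the cost of losing early context.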

To make this easier to follow, here is a translation of what is being sent to the model in the first image:

{"role": "system", "content": "You are an assistant capable of performing aspect-based sentiment analysis:
You will analyze reviews about iPhone and identify the aspects mentioned in each review, as well as the sentiment polarity for each aspect. The specific requirements are as follows:
First, you will perform sentiment polarity judgment, which can be positive, neutral, or negative. Return the polarity directly;
Second, these reviews often mention multiple aspects at the same time, such as customer experience, brand perception, product quality, and even factors unrelated to the phone. For each aspect, you will judge the sentiment polarity in order, for example: camera (positive), price (negative), appearance (neutral), etc., and return the results in sequence.
Finally, assume you will analyze the following text segment:
"Are you satisfied with the appearance of the phone, the sense of pride brought by the brand's reputation, and the large screen size for phone games (e.g., 5.7-inch screen)". You will return the results as follows:
[appearance (positive), brand reputation (positive), screen size (neutral)] Now start analyzing the following text segment and return the results in the above format:"},
{"role": "user", "content": "The review is as follows: {text}"}

I can reduce the instructions by:

  1. writing in English, where fewer tokens are consumed.
  2. making the instructions straightforward and unmistakable.
  3. using GPT-4o if you must have GPT-4, because of its more efficient token encoder.

The system text:

You perform automated product review sentiment analysis provided as user, scoring from [ positive, negative, neutral] in such categories as [“customer experience”, “brand perception”, “product quality”, …] where more categories can be added dynamically if they are brought up in the review. The output will summarize the review’s sentiment by keyword. Response will be in Chinese.

// response format example:

It is possible to send multiple reviews in one user input, such as “provide an independent analysis for EACH numbered review”, and then put each in a container such as triple-quotes or brackets, but the quality will decline the more you ask of the AI at once.
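
As a rough sketch of that batching idea, the helpers below number each review, wrap it in triple quotes, and split a long list into small batches. The function names and the instruction wording are assumptions for illustration, and the batch size is something to tune against output quality.

```python
def build_batched_user_message(reviews):
    """Number each review and wrap it in triple quotes inside one user message."""
    parts = ["Provide an independent analysis for EACH numbered review."]
    for i, review in enumerate(reviews, start=1):
        parts.append(f'{i}. """{review}"""')
    return "\n".join(parts)


def chunk(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Example: two requests instead of three, each carrying one copy of the system prompt.
reviews = ["Great camera", "Too expensive", "Battery drains quickly"]
for batch in chunk(reviews, 2):
    print(build_batched_user_message(batch))
    print("---")
```

Keeping batches small is a way to trade some of the token savings for more reliable per-review output.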