How to reduce tokens? It's wasteful to re-send the prompt every time!

I’m using a relay (transit) API to process text, but it seems that every time a new piece of text comes in, the full prompt has to be re-sent to the API along with it, which makes the input token count large. Is there any way to reduce this unnecessary token consumption while still using the latest models? Can anyone help me with this? Thanks!

OpenAI has hinted that they’ll offer “memory” of sorts eventually, but for now, you need to send the context every time.

Have you tried simplifying your prompt and/or using a smaller model?

[
  {
    "role": "system",
    "content": "You are a language model trained by an official phone reviewer. Please write a review about the iPhone from a neutral point of view. Your review should include, but is not limited to, the following aspects:\n1. A brief introduction to the overall appearance and feel of the phone, including size, weight, and material texture.\n2. An evaluation of the screen's display quality, such as brightness, color accuracy, and resolution.\n3. An evaluation of the phone's performance, including processing speed, gaming experience, and multitasking capability.\n4. An evaluation of the phone's camera system, including photo quality, video quality, and special features (e.g., night mode, 5x zoom).\n5. An evaluation of the phone's battery life, including charging speed, power consumption under heavy usage, and average battery life.\n6. An evaluation of the phone's additional features, such as the operating system, user interface, and OS-specific features."
  },
  {
    "role": "user",
    "content": "The content is as follows: {text}"
  }
]
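
For reference, this is roughly how each call goes out; a minimal sketch using the openai Python SDK (the model name is just a placeholder):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are a language model trained by an official phone reviewer. ..."  # the full instructions above

def write_review(text: str) -> str:
    # Note how the system prompt is re-sent with every single request;
    # that is exactly where the repeated input tokens come from.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"The content is as follows: {text}"},
        ],
    )
    return response.choices[0].message.content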

A prompt this size isn’t very big. You could over-engineer a solution, but it wouldn’t make sense. Plus, LLM pricing is racing toward zero; better and cheaper are most likely the future trend. We’re also seeing larger context windows, which will probably change how we solve these problems in the future.


Another option could be fine-tuning, but I don’t know whether that would be cost-effective.
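
If it did pay off, the idea would be to bake the long instructions into the model itself, so each request only needs the short review. A sketch of one training example in OpenAI's chat fine-tuning JSONL format, written from Python (the example contents are made up):

import json

# With the instructions learned during fine-tuning, the per-request
# system prompt can shrink to almost nothing.
example = {
    "messages": [
        {"role": "system", "content": "Aspect-based sentiment analysis."},
        {"role": "user", "content": "The review is as follows: the camera is great but the battery drains fast"},
        {"role": "assistant", "content": "camera (positive), battery life (negative)"},
    ]
}

with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")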

Won’t that be a feature for ChatGPT only? Also, unless they fine-tune a model for every user, the “memorized” information will still have to go through the model as input tokens, so I would be surprised if we weren’t charged for it.

Prices in general are coming down, though, so we might get lucky. Who knows. :wink:


Hahahahaha, I know what you mean. But I have tens of thousands of comments that need to be analyzed, and each comment is very short, maybe a third of the length of the prompt itself. That overhead feels wasteful.

Just a thought: what about categorizing the comments first, then running only the ones that make sense through the more expensive process (see the sketch below)? Essentially, apply the 80/20 rule.
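
A rough sketch of that two-stage pipeline, assuming the openai Python SDK (the model names and the filter prompt are hypothetical, and run_full_analysis stands in for whatever expensive per-comment call you already use):

from openai import OpenAI

client = OpenAI()

def is_worth_analyzing(comment: str) -> bool:
    # Cheap first pass: a one-token yes/no check with a small model.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer Y if this phone comment expresses an opinion worth analyzing, otherwise N."},
            {"role": "user", "content": comment},
        ],
        max_tokens=1,
    )
    return response.choices[0].message.content.strip().upper().startswith("Y")

def analyze_all(comments: list[str], run_full_analysis) -> list[str]:
    # The expensive second pass runs only on comments that pass the filter.
    return [run_full_analysis(c) for c in comments if is_worth_analyzing(c)]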


If the context is not very important, you can truncate your messages by setting a max token limit or capping how many previous messages are passed along. You could also write a custom function to manage your context window and get more control over the tokens it consumes:

https://platform.openai.com/docs/assistants/how-it-works/context-window-management
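
For example, a minimal sketch of such a custom function, using tiktoken to count (the 3000-token budget is arbitrary, and the count ignores the few tokens of per-message overhead):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoder used by gpt-4 / gpt-3.5-turbo

def trim_history(messages: list[dict], budget: int = 3000) -> list[dict]:
    # Always keep the system message, then keep the most recent
    # messages that still fit inside the token budget.
    system, rest = messages[0], messages[1:]
    kept = []
    used = len(enc.encode(system["content"]))
    for msg in reversed(rest):  # walk from newest to oldest
        used += len(enc.encode(msg["content"]))
        if used > budget:
            break
        kept.append(msg)
    return [system] + list(reversed(kept))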


To facilitate understanding, here is a translation of what is being sent to the model in the first image:

[
  {
    "role": "system",
    "content": "You are an assistant capable of performing aspect-based sentiment analysis. You will analyze reviews about the iPhone and identify the aspects mentioned in each review, as well as the sentiment polarity for each aspect. The specific requirements are as follows:\nFirst, perform the sentiment polarity judgment; the polarity can be positive, neutral, or negative. Return the polarity directly.\nSecond, these reviews often mention multiple aspects at the same time, such as customer experience, brand perception, product quality, and even factors unrelated to the phone. Judge the sentiment polarity for each aspect in order, for example: camera (positive), price (negative), appearance (neutral), and return the results in sequence.\nFinally, suppose you are analyzing the following text segment: \"Are you satisfied with the appearance of the phone, the sense of pride brought by the brand's reputation, and the large screen size for phone games (e.g., 5.7-inch screen)?\" You would return: [appearance (positive), brand reputation (positive), screen size (neutral)]. Now start analyzing the following text segment and return the results in the above format:"
  },
  {
    "role": "user",
    "content": "The review is as follows: {text}"
  }
]

I can reduce the instructions by:

  1. writing in English, which consumes fewer tokens;
  2. making the instructions straightforward and unambiguous;
  3. using GPT-4o if you must have GPT-4, because of its more efficient token encoder (see the sketch below).
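
You can check points 1 and 3 yourself with tiktoken; a quick sketch (the sample strings are made up, and exact counts will vary):

import tiktoken

cl100k = tiktoken.get_encoding("cl100k_base")  # gpt-4 / gpt-3.5-turbo
o200k = tiktoken.get_encoding("o200k_base")    # gpt-4o

# The same instruction in Chinese and in English.
zh = "请对以下手机评论进行基于方面的情感分析"
en = "Perform aspect-based sentiment analysis on the following phone review"

for label, text in [("Chinese", zh), ("English", en)]:
    print(label, "cl100k_base:", len(cl100k.encode(text)),
          "o200k_base:", len(o200k.encode(text)))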

The system text:

You perform automated sentiment analysis of the product review provided as user input, scoring from [positive, negative, neutral] in such categories as [“customer experience”, “brand perception”, “product quality”, …], where more categories can be added dynamically if they are brought up in the review. The output will summarize the review’s sentiment by keyword. The response will be in Chinese.

// response format example:
人体工程学(积极),质量(中立),价格(消极),外观(积极),…
// (in English: ergonomics (positive), quality (neutral), price (negative), appearance (positive), …)

It is possible to send multiple reviews in one user input, such as “provide an independent analysis for EACH numbered review”, and then put each in a container such as triple-quotes or brackets, but the quality will decline the more you ask of the AI at once.
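
A sketch of that batching approach; the wrapper prompt and the container format are just one way to do it:

def build_batched_message(reviews: list[str]) -> str:
    # Number each review and wrap it in triple quotes so the model can
    # tell where one review ends and the next begins.
    numbered = [f'{i}. """{review}"""' for i, review in enumerate(reviews, start=1)]
    return ("Provide an independent analysis for EACH numbered review:\n\n"
            + "\n".join(numbered))

# e.g. build_batched_message(["Great camera.", "Battery is weak."])
# produces one user message covering both reviews.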
