ChatGPT Feedback for Polite Messages

Token-Efficient Handling of Non-Instructional Prompts (e.g., “Thanks”, “Welcome”)

Dear OpenAI Team,

As a regular user of ChatGPT, I appreciate the conversational fluidity and human-like interaction. However, I recently came across comments from Sam Altman regarding the significant compute cost associated with processing even minimal prompts such as “thank you” or “you’re welcome.” This inspired a technical suggestion that could help optimize model efficiency:

Problem:

Non-instructional inputs like polite phrases—e.g., “okay”, “thanks”, “welcome”, “good night”—are low in semantic complexity but are currently processed by the full LLM pipeline, incurring unnecessary compute cost and latency.

Suggestion:

Introduce a pre-processing layer or lightweight fallback model to intercept and handle these cases without invoking the full LLM. Possible implementation strategies include:

  1. Regex + Intent Filter Layer: Pre-screen input tokens for low-complexity, low-context prompts and return static or cached responses (a rough sketch follows this list).

  2. Micro-model or Classifier: Use a small classifier to route non-instructional queries to a lightweight model or template response.

  3. User-level Control: Offer an “Efficiency Mode” or “Minimal Response Mode” where users can opt to skip or shorten responses to casual inputs.

  4. Token-Aware Rate Limiting: When user messages fall below a semantic complexity threshold, apply a throttled or no-response policy if configured.
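To make the first idea concrete, here is a minimal Python sketch of what such an intercept layer might look like. Everything in it (the phrase list, the `efficiency_mode` flag, the `call_full_model` fallback) is a hypothetical placeholder, not a description of OpenAI's actual pipeline:

```python
import re

# Hypothetical canned replies for common non-instructional inputs.
# A real deployment would localise these and expand coverage.
CANNED_RESPONSES = {
    "thanks": "You're welcome!",
    "thank you": "You're welcome!",
    "ok": "Got it.",
    "okay": "Got it.",
    "good night": "Good night!",
    "you're welcome": "Glad to help!",
}

# One pre-compiled pattern: a known phrase, optionally wrapped in
# punctuation or whitespace (e.g. "Thanks!!", "  ok ").
POLITE_RE = re.compile(
    r"^\W*(" + "|".join(re.escape(p) for p in CANNED_RESPONSES) + r")\W*$",
    re.IGNORECASE,
)

def handle_message(text: str, efficiency_mode: bool = False) -> str | None:
    """Intercept low-complexity messages before the full LLM is invoked.

    Returns a cached reply, None (skip the reply entirely when the user
    has opted into a minimal-response mode), or falls through to the
    expensive model call.
    """
    match = POLITE_RE.match(text.strip())
    if match:
        if efficiency_mode:
            return None  # user opted to skip replies to casual inputs
        return CANNED_RESPONSES[match.group(1).lower()]
    return call_full_model(text)  # placeholder for the real LLM call

def call_full_model(text: str) -> str:
    raise NotImplementedError("stands in for the full pipeline")
```

A small intent classifier (point 2) could sit behind the regex to catch fuzzier phrasings, the `efficiency_mode` flag stands in for the opt-in control described in point 3, and a token- or complexity-threshold check (point 4) would slot into the same place.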

Benefit:

This could reduce cumulative token consumption significantly, especially at ChatGPT’s scale, while preserving user experience for those who prefer concise replies.

Looking forward to hearing your thoughts or improvements on this idea.

Best regards,
Deepak Goyal
Active ChatGPT Plus user

These ‘advertising’ articles didn’t really go into much depth about customisation…

With customisation, ChatGPT (API aside, of course) can return highly targeted, short and snappy responses…

You can also write custom macros and save them to memory to greatly reduce both input and output.

e.g.

First, save the macro once:

“Create a macro ‘Translate()’ that translates the text I post to English and returns ONLY that translation, and save it to your memory.”

Then, in any later message:

Translate()

Text here
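As a hypothetical illustration: once the macro is stored, sending “Translate()” followed by “Bonjour tout le monde” should come back as just “Hello everyone”, with no preamble or explanation (how reliably this holds depends on the model honouring the stored instruction).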

This can drastically cut both input and output tokens when used well.