[Paper] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models

anon22939549 · December 14, 2023, 6:42pm

Paper out of Microsoft last week on prompt compression.

Abstract

Large language models (LLMs) have been applied in various applications due to their astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) prompting and in-context learning (ICL), the prompts fed to LLMs are becoming increasingly lengthy, even exceeding tens of thousands of tokens. To accelerate model inference and reduce cost, this paper presents LLMLingua, a coarse-to-fine prompt compression method that involves a budget controller to maintain semantic integrity under high compression ratios, a token-level iterative compression algorithm to better model the interdependence between compressed contents, and an instruction tuning based method for distribution alignment between language models. We conduct experiments and analysis over four datasets from different scenarios, i.e., GSM8K, BBH, ShareGPT, and Arxiv-March23; showing that the proposed approach yields state-of-the-art performance and allows for up to 20x compression with little performance loss. Our code is available at this https URL.
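For anyone who wants a feel for the "budget controller plus token-level compression" idea before reading the paper, here is a rough toy sketch. It is not the paper's algorithm: LLMLingua scores tokens with a small language model and compresses iteratively, whereas this stand-in uses a plain unigram frequency table and a single pass. The function names, the `keep_ratio` parameter, and the reference corpus are all made up for illustration.

```python
import math
from collections import Counter


def self_information(tokens, counts, total):
    """Score each token by -log p(token) under an add-one-smoothed unigram model.

    The unigram model is an assumption standing in for the small LM used in the paper.
    """
    vocab_size = len(counts) + 1  # +1 leaves probability mass for unseen tokens
    return [-math.log((counts.get(t, 0) + 1) / (total + vocab_size)) for t in tokens]


def compress_tokens(tokens, reference_tokens, keep_ratio):
    """Keep the most informative tokens within a budget, preserving original order.

    keep_ratio is a hypothetical knob: 0.5 keeps roughly half of the tokens.
    """
    counts = Counter(reference_tokens)
    scores = self_information(tokens, counts, len(reference_tokens))
    budget = max(1, int(len(tokens) * keep_ratio))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    kept_positions = sorted(ranked[:budget])
    return [tokens[i] for i in kept_positions]


# Toy reference corpus: frequent words carry little information.
reference = ["the"] * 10 + ["is"] * 5 + ["to"] * 5 + ["answer", "question"]
prompt = ["the", "answer", "to", "the", "question", "is", "42"]
print(compress_tokens(prompt, reference, keep_ratio=0.5))
# -> ['answer', 'question', '42']
```

Frequent filler words get dropped first and the rare tokens survive, which is the basic intuition. The real method conditions on context and aligns the small model with the target LLM, so treat this only as a mental model.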

Links

Paper: [2310.05736] LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Blog Post: LLMLingua: Innovating LLM efficiency with prompt compression - Microsoft Research
GitHub Repo: LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression

moonlockwood · December 14, 2023, 8:06pm

That is a thing of beauty
