Are abbreviations a solution to minimize tokens?

I am trying to find efficient ways to minimize the number of tokens used for context, so that I can get better results and send more information per request, with all the benefits that implies. I came up with an idea that may already have been addressed in some thread, but I honestly couldn’t find it in the forum: abbreviate words to their simplest form. As I understand it, both GPT-3.5 and GPT-4.0 understand abbreviated text and can return unabbreviated text even when the input was abbreviated.

So I first set out to test how much a text can be abbreviated, using the GPT-3 encoder tool in Node (there is also a very useful graphical implementation). It obviously depends on the specific text, but with the texts I provided the token savings were about 20% to 35%. That seemed significant, since the savings carry over directly to usage, especially in context files.
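
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the savings calculation. Note that `estimate_tokens` is only a stand-in based on the rough "about 4 characters per token" rule of thumb for English text; it is an assumption for illustration and not the GPT tokenizer, so pass a real tokenizer function as `count_tokens` to get real numbers.

```python
import math
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Rough stand-in (assumption): about 4 characters per token.

    This is NOT the GPT tokenizer; use a real one for real measurements.
    """
    return math.ceil(len(text) / 4)


def token_savings(
    original: str,
    shortened: str,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> float:
    """Return the fraction of tokens saved by the shortened text (0.25 means 25%)."""
    before = count_tokens(original)
    after = count_tokens(shortened)
    if before == 0:
        raise ValueError("original text has no tokens")
    return 1 - after / before


if __name__ == "__main__":
    original = "performance evaluation method"
    abbreviated = "perf eval method"
    print(f"{token_savings(original, abbreviated):.0%} saved (heuristic estimate)")
```

A character-based estimate will generally not match what a real tokenizer reports, because common full words are often encoded very compactly, so it is worth measuring with the actual tokenizer before relying on any percentage.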

For the tests, I used hierarchical, schematized files derived from a main file. For example, take the main file Human resources in administration.pdf. With GPT-4.0, I organized this file into a hierarchy and outlined its most relevant content, then summarized each outline with GPT-4.0. After that, I abbreviated each summary, again with GPT-4.0. Finally, I gave each file a title that references the topics it covers, for example:
1- performance management, Robbins and Coulter, Idalberto Chiavenato, performance evaluation, evaluation method.txt
2- human resource management.txt
3- staff orientation, types of training, training methods.txt
4- human resource planning process, Robbins and Coulter, job specification, recruitment.txt

I then instructed the model as follows: “You are an expert professor in answering questions, you are provided with files from which you must derive your answers, they should be without abbreviations, references, and notes.” This told the model what to do with those files.

This scheme has given me very good results, although I am still testing it to try to achieve the most accurate results possible. But I wanted to propose it because this approach of leveraging the model’s ability to understand abbreviations and using it to our advantage in reducing tokens seems interesting to me.

I’m new to this world and I really appreciate your comments, as there is surely a better or more abstract way to do this. Thank you very much in advance!

This isn’t a long-term solution.

  1. It’s uncertain what the impact will be on the quality of responses.
  2. It’s unlikely to have as dramatic an impact as you’re hoping.

If you are that cost-sensitive, better solutions would be to:

  1. Streamline your process and be very aggressive with maximizing the information density of your messages to and from the model.
  2. Use a cheaper model to act as a translation layer between users and a more expensive model by condensing messages to be more informationally dense.
  3. Just use a cheaper model.
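
Option 2 could look roughly like the sketch below. Both models are passed in as plain functions that take a prompt and return text; `cheap_model` and `expensive_model` are hypothetical placeholders for whatever API calls you actually use, and the wording of the condensing prompt is just an assumption.

```python
from typing import Callable

# A "model" here is any function that takes a prompt and returns text.
# Both callables are hypothetical placeholders for real API calls.
Model = Callable[[str], str]


def answer_via_condenser(
    user_message: str, cheap_model: Model, expensive_model: Model
) -> str:
    """Condense the user's message with the cheap model, then send only
    the condensed version to the expensive model and return its answer."""
    condense_prompt = (
        "Rewrite the following message as densely as possible "
        "without losing any facts or constraints:\n\n" + user_message
    )
    condensed = cheap_model(condense_prompt)
    return expensive_model(condensed)
```

The same idea can be applied in the other direction, with the cheap model expanding a dense answer from the expensive model into a fuller reply for the user.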

Thank you very much for your response. Cost does play a role in all this, and it’s always better to optimize when possible, but my main concern is context size: even with the files already schematized and summarized, the context sometimes becomes so extensive that it is a limitation. That said, the approach of using two models together seems very interesting to me. I will keep working to see what results I can achieve with it!

If you consider how vector embeddings work in semantic space (which is the principle that governs all word meanings in LLMs), it makes sense that an abbreviated word is likely to encode to tokens that point to a good enough location in semantic space to yield relevant results. But each time you strip out tokens (which is what abbreviation amounts to), you are removing valuable contextual information and damaging query accuracy by adding intentional error/fuzziness. So I would personally avoid abbreviating just for the sake of shortening data.

Thank you very much for your comments @wclayf. I hadn’t considered the analysis from that point of view, and it is completely true. Ultimately, “tokens are saved,” but quality is also lost, and that’s exactly what I don’t want to lose in this case. So I will try what @elmstedt suggested: using two models together, with one interacting with the client and the other establishing the condensed bases, to achieve the highest possible precision, which is ultimately what I’m looking for.

Actually, after reading your post again, I realize you didn’t mean abbreviations in the normal sense of the word, which would be…well…word abbreviations. I think what you meant was what I would have called “summaries”. You’re taking text and just making it “shorter” (i.e., fewer words), I think.

I almost asked for an example of an “abbreviation”, but assumed you meant single words. Oddly enough most of what I said probably still applies, just at the sentence level not at the word level. Anyway glad if that helped you think it thru some.

In your first interpretation, you understood correctly. I was specifically referring to “abbreviations of words,” in addition to structuring ideas and summarizing content to its core idea. Let me explain better: often, a large number of words are used in a sentence when the main idea could be summarized in one line. Now I wonder, following your last comment, whether this technique of structuring content (starting from a large file, subdividing it into small parts, and summarizing them individually) can be useful in a robust solution. I conducted a large number of tests and obtained good results, but I am sure I still have a long way to go. Thank you very much for your time!
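
To make the structuring step concrete, this is roughly what I mean, as a minimal sketch. The `summarize` argument is a hypothetical placeholder for a call to the model, and splitting on blank lines with a character limit is just one assumed way of subdividing the file.

```python
from typing import Callable, List


def split_into_parts(text: str, max_chars: int) -> List[str]:
    """Split text on blank lines and pack paragraphs into parts of at most
    max_chars characters. A single paragraph longer than max_chars is kept
    whole as its own part."""
    parts: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow the part, so start a new one.
            parts.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts


def summarize_parts(
    text: str, summarize: Callable[[str], str], max_chars: int = 2000
) -> List[str]:
    """Summarize each part individually; `summarize` is a hypothetical
    placeholder for a model call."""
    return [summarize(part) for part in split_into_parts(text, max_chars)]
```

Each summary can then be saved in its own file with a descriptive title, as in the examples from my first post.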