Am I doing something wrong or is the pricing extremely steep?


I’m doing some tests using MacGPT, and some 10- to 20-word prompts are counted as 700–1K tokens. The pricing page won me over with its ~750 words ≈ 1K tokens estimate, but right now it really doesn’t seem worth it. Latest example: a 51-word prompt and a 227-word completion (in French). That’s 278 words total, billed as 1,057 prompt + 419 completion = 1,476 tokens.

I made another test (asking the AI, just for fun, about the cost of the last prompt): 16 words, 76 characters = 2,981 prompt + 127 completion = 3,108 tokens.

After using it for a bit, I even got one with 4,346 prompt tokens (115 words) + 891 completion tokens (280 words) = 5,237 tokens.

Seems a bit excessive.

Is it just me, or is it extremely expensive?

Hi Opaweld! Are you familiar with how chat history works? With each subsequent question, you also send back the entire history of your conversation: you re-send all of the previous questions and all of the answers, then ask the new question. This continues up to the model’s context window, at which point your application has to start pruning the chat history, usually by removing the oldest messages and answers. This is how GPT can follow the conversation from question to question.
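To make that concrete, here’s a minimal sketch (not MacGPT’s actual code, and no real API calls) showing how the number of messages sent grows with every turn, which is why prompt tokens balloon even for short questions:

```python
# Illustrative sketch: each request resends the FULL chat history,
# so the prompt grows with every turn of the conversation.

history = []

def send(user_message, fake_reply="(answer)"):
    """Simulate one chat turn; returns how many messages were sent."""
    history.append({"role": "user", "content": user_message})
    sent = list(history)  # the entire history goes into the request
    history.append({"role": "assistant", "content": fake_reply})
    return len(sent)

print(send("Question 1"))  # 1 message sent
print(send("Question 2"))  # 3 messages sent (Q1, A1, Q2)
print(send("Question 3"))  # 5 messages sent (Q1, A1, Q2, A2, Q3)
```

Each turn adds two messages (your question plus the answer) to everything that gets re-sent, so total tokens processed grow roughly quadratically over a long conversation.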

If you are using GPT-4, yes, that gets expensive in a hurry. There may be some ways of forcing MacGPT (I’m not at all familiar with that app) to maintain a smaller chat history, which would cap the overall size of any given request. That will make the bot more “forgetful”, because it won’t keep your previous answers in its history for as long.
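The “forgetful” approach could look something like this sketch (the cap is an assumed value for illustration, not a real MacGPT setting):

```python
# Hypothetical history cap: keep only the most recent messages so
# each request stays small, at the cost of forgetting older turns.

MAX_MESSAGES = 6  # assumed cap for illustration

def prune(history):
    """Drop the oldest messages once the history exceeds the cap."""
    return history[-MAX_MESSAGES:]

history = [{"role": "user", "content": f"Q{i}"} for i in range(10)]
print(len(prune(history)))  # 6: only Q4 through Q9 survive
```

A real app would usually also preserve the system prompt and count tokens rather than messages, but the trade-off is the same: smaller requests, shorter memory.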

If you are having an extended conversation and want the context of your previous questions and answers, there’s no easy way to achieve this with a large language model (today) that doesn’t involve this repetitive, token-intensive approach of re-processing the entire conversation with each request. There are some libraries/wrappers that perform “conversation compression”: they use an additional LLM call to summarize the previous conversation so that it takes up fewer tokens in the chat history. That approach is lossy and has its own issues, as you might imagine.
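A rough sketch of that compression idea: older turns are collapsed into a single summary message. A real implementation would generate the summary with an extra LLM call; the summarize() helper here is just a stand-in.

```python
# Sketch of "conversation compression": replace old turns with one
# summary message. summarize() is a placeholder for a real LLM call.

def summarize(messages):
    # A real version would ask the model to summarize these turns.
    return "Summary of %d earlier messages." % len(messages)

def compress(history, keep_recent=4):
    """Collapse all but the most recent turns into a single message."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent

history = [{"role": "user", "content": f"Q{i}"} for i in range(10)]
compressed = compress(history)
print(len(compressed))  # 5: one summary plus the four recent messages
```

The summary is where the lossiness comes in: details the summarizer drops are gone for good, which is exactly the trade-off described above.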


Hi, and thanks for your answer.

I suspected it might have something to do with that, but I didn’t know it worked that way, or that it could burn through tokens so fast. It makes sense, but I didn’t realize the entire conversation was systematically reprocessed.

I see more clearly now. I think I’ll just switch from the API to the Plus subscription when my credit runs out; it should be a lot more cost-effective for long conversations.

By the way, MacGPT works as an assistant integrated directly into macOS and the software you use. With a shortcut, you can open the conversation and write prompts directly; you can even write prompts from within your applications. Over a long session, the conversation can get busy.

Thanks again.