Can I save tokens by generating instructions with tools?

mano-wii · November 24, 2023, 9:05pm

I’ve been trying to understand how billing works in an Assistant thread.

Despite the various topics* about it, I see a lot of what seem like assumptions, and I’m still not sure how things work.

Is it true that on each run, all tokens from previous messages are billed again?
And what about the internal tokens supposedly returned by the tools (retrieval seems to inject much more than just the answer to the question)? Do they remain in the message history and get billed again and again?

Since the instructions are usually the largest part of the prompt, I wonder if I could give the assistant only basic instructions and have it request more detailed instructions depending on the context.
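
A minimal sketch of what that could look like, assuming the assistant is given a function tool (here hypothetically named `get_instructions`) that returns a detailed block of instructions for one topic. The dictionary follows the general shape of a function-tool definition in the Assistants API (`type`, `function`, `name`, `description`, and `parameters` as JSON Schema). The topic names, instruction texts and handler are made up for illustration.

```python
import json

# Hypothetical detailed instructions, kept out of the base "instructions" field
# and only returned when the model asks for them through the tool.
DETAILED_INSTRUCTIONS = {
    "bug_reports": "When triaging a bug report, ask for the OS version and steps to reproduce.",
    "feature_requests": "When discussing a feature request, summarize the use case first.",
}

# Shape of a function-tool definition; the name and schema are illustrative.
GET_INSTRUCTIONS_TOOL = {
    "type": "function",
    "function": {
        "name": "get_instructions",
        "description": "Return detailed instructions for a given topic.",
        "parameters": {
            "type": "object",
            "properties": {
                "topic": {
                    "type": "string",
                    "enum": sorted(DETAILED_INSTRUCTIONS),
                },
            },
            "required": ["topic"],
        },
    },
}


def handle_get_instructions(arguments_json: str) -> str:
    """Handle a tool call: parse the JSON arguments and return the matching instructions."""
    args = json.loads(arguments_json)
    topic = args.get("topic", "")
    return DETAILED_INSTRUCTIONS.get(topic, f"No instructions for topic {topic!r}.")
```

The base instructions would then only need a short line telling the assistant to call `get_instructions` when one of those topics comes up. Whether the returned text is billed once or on every later run depends on whether tool outputs stay in the thread's context, which is exactly the open question here.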

Even if the tool’s response is counted in billing, if that response is not stored in the message list, it is not counted again in future runs, so this could (in theory) save a lot of money.
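
A back-of-the-envelope model of that idea, assuming every run re-sends the instructions plus the whole message history, and that a tool response fetched once is (hypothetically) never re-sent afterwards. All token counts are made-up round numbers chosen only to show the shape of the saving.

```python
def total_input_tokens(runs: int, fixed_tokens: int, tokens_per_message: int,
                       one_off_tokens: int = 0) -> int:
    """Total input tokens billed over `runs` runs.

    Assumes every run re-sends `fixed_tokens` (instructions) plus the whole
    message history, which grows by `tokens_per_message` per run.
    `one_off_tokens` models a tool response that is billed once and,
    hypothetically, never re-sent afterwards.
    """
    history_tokens = tokens_per_message * runs * (runs + 1) // 2
    return runs * fixed_tokens + history_tokens + one_off_tokens


# Illustrative numbers only: 2,000-token instructions vs. a 200-token base
# prompt plus a single 1,800-token tool response fetched once.
full_instructions = total_input_tokens(runs=10, fixed_tokens=2000, tokens_per_message=100)
on_demand = total_input_tokens(runs=10, fixed_tokens=200, tokens_per_message=100,
                               one_off_tokens=1800)
print(full_instructions, on_demand)  # 25500 9300
```

The saving hinges entirely on the "not re-sent" assumption. If the 1,800-token tool response is fetched on the first run and then stays in the thread, it is re-sent on all 10 runs, and in this example the total climbs back to the same 25,500 tokens as sending the full instructions every time.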

* Topics:

Foxalabs · November 24, 2023, 10:20pm

All of the messages in the message list are used on every call; this is how the context system works. If the messages are so long that they no longer fit, the system truncates the earlier messages to make room.
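
A minimal sketch of that kind of truncation, assuming a simple "drop the oldest messages first" policy and using a word count as a stand-in for a real tokenizer. The exact strategy the Assistants API uses is not described in this thread, so this only illustrates the idea.

```python
from typing import Callable


def truncate_to_fit(messages: list[str], max_tokens: int,
                    count_tokens: Callable[[str], int]) -> list[str]:
    """Keep the most recent messages whose combined size fits in max_tokens.

    Walks backwards from the newest message and stops at the first message
    that would overflow the budget, so the earliest messages are dropped first.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        size = count_tokens(message)
        if used + size > max_tokens:
            break
        kept.append(message)
        used += size
    kept.reverse()  # restore chronological order
    return kept


def word_count(text: str) -> int:
    # Crude stand-in for a real tokenizer; real token counts differ.
    return len(text.split())


history = ["one two three", "four five", "six", "seven eight"]
print(truncate_to_fit(history, max_tokens=4, count_tokens=word_count))
# ['six', 'seven eight']
```

Note that whatever survives truncation is still sent, and billed, in full on every call.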

mano-wii · November 24, 2023, 10:29pm

GPT may be hallucinating, or the billing there works differently.
But below is an exchange I had with an assistant. It says it keeps everything returned by the tools, yet also that old messages are not counted.

GPT
Yes, I keep a record of the information provided and discussed during our conversation. This allows me to quickly reference previously mentioned details, such as the user’s Windows version in Issue 115334, without needing to re-access the original source. This ability to retain information during a specific session is crucial to maintaining continuity and relevance in responses.

User
But is this counted in the token count and billing?

GPT
No, the retention of information during our conversation is not counted in token counting or billing. Token counting and billing are related to processing new text, such as your questions and my answers to them. The ability to remember and reference details from previous parts of our conversation is a built-in function that does not incur additional costs. Therefore, you can reference past information without worries about token counting or additional billing.

Foxalabs · November 24, 2023, 10:31pm

Please do not use the AI for software guidance on newly released features, it has no information regarding these and will hallucinate incorrect answers.

mano-wii · November 24, 2023, 10:36pm

Ok, I understand that tokens from previous messages are counted again.

But are they charged at the same price as the tokens in a new message? They shouldn’t be, because those tokens were already computed for the previous messages, and I don’t see the need to compute them again. At the very least, they should be cheaper.

Foxalabs · November 24, 2023, 11:01pm

This is how the current generation of LLMs works: everything needs to be sent again on every call, even if it was already computed last time.
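
A small illustration of what this means for billing: if each run sends the full conversation so far, the input for run *k* is the sum of the first *k* messages, so the input per run grows linearly and the running total grows roughly quadratically. The message sizes are made-up numbers, and the sketch deliberately ignores instructions, tool outputs and truncation.

```python
from itertools import accumulate


def input_tokens_per_run(message_tokens: list[int]) -> list[int]:
    """Input tokens for each run when the whole history is re-sent every time.

    Each entry of message_tokens is the size of the new message added before
    that run; the run's input is everything accumulated so far.
    """
    return list(accumulate(message_tokens))


per_run = input_tokens_per_run([50, 120, 80, 200])
print(per_run)       # [50, 170, 250, 450]
print(sum(per_run))  # 920
```

Only 450 tokens of conversation exist after the fourth run, but 920 input tokens have been sent in total, because the earlier messages were re-sent on every run.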

This is an emerging technology, and in 12 months’ time we may laugh at how primitive it all was. For now, though, this is the way it works.

matt0sai · November 25, 2023, 7:04pm

Andrej Karpathy has a video where he talks about hallucinations in LLMs; this quote is from his notes:

“The dreams and hallucinations do not get fixed with finetuning. Finetuning just “directs” the dreams into “helpful assistant dreams”. Always be careful with what LLMs tell you, especially if they are telling you something from memory alone. That said, similar to a human, if the LLM used browsing or retrieval and the answer made its way into the “working memory” of its context window, you can trust the LLM a bit more to process that information into the final answer. But TLDR right now, do not trust what LLMs say or do. For example, in the tools section, I’d always recommend double-checking the math/code the LLM did.”
