Maybe a silly simple question, but: When using the completion API, we always have to send all the previous parts of the conversation.
Are we being charged only for the newest part, or are we being charged for all the tokens that are in the context window, every time we send something to the API?
E.g. if I send a short story of 1,000 tokens and then ask various separate questions about this short story, I have to send the initial story text every single time. Am I being charged for these 1,000 tokens every time I ask a question?
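To make the pattern concrete, here is a minimal sketch of what I mean (using the chat completions endpoint; the model name and file name are just placeholders). The whole story goes into the request on every call, since the API is stateless:

```python
from openai import OpenAI

client = OpenAI()
story = open("story.txt").read()  # the ~1,000-token story

for question in ["Who is the narrator?", "Where is the story set?"]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "user", "content": story},     # full story re-sent each time
            {"role": "user", "content": question},  # only this part is new
        ],
    )
    print(resp.choices[0].message.content)
```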
But I am looking into hacking something together where multiple users will be able to ask various questions about a text file, or maybe eventually a video. I'd really rather not have to pay for the tokens in the file hundreds of times.
So what would be the correct way to go about this? Maybe fine-tuning on the file, and then letting the users query that fine-tuned version?
Fine-tuning has worse recall than information provided in the prompt. Consider RAG instead, so you don't need to send the full text with every model invocation, or consider using a cheaper model.
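A minimal RAG sketch, assuming the openai Python SDK (>= 1.0) and numpy; the model names, chunk size, and `story.txt` are placeholders, not a definitive implementation. You pay to embed the story once, then each question only sends the few most relevant chunks:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Return one embedding vector per input string."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Split the story into chunks and embed them once (a one-time token cost).
story = open("story.txt").read()
story_chunks = [story[i:i + 1500] for i in range(0, len(story), 1500)]
chunk_vectors = embed(story_chunks)

def answer(question, top_k=2):
    # 2. Embed the question and rank chunks by cosine similarity.
    q_vec = embed([question])[0]
    sims = chunk_vectors @ q_vec / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(story_chunks[i] for i in sims.argsort()[-top_k:])

    # 3. Send only the retrieved chunks, not the whole story, with each question.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this excerpt:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```

For a short story this mostly trades embedding cost for prompt cost; the savings get bigger as the source text grows or as the number of questions per document goes up.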