Cost reduction strategies

Orome · July 24, 2024, 7:35pm

Naively (maybe too naively), it would seem that one way to reduce costs when using Assistants that share a lot of context among a group of users would be to keep (nearly) all of that context in attached vector stores and submit a greatly reduced context with each request. I gather from some posts here that folks are pursuing strategies along these lines, but it’s not clear to me how to achieve that. Using max_prompt_tokens would seem to be the obvious way to limit the size of the request, but that also seems to limit how much of the attached vector store is used.
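For concreteness, here is roughly the kind of request I have in mind. This is only a sketch under assumptions: build_run_request is a hypothetical helper, the IDs are placeholders, and I'm assuming max_prompt_tokens (and, optionally, truncation_strategy) are passed at run creation as I understand the Assistants API docs. It builds the arguments without calling the API.

```python
from typing import Any, Optional


def build_run_request(
    assistant_id: str,
    max_prompt_tokens: int,
    last_messages: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for creating a run with a capped prompt.

    Hypothetical helper for illustration; it only builds a dict.
    """
    if max_prompt_tokens <= 0:
        raise ValueError("max_prompt_tokens must be positive")

    request: dict[str, Any] = {
        "assistant_id": assistant_id,
        # Cap on prompt tokens for the run (the limit discussed above).
        "max_prompt_tokens": max_prompt_tokens,
    }
    if last_messages is not None:
        # Assumption: only the N most recent thread messages are kept in context.
        request["truncation_strategy"] = {
            "type": "last_messages",
            "last_messages": last_messages,
        }
    return request


# Placeholder ID; the assistant is assumed to already have a vector store attached.
run_kwargs = build_run_request("asst_placeholder", max_prompt_tokens=2000, last_messages=4)

# With the official Python client this would presumably be sent as:
#   client.beta.threads.runs.create(thread_id="thread_placeholder", **run_kwargs)
print(run_kwargs)
```

The open question is whether a cap like this can be applied to the thread history alone, without also shrinking what gets pulled from the vector store.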

What am I missing here?
