Token efficiency in context injection

Dear community,

I'm attempting to give the gpt-4 chat completion model some information before prompting by feeding it a variable called context, which consists of the same 200 concatenated short sentences prepended to each prompt, something like this:

        for prompt in prompts:
            response = self.client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": instructions},
                    # the same ~200-sentence context is resent on every call
                    {"role": "user", "content": f"{context}{prompt}"},
                ],
                temperature=0,
            )

But as you can imagine, the costs of this are pretty high.

Just wondering if you'd be so kind as to suggest some options to make this more cost-efficient? Thanks!


Hi and welcome to the Community!

Given that the API is stateless and every API call is treated independently from the others, you have no choice but to provide the context every single time.

However, depending on the specifics of what you are looking to achieve, there may be alternative options/approaches that are more cost-friendly. If you could share more details or an example of your typical system and user messages, we can see whether any of them apply.
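One thing worth checking, though: newer models (gpt-4o and later) support OpenAI's automatic prompt caching, which bills repeated input tokens at a discounted rate as long as the static part of the prompt is an identical prefix across calls. A minimal sketch, assuming one of those models and reusing the placeholder names (instructions, context, prompts) from your snippet:

    from openai import OpenAI

    client = OpenAI()

    instructions = "..."      # placeholder: your system instructions
    context = "..."           # placeholder: the large static context
    prompts = ["...", "..."]  # placeholder: the short varying prompts

    # Keep the static parts as an identical prefix on every call so that,
    # on models with automatic prompt caching, the repeated prefix tokens
    # can be served from cache at the discounted input rate.
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",  # illustrative caching-capable model
            messages=[
                {"role": "system", "content": f"{instructions}\n\n{context}"},
                {"role": "user", "content": prompt},  # only the varying bit
            ],
            temperature=0,
        )

Note that the varying text moves to the end: any change early in the prompt invalidates the cached prefix.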

I came here to ask the same thing: I have a decent chunk of context, then a tiny bit that varies each time, and I will be sending hundreds of these through.
I noticed that Google Gemini offers context caching, so I am going to look at that (rough sketch below).
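For anyone curious, this is roughly what that looks like with the google-generativeai Python library; the model name, TTL, and variable names here are illustrative, and the exact API may differ between library versions:

    import datetime
    import google.generativeai as genai
    from google.generativeai import caching

    genai.configure(api_key="...")  # your API key

    shared_context = "..."                # placeholder: the big shared context
    variable_bits = ["item 1", "item 2"]  # placeholder: the small varying parts

    # Upload the shared context once; Gemini stores it server-side for the TTL.
    # Note that context caching has a minimum token count, so small contexts
    # may not qualify.
    cache = caching.CachedContent.create(
        model="models/gemini-1.5-flash-001",  # illustrative caching-enabled model
        system_instruction="Answer using only the provided context.",
        contents=[shared_context],
        ttl=datetime.timedelta(minutes=60),
    )

    model = genai.GenerativeModel.from_cached_content(cached_content=cache)

    # Each call now pays full price only for the small varying part.
    for item in variable_bits:
        response = model.generate_content(item)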
The other thing I was thinking of is batch processing, i.e. give the context once along with an array of the variable bits and ask for an array to be returned, although I haven't tested this yet and am slightly concerned that it will mix up elements of the array (sketch below).
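A minimal sketch of that idea, assuming the OpenAI Python client; tagging each element with an explicit id makes any mixed-up or dropped elements detectable when matching answers back (all names here are made up for illustration):

    import json
    from openai import OpenAI

    client = OpenAI()

    shared_context = "..."                        # placeholder: the shared context
    variable_bits = ["item 1", "item 2", "item 3"]  # placeholder: varying inputs

    # Tag each element with an explicit id so answers can be matched back
    # even if the model reorders or drops elements.
    numbered = [{"id": i, "input": text} for i, text in enumerate(variable_bits)]

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model
        messages=[
            {
                "role": "system",
                "content": (
                    "For each object in the JSON array the user sends, answer "
                    "using the context. Return only a JSON array of objects, "
                    "each keeping its 'id' and adding an 'answer' field."
                ),
            },
            {
                "role": "user",
                "content": f"Context:\n{shared_context}\n\nItems:\n{json.dumps(numbered)}",
            },
        ],
        temperature=0,
    )

    answers = json.loads(response.choices[0].message.content)
    # Re-key by id so the order of the returned array no longer matters.
    by_id = {item["id"]: item["answer"] for item in answers}

Keying the results by id takes most of the sting out of the mixing-up worry, since you can verify every id came back exactly once before trusting the output.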