I used two fixed templates, A and B (A is about 700 tokens, B about 3300 tokens), across 1000 interactions, sending 100 requests per batch. In the end, none of these requests triggered the half-price caching discount. Could someone clarify whether the caching discount requires multi-turn conversations to trigger? My calls are all single-turn: roughly 1000 independent, single-response requests.
knowledge_system: ~3300 tokens
mapping_rules: ~700 tokens
problem_data: ~1000 tokens
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

def format_prompt_for_model(self, prompt: Dict[str, Any]) -> str:
    try:
        problem_uuid = prompt['problem_data']['problem_uuid']
        formatted_prompt = (
            "Please complete the knowledge mapping task according to the following requirements. Your response must:\n"
            "1. Strictly output in JSON format\n"
            "2. Include all required fields\n"
            "3. Do not add any extra text\n"
            "4. Include the problem ID in the returned JSON\n\n"  # Clearly specify requirements
            f"Problem ID: {problem_uuid} // Use this ID in the output JSON\n\n"  # Explicitly instruct to use this ID
            f"Knowledge System:\n{prompt['knowledge_system']}\n\n"
            f"Mapping Rules:\n{prompt['mapping_rules']}\n\n"
            f"Problem Data:\n{json.dumps(prompt['problem_data'], ensure_ascii=False, indent=2)}"
        )
        return formatted_prompt
    except Exception as e:
        logger.error(f"Error formatting prompt: {str(e)}")
        raise
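One thing I noticed while debugging: if the provider uses prefix-based caching (as most prompt-caching schemes I'm aware of do), only an identical leading token sequence across requests is cacheable, and my format puts the per-request `Problem ID` before the large static templates, so the shared prefix is only the short instruction block. Below is a minimal sketch of what I'm considering instead; `build_cache_friendly_prompt` is a hypothetical helper of mine, and the assumption that "static content first, dynamic content last" enables a cache hit is exactly what I'm asking about:

```python
import json
from typing import Any, Dict

def build_cache_friendly_prompt(knowledge_system: str,
                                mapping_rules: str,
                                problem_data: Dict[str, Any]) -> str:
    # Static part first: identical for every request, so it can form a
    # shared, cacheable prefix (if caching is indeed prefix-based).
    static_prefix = (
        "Please complete the knowledge mapping task according to the following requirements. Your response must:\n"
        "1. Strictly output in JSON format\n"
        "2. Include all required fields\n"
        "3. Do not add any extra text\n"
        "4. Include the problem ID in the returned JSON\n\n"
        f"Knowledge System:\n{knowledge_system}\n\n"
        f"Mapping Rules:\n{mapping_rules}\n\n"
    )
    # Dynamic part last: everything that varies per request.
    dynamic_suffix = (
        f"Problem ID: {problem_data['problem_uuid']} // Use this ID in the output JSON\n\n"
        f"Problem Data:\n{json.dumps(problem_data, ensure_ascii=False, indent=2)}"
    )
    return static_prefix + dynamic_suffix
```

With this ordering, two requests that differ only in `problem_data` share everything up through the mapping rules, which I believe is the part the cache could reuse.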