I used two fixed templates, A and B (A is about 700 tokens, B about 3300 tokens), across 1000 interactions, sending 100 requests per batch. In the end, none of these requests triggered the half-price caching discount. Could someone clarify whether the caching discount requires multi-turn conversations to trigger? My calls are all single-turn: roughly 1000 independent, single-response requests.
knowledge_system: ~3300 tokens
mapping_rules: ~700 tokens
problem_data: ~1000 tokens
import json
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

def format_prompt_for_model(self, prompt: Dict[str, Any]) -> str:
    try:
        problem_uuid = prompt['problem_data']['problem_uuid']
        formatted_prompt = (
            "Please complete the knowledge mapping task according to the following requirements. Your response must:\n"
            "1. Strictly output in JSON format\n"
            "2. Include all required fields\n"
            "3. Do not add any extra text\n"
            "4. Include the problem ID in the returned JSON\n\n"  # Clearly specify requirements
            f"Problem ID: {problem_uuid} // Use this ID in the output JSON\n\n"  # Explicitly instruct to use this ID
            f"Knowledge System:\n{prompt['knowledge_system']}\n\n"
            f"Mapping Rules:\n{prompt['mapping_rules']}\n\n"
            f"Problem Data:\n{json.dumps(prompt['problem_data'], ensure_ascii=False, indent=2)}"
        )
        return formatted_prompt
    except Exception as e:
        logger.error(f"Error formatting prompt: {str(e)}")
        raise
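One thing I noticed while debugging: if the provider uses prefix-based caching (as most prompt-caching schemes I'm aware of do), only an identical leading token sequence across requests is cacheable, and my format puts the per-request `Problem ID` before the large static templates, so the shared prefix is only the short instruction block. Below is a minimal sketch of what I'm considering instead; `build_cache_friendly_prompt` is a hypothetical helper of mine, and the assumption that "static content first, dynamic content last" enables a cache hit is exactly what I'm asking about:

```python
import json
from typing import Any, Dict

def build_cache_friendly_prompt(knowledge_system: str,
                                mapping_rules: str,
                                problem_data: Dict[str, Any]) -> str:
    # Static part first: identical for every request, so it can form a
    # shared, cacheable prefix (if caching is indeed prefix-based).
    static_prefix = (
        "Please complete the knowledge mapping task according to the following requirements. Your response must:\n"
        "1. Strictly output in JSON format\n"
        "2. Include all required fields\n"
        "3. Do not add any extra text\n"
        "4. Include the problem ID in the returned JSON\n\n"
        f"Knowledge System:\n{knowledge_system}\n\n"
        f"Mapping Rules:\n{mapping_rules}\n\n"
    )
    # Dynamic part last: everything that varies per request.
    dynamic_suffix = (
        f"Problem ID: {problem_data['problem_uuid']} // Use this ID in the output JSON\n\n"
        f"Problem Data:\n{json.dumps(problem_data, ensure_ascii=False, indent=2)}"
    )
    return static_prefix + dynamic_suffix
```

With this ordering, two requests that differ only in `problem_data` share everything up through the mapping rules, which I believe is the part the cache could reuse.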