If you really want to make the AI “deterministic” to a fault, you could just hash and record the complete API input and parameters sent that affect generation and cache the response. No need to ever send identical inputs again.
Generate with max_response_tokens: 10 to see if the AI even wants to depart from the start of the previous if you are 90%+ getting the same beginning against the same fingerprints.
Run embeddings between the responses to see if the meaning drops below a threshold of being a different answer.