Measuring Perplexity Using the API

Hi! As a researcher, I would like to compute perplexity for responses to a given prompt in two cases: (1) the perplexity of the response the system actually returns, and (2) the perplexity of a word or phrase that I supply, scored as if it were the model's response to that prompt.

For example, given the prompt “what’s an animal with four legs?”, how good a response would “cat” be, measured by its perplexity score? And what is the perplexity score of the system’s own response (say it’s “dog”)?
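
To make the request concrete, here is a minimal sketch of the quantity I mean, assuming the API could return a natural-log probability for each token of a response, conditioned on the prompt. That availability is an assumption, and the function name and the numbers below are hypothetical, for illustration only. Perplexity is the exponential of the negative mean log-probability per token:

```python
import math


def perplexity(token_logprobs):
    """Return exp(-mean(log p)) for a sequence of per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probabilities, NOT real model output.
# Assumes each word happens to be a single token.
candidate_logprobs = [math.log(0.25)]  # "cat", the phrase I supply
response_logprobs = [math.log(0.5)]    # "dog", the system's own response

print("candidate perplexity:", perplexity(candidate_logprobs))
print("response perplexity:", perplexity(response_logprobs))
```

Lower perplexity means the model finds the text more likely given the prompt, so in this made-up case “dog” would score better than “cat”.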