From the title and tags used, it is not clear to me whether you are looking for a way to do this using only ChatGPT (https://chat.openai.com/) with specific prompts, or whether you are looking to use the API. That makes a big difference in how to answer.
Without knowing the answer to that, I would have to say you will not succeed with just ChatGPT and prompts; you will have to use the API and some kind of datastore.
See these online lessons for information that will help you decide.
Probably the first thing to do is to use an embedding model to classify which responses might actually contain “struggles”. Compare each response against a few examples of the kind of response you wish to pluck out, and separate those from the responses without challenges by looking at how the vector distance differs between the two groups. The responses can be run in bulk through text-embedding-ada-002 cheaply.
Just by doing that, you might already have the answer you need, since it limits the number of cases you have to review.
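To make the comparison step concrete, here is a minimal sketch of how it could look once you already have the embedding vectors. It is only an illustration under some assumptions: the function names (`struggle_score`, `flag_for_review`), the `margin` parameter, and the tiny 3-dimensional vectors are all hypothetical, and in practice the vectors would come from an embedding model such as text-embedding-ada-002. Fetching them from the API is left out on purpose.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def struggle_score(vec, struggle_examples, other_examples):
    """Mean similarity to the 'struggle' examples minus mean similarity
    to the examples without struggles. Higher means more struggle-like."""
    pos = sum(cosine_similarity(vec, e) for e in struggle_examples) / len(struggle_examples)
    neg = sum(cosine_similarity(vec, e) for e in other_examples) / len(other_examples)
    return pos - neg


def flag_for_review(response_vecs, struggle_examples, other_examples, margin=0.0):
    """Return the indices of responses whose score exceeds the margin.
    The margin is a hypothetical tuning knob; pick it by checking a
    handful of responses by hand."""
    return [
        i
        for i, vec in enumerate(response_vecs)
        if struggle_score(vec, struggle_examples, other_examples) > margin
    ]


# Toy vectors standing in for real embeddings (assumption: real ones
# would have many more dimensions and come from the embedding model).
struggle_examples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
other_examples = [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]
responses = [[0.85, 0.15, 0.05], [0.05, 0.9, 0.1], [0.7, 0.3, 0.0]]

flagged = flag_for_review(responses, struggle_examples, other_examples)
print(flagged)
```

Only the flagged responses would then need a closer look, whether by a person or by a more expensive model call. Where to set the margin is a judgment call that depends on your data.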