Best Practice To Process and Summarize Many Responses to a Survey


We currently have a lot of responses to surveys we’ve run with people.

A typical response to a single question would look like this…

        {
          "name": "Name of responder",
          "question": "Survey Question",
          "response": "Responders Answer",
          "sentiment": "Sentiment of the responder's answer",
          "role": "responder's role ex) junior, senior, CTO, security"
        }

I’d like to ask GPT to extract themes that are common across specific roles. For example: “What are people in junior roles struggling with?”
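One way to prepare the data for role-specific questions is to group responses by `role` first, so only one role’s answers need to go to the model at a time. A minimal sketch, assuming the responses are stored as a JSON array of records shaped like the one above; the sample records and the `group_by_role` helper are hypothetical:

```python
import json
from collections import defaultdict

# Hypothetical sample data: a JSON array of records shaped like the survey record above.
raw = """[
  {"name": "A", "question": "Q1", "response": "Onboarding docs are hard to find", "sentiment": "negative", "role": "junior"},
  {"name": "B", "question": "Q1", "response": "Code review turnaround is slow", "sentiment": "negative", "role": "senior"},
  {"name": "C", "question": "Q1", "response": "Not sure who to ask for help", "sentiment": "negative", "role": "junior"}
]"""


def group_by_role(records):
    """Map each role to the list of response texts given by people in that role."""
    groups = defaultdict(list)
    for record in records:
        groups[record["role"]].append(record["response"])
    return dict(groups)


by_role = group_by_role(json.loads(raw))
# Only the junior responses would be sent along with the "junior struggles" question.
junior_responses = by_role.get("junior", [])
```

Grouping first also shrinks what each request has to carry, which is one way to approach the token-limit concern.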

If we have a lot of responses, how can we work within GPT’s token limits? And how can we make sure we don’t lose information as GPT generalizes over a lot of text?

Given the title and tags, it is not clear to me whether you want to do this in ChatGPT alone with specific prompts or through the API. The answer differs a lot between the two.

Without knowing which, I would say you will not succeed with ChatGPT and prompts alone; you will need the API plus some kind of datastore.

See these online lessons for information that will help you decide.

Great point, I’ve updated the tags to only include API.

Probably the first thing to do is to use an embedding model to classify which responses actually describe “struggles”. Compare each response against a few examples of the kind of response you want to pull out, and separate those from responses without challenges by characterizing the difference in vector distance between the two groups. Embedding all the responses in bulk with text-embedding-ada-002 is cheap.
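A minimal sketch of that comparison, assuming each response has already been embedded (for example with text-embedding-ada-002 through the API) and the vectors are available as lists of floats. The averaging rule, the `margin` parameter and the function names are illustrative assumptions, not a prescribed method, and the toy 3-dimensional vectors stand in for real embeddings:

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(vec: Vector, examples: Sequence[Vector]) -> float:
    """Average cosine similarity between one vector and a set of example vectors."""
    return sum(cosine_similarity(vec, ex) for ex in examples) / len(examples)


def is_struggle(vec: Vector, struggle_examples, other_examples, margin: float = 0.0) -> bool:
    """Hypothetical rule: flag a response when it is, on average, closer to the
    'struggle' examples than to the 'no struggle' examples by more than `margin`."""
    score = mean_similarity(vec, struggle_examples) - mean_similarity(vec, other_examples)
    return score > margin


# Toy vectors standing in for real embeddings of hand-picked example responses.
struggle_examples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
other_examples = [[0.1, 0.9, 0.0], [0.0, 0.8, 0.3]]
responses = {"r1": [0.85, 0.15, 0.05], "r2": [0.05, 0.9, 0.1]}

flagged = [rid for rid, vec in responses.items()
           if is_struggle(vec, struggle_examples, other_examples)]
print(flagged)  # ['r1']
```

Only the flagged responses would then need to be passed on for summarizing; `margin` can be raised if too many borderline responses get through.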

That step alone may get you the answer you need, since it narrows down the number of cases you have to review.