It's interesting you say this, because when I feed GPT the same audio file, it almost always scores it the same way. We have 15 metrics, and each of those has 3-4 submetrics. We have the GPT bot return its scores in JSON, so one prompt might be…
{
  key: "Substance Abuse",
  description: `First, analyze the problem and formulate your own solution. Compare your solution with the guidelines provided below. If your solution aligns with these guidelines, proceed with that answer. If it doesn't, adjust your solution to ensure it adheres to the guidelines. Always use the following guidelines as the final reference for correctness. Substance Abuse: Score in JSON based on the average scores of frequencyOfUse, amountOfUse, and impactOnDailyLife. If there is no mention of substance abuse, score in JSON 0.
frequencyOfUse: Score in JSON based on mentions or implications of how often the client uses the substance.
amountOfUse: Rate depending on descriptions or indications of the quantity of substance used at a time. If they use alcohol or drugs more than 5 times a week, they should receive a score in JSON of at least 8 for Substance Abuse.
impactOnDailyLife: Evaluate how the substance use affects the client's routine, relationships, job, or other daily activities. If the patient mentions that their drug habits or alcohol use impacts over 4 aspects of their lives, the patient should receive at least an 8 score in JSON for Substance Abuse. Respond in JSON Format without deviation: `,
  json: {
    score: "",
    description: "",
    factors: {
      frequencyOfUse: "",
      amountOfUse: "",
      impactOnDailyLife: "",
    },
  },
},
Each of the fifteen core metrics gets its own GPT call, and each call is fed the transcript.
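Roughly, the one-call-per-metric setup could look like the sketch below. This is not our exact code: `buildPrompt`, `scoreAllMetrics`, and the injected `callModel` function are hypothetical names, and `callModel` just stands in for whatever client actually sends a prompt to GPT and returns the JSON string.

```javascript
// Hypothetical helper: combine one metric definition with the transcript.
function buildPrompt(metric, transcript) {
  return [
    metric.description,
    "Return JSON in exactly this shape:",
    JSON.stringify(metric.json),
    "Transcript:",
    transcript,
  ].join("\n");
}

// Hypothetical helper: one model call per metric, all started concurrently.
// callModel is an assumed async function: (prompt) => JSON string.
async function scoreAllMetrics(metrics, transcript, callModel) {
  const entries = await Promise.all(
    metrics.map(async (metric) => {
      const raw = await callModel(buildPrompt(metric, transcript));
      return [metric.key, JSON.parse(raw)];
    })
  );
  // Result is keyed by metric name, e.g. { "Substance Abuse": { score, ... } }.
  return Object.fromEntries(entries);
}
```

Passing `callModel` in as an argument keeps the sketch independent of any particular API client, and makes it easy to swap in a stub when testing.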
This way, each metric gets its own GPT brain, and we do get better scoring results, but it still bugs out sometimes. For instance, we always want the main metric (“Substance Abuse” in the case above) to be the average of its sub-metrics. Yet it will sometimes score the three sub-metrics 6, 7, and 8 and then report an overall score of 3.
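One way around the averaging problem might be to stop trusting the model's arithmetic and recompute the top-level score in code from the sub-metric scores it returns. A minimal sketch, assuming the response has already been parsed into the `json` shape above; `toScore` and `withAveragedScore` are hypothetical names:

```javascript
// Hypothetical helper: accept numbers or numeric strings, reject blanks.
function toScore(value) {
  if (typeof value === "number") return value;
  if (typeof value === "string" && value.trim() !== "") return Number(value);
  return NaN;
}

// Hypothetical helper: overwrite the model's top-level score with the
// mean of its sub-metric scores, rounded to one decimal place.
function withAveragedScore(result) {
  const values = Object.values(result.factors).map(toScore);
  if (values.length === 0 || values.some((v) => !Number.isFinite(v))) {
    throw new Error("Sub-metric scores must all be numeric");
  }
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  return { ...result, score: Math.round(mean * 10) / 10 };
}

const fixed = withAveragedScore({
  score: 3, // the model's inconsistent overall score
  description: "",
  factors: { frequencyOfUse: 6, amountOfUse: 7, impactOnDailyLife: 8 },
});
console.log(fixed.score); // 7
```

If the "at least 8" rules in the prompt are meant to override the average, they would presumably need to be applied as a separate step after this one.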
It’s not an easy task, but the idea is to get it to a place where it can score things based on a set of rules defined by a psychologist.