Improving the consistency and reliability of a chatbot's output

Hi, I am a language teacher and translator, so I am not good at programming. I have tried to develop a chatbot that can evaluate translation work using a numeric rubric that I developed. However, my issue is its reliability. After I assess a translation, it produces a score, but the second time I evaluate the same translation with the same rubric, the chatbot produces a different score for the same work. What should I do to improve the consistency of the chatbot's output? Please help me.

Hi, with so few details about how it is implemented, what the rubric is, how it is referenced in the prompts, and what those prompts look like, it's very difficult to say anything meaningful.

Thank you for your reply. With the help of ChatGPT, I created a rubric, uploaded it to “Knowledge” in the GPT, and then attached five versions of the translations in both directions: Korean to English and English to Korean. I prompted: “Evaluate this translation versions in the attached file? Complete all of five versions by following the instructions of the rubric in Knowledge.” I want to show you my rubric, but I do not know how to attach it here.

What do you understand by “rubric”?

How are those files formatted, and how do they relate to the “rubric”?

What made you think a prompt like that would be enough to specify your task?

  • evaluate - what are the criteria for evaluation? Limits? Inputs/outputs?
  • attached file - which one of all the available files? How should the file be retrieved?
  • … and so on (see the sketch below for what pinning these down might look like)
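For illustration only, here is a minimal sketch of what a fully specified evaluation request could look like if someone called the OpenAI API directly instead of using the ChatGPT UI. The rubric criteria, model name, and seed value below are placeholders, not your actual setup; `temperature=0` and a fixed `seed` reduce, but do not fully eliminate, run-to-run variation.

```python
# Minimal sketch: score ONE translation against an explicit rubric.
# Assumptions: the `openai` Python package is installed, OPENAI_API_KEY
# is set, and the rubric, model name, and seed are placeholders.
from openai import OpenAI

client = OpenAI()

RUBRIC = """Score each criterion as an integer from 0 to 5:
1. Accuracy - the meaning of the source text is fully preserved.
2. Fluency - the translation reads naturally in the target language.
3. Terminology - key terms are translated correctly and consistently.
Output exactly three "criterion: score" lines, then a "total:" line."""

def score_translation(source: str, translation: str) -> str:
    """Ask the model for a rubric-based score of a single translation."""
    response = client.chat.completions.create(
        model="gpt-4o",     # placeholder model name
        temperature=0,      # minimize sampling randomness
        seed=42,            # best-effort reproducibility across runs
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a translation evaluator. Apply the rubric "
                    "below exactly; output only what it asks for.\n" + RUBRIC
                ),
            },
            {
                "role": "user",
                "content": f"Source text:\n{source}\n\nTranslation:\n{translation}",
            },
        ],
    )
    return response.choices[0].message.content

print(score_translation("좋은 아침입니다.", "Good morning."))
```

The same idea carries over to the ChatGPT UI: pasting the rubric text directly into the GPT's instructions, naming each file explicitly, and demanding a fixed output format gives the model much less room to improvise than pointing it at “Knowledge” and hoping it retrieves the right thing.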

Just copy and paste the text from the original file on your computer, the one you uploaded into the chat's Knowledge.