Making a GPT Assistant answer with just numbers without losing context

I made a simple Assistant using the Assistants API. The assistant takes two answers as a prompt: one is from the user and the other is the “correct” answer. I want GPT to rate the user’s answer from 0 to 10 and to limit the response to just the number. The problem is that the model seems to ignore the answers and gives me a random number from 0 to 10 without actually rating anything. This doesn’t happen if I allow it to give me an explanation of the reason for the rating.

These are the instructions I gave the assistant:
“You are an assistant that rates a user’s response. The rating will be a number from 0 to 10. Your response should only include the rating you gave, without additional text. You don’t need a document to rate the response.”

Hi

A few things.

  1. The Assistants API is in beta, and it is still difficult to impossible to get an assistant to follow instructions strictly.

  2. “This doesn’t happen if I allow it to give me an explanation of the reason for the rating.”
    This is a known feature of LLM “minds” called chain of thought. If you have the model first reason out loud and only then give a response, the quality of the response will be higher.

  3. Show your prompts for more feedback.

Ask it to give you the result as JSON with the reason, and just ignore the reason.
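
A rough sketch of that approach, assuming you rewrite the assistant’s instructions to request JSON (the exact wording and field names here are only suggestions, not from the original post):

```python
import json

# Suggested replacement instructions for the assistant (wording is a suggestion):
INSTRUCTIONS = (
    "You are an assistant that rates a user's response against a correct response. "
    "Briefly explain the reason for your rating, then give a rating from 0 to 10. "
    'Reply with only this JSON object: {"reason": "<one or two sentences>", "score": <integer 0-10>}'
)

def extract_score(assistant_reply: str) -> int:
    """Parse the assistant's JSON reply, keep the number, throw away the reason."""
    data = json.loads(assistant_reply)
    return int(data["score"])
```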

Alternatively, tell it you want both, but have it only display the number. Yes, it’s sometimes that silly :slight_smile:

2 Likes

First, there is nothing here that would make one think an “assistant” is the correct path. You would only get unneeded distraction from the scoring task at hand, which is best performed by a single call to an AI model through the completions or chat completions endpoint.

Secondly, you need an AI that can answer the question at or beyond the quality of the provided “correct” answer or the student answer. Right now, that is gpt-4-0613, or the prior gpt-4-0314.
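
As a sketch of that simpler path, a single chat completions call with that model might look like this (the prompt wording and variable names are mine, not from the original post):

```python
from openai import OpenAI

client = OpenAI()

def rate_answer(question: str, reference: str, student: str) -> str:
    """One plain chat completion: no assistant, no thread, no run polling."""
    response = client.chat.completions.create(
        model="gpt-4-0613",
        temperature=0,
        messages=[
            {"role": "system", "content": (
                "You are a grader. Rate the student's answer against the reference "
                "answer on a scale of 0 to 10."
            )},
            {"role": "user", "content": (
                f"Question: {question}\n"
                f"Reference answer: {reference}\n"
                f"Student answer: {student}"
            )},
        ],
    )
    return response.choices[0].message.content
```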

We then must target the output you want: a single score. You are fighting against an AI model that is a token predictor at heart and will follow patterns of answering as much as reasoned thought.


It sounds like the fault is partly in not giving the AI the full picture. You haven’t given me the full picture either. Pretend you were a professor handing the job you want the AI to perform to a human teaching assistant like me.

To score, I’d want to know

  • the domain of knowledge,
  • the student level and expected competency,
  • current applicable coursework,
  • and most of all: the question.

If I’m as knowledgeable as an AI, then I don’t need the “right” answer (say, enough Japanese Kofun-period history to score a student’s essay about dougu or haniwa); the “correct” answer is an encumbrance. A student may have a completely different but still correct answer.
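
Concretely, a grading prompt that hands the model that context might look something like this (every course detail below is an invented placeholder):

```python
# Hypothetical grading context; every value here is a placeholder, not from the original post.
GRADER_PROMPT = """You are a teaching assistant grading short answers.

Domain: Japanese history, Kofun period
Student level: second-year undergraduate survey course
Current coursework: burial practices and material culture (kofun, haniwa, dougu)
Question: What is a kofun, and who was buried in one?

Score the student's answer from 0 to 10 for factual accuracy and completeness.
Respond with only the integer score."""
```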


Then, if you really want a number, you have to play some token games. Numbers themselves are individual tokens and, unlike words, do not have a leading space. We can make the output very likely to be only the score.

Here, I’m going to show an interesting completions technique: token certainty with logprobs, with only a 1-token answer. Everything up to the quotes within the JSON is text I input:
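
A rough reconstruction of that setup in the Python SDK, assuming the legacy completions endpoint (which exposes logprobs); the model name and prompt wording are assumptions, since the original shows it in the playground:

```python
import math

from openai import OpenAI

client = OpenAI()

# The prompt ends just inside the JSON quotes, so the single next token is the score digit.
prompt = (
    "Grade the student's answer against the reference answer on a 0-1 scale.\n\n"
    "Question: What is a kofun?\n"
    "Reference answer: A kofun is a burial mound where ancient dignitaries or elite were entombed.\n"
    "Student answer: A kofun is a burial mount where samurai were entombed.\n\n"
    '{"score": "'
)

response = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # assumption: a completions model; the original doesn't name one
    prompt=prompt,
    max_tokens=1,   # only the score token
    temperature=0,
    logprobs=5,     # top-5 alternatives, so we can see how certain the model is
)

top = response.choices[0].logprobs.top_logprobs[0]  # dict of token -> logprob
for token, logprob in sorted(top.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token!r}: {math.exp(logprob):.2%}")
```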

We get an answer that is 85% “1” and 15% “0”. The AI is wrong, though: samurai were not around in the Kofun period, nor is there such a thing as a “burial mount”. I have to give the AI a 15%.

An answer without errors, “A kofun is a burial mound where ancient dignitaries or elite were entombed.”, takes the certainty of the correct score from 85.59% to 99.89%.
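
If you want a single figure out of that distribution rather than just the most likely token, you can weight the digit tokens by their probabilities (a small helper continuing from the top_logprobs dictionary above; this is my extension, not part of the original demonstration):

```python
import math

def expected_score(top_logprobs: dict[str, float]) -> float:
    """Probability-weighted score over the digit tokens in the top alternatives."""
    total = weighted = 0.0
    for token, logprob in top_logprobs.items():
        if token.strip().isdigit():
            p = math.exp(logprob)
            total += p
            weighted += p * int(token)
    return weighted / total if total else float("nan")

# With 85.59% "1" and 14.41% "0", this comes out to roughly 0.86 on the 0-1 scale.
```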

AI should not be making academic decisions in the absence of human oversight. Try explaining to a student that they got a poor mark because an AI said they were 86% correct… You can try this in a casual setting where the AI’s judgement doesn’t really matter (that is, NOT a situation where an AI can ban ChatGPT accounts…).


(PS: gpt-4-1106-preview scores the poor answer “0” on either a 0-1 or 0-10 scale when given the necessary description, which is the same problem you have. It also emits a completely unexpected and stupid markdown container, and there is no way to examine how sure the answer is.)

1 Like