Take this example:
```python
from pydantic import BaseModel

class AnswerModel(BaseModel):
    capital: str
```
And the LLM returns an answer like this:
```python
AnswerModel(capital='Paris')
```
The tokens for the key `capital` count towards `completion_tokens`.
But aren't those tokens fully determined by the grammar? There is no decision for the LLM to make at those positions.
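To make the point concrete, here is a toy sketch (not any real API) of schema-constrained decoding: the template and token boundaries are invented for illustration, but they show how most of the emitted tokens can be forced by the grammar while still being counted as part of the completion.

```python
# Toy model of constrained decoding for the schema {"capital": <string>}.
# None marks the only slot where the model actually chooses anything;
# every other token is dictated by the grammar, yet all of them would
# be billed as completion tokens.
TEMPLATE = ['{"', 'capital', '":"', None, '"}']

def decode(model_choice: str):
    completion = []
    forced = 0
    for slot in TEMPLATE:
        if slot is None:
            completion.append(model_choice)  # the one real decision
        else:
            completion.append(slot)          # forced by the grammar
            forced += 1
    return completion, forced

tokens, forced = decode("Paris")
print("".join(tokens))       # {"capital":"Paris"}
print(len(tokens), forced)   # 5 tokens emitted, 4 of them forced
```

In this sketch 4 of the 5 completion tokens carry zero information from the model, which is exactly the tension the question raises: billing counts emitted tokens, not decisions.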