I have fine-tuned a Curie model for my use case. Now, whenever I make the API call, it does not stop after producing the response. It keeps repeating the sequence until all of max_tokens is consumed.
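For context, this is roughly how I'm calling it (the model name, prompt, and separator below are placeholders, not my actual values):

```python
import openai

# Rough sketch of my call -- model id and prompt format are placeholders
response = openai.Completion.create(
    model="curie:ft-my-org-2023-01-01",  # placeholder fine-tuned model id
    prompt="Ques: A sample question\n\n###\n\n",
    max_tokens=256,
    temperature=0,
)
print(response["choices"][0]["text"])
```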
A sample response looks like this:
Ques: A sample question
Sample response
END
Sample Response Again
END
Sample Response Again
END
...
This goes on until all the tokens are consumed. I can strip everything after the first END on my side, but I'm still paying for all the unnecessary generation.
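Right now my workaround is just to cut the completion at the first END after it comes back, roughly like this (sketch only, variable names are made up):

```python
# Post-processing workaround: keep only the text before the first END marker
completion_text = response["choices"][0]["text"]
answer = completion_text.split("END", 1)[0].strip()
```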
Has anyone faced this before? If so, please let me know how I can fix it. Any help is highly appreciated!