Need human-like responses to test model performance

I’m using a pre-trained GPT model for my use case, so I need ground-truth (human-written) responses to evaluate its outputs against and to compute metrics.
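
To make the kind of evaluation concrete, here is a minimal sketch of scoring a model output against a human-written reference using token-overlap F1. This is only one possible metric, chosen for illustration; the `token_f1` helper and the sample strings are hypothetical and not part of any particular library or dataset.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model output and a reference answer."""
    # Simple whitespace tokenization, lowercased (an assumption; a real
    # setup might normalize punctuation or use a proper tokenizer).
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical model output and human-written ground truth.
    model_output = "the cat sat"
    ground_truth = "the cat"
    print(round(token_f1(model_output, ground_truth), 3))
```

Averaging this score over many (model output, reference) pairs gives a single number to track, though overlap metrics like this reward shared wording rather than meaning, so they are usually paired with human review.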