LLM and Prompt Evaluation Frameworks

I wonder if prompt evals actually work, or if they give people a false sense of security :thinking:

They also seem to be advertising hallucination countermeasures based on perplexity. I'm not sure you can infer any hallucination probability just by aggregating token logprobs :thinking:.
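For context, the perplexity signal they'd be computing is just the exponentiated negative mean of the per-token logprobs. Here's a minimal sketch with made-up logprob values (not any vendor's actual implementation) showing what it measures:

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(negative mean of per-token log-probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token logprobs for two completions of the same prompt
confident = [-0.1, -0.2, -0.05, -0.15]  # model is sure of every token
uncertain = [-2.3, -1.8, -0.1, -2.9]    # several low-probability tokens

print(perplexity(confident))  # ~1.13 -> low perplexity
print(perplexity(uncertain))  # ~5.9  -> high perplexity
```

The catch is that this only measures how confident the model was in its own tokens: a fluent, confidently stated but factually wrong completion can still score a low perplexity, so low perplexity doesn't imply the answer is grounded.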
