How to optimize + what do you recommend?

_j · June 13, 2025, 9:56am

Generally, the way to run an eval is to have a human evaluator, or your highest-quality model, write the desired answer for each input in a series. That answer becomes the reference for both instruction-following and answer quality.

Then you write a critical “judge” prompt for a highest-quality AI model, one able to decide whether an output is satisfactory against the ground-truth output or where it falls short. Then judge the judging: spot-check the judge’s verdicts yourself to confirm they can be trusted.
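As a rough sketch of that loop in Python: the prompt wording, the PASS/FAIL convention, and the names `EvalCase`, `build_judge_prompt`, `parse_verdict`, and `run_eval` are all made up for illustration. `generate` and `judge` stand in for whatever functions call the model under test and the judge model; no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge prompt; adjust the wording to your own specialization.
JUDGE_TEMPLATE = """You are a strict grader.
Compare the candidate answer with the reference answer for the given input.
Reply with PASS or FAIL on the first line, then one sentence of reasoning.

Input:
{input}

Reference answer:
{reference}

Candidate answer:
{candidate}
"""


@dataclass
class EvalCase:
    input: str
    reference: str  # desired answer written by a human or the best model


def build_judge_prompt(case: EvalCase, candidate: str) -> str:
    """Fill the judge template with one case and the output being graded."""
    return JUDGE_TEMPLATE.format(
        input=case.input, reference=case.reference, candidate=candidate
    )


def parse_verdict(reply: str) -> bool:
    """Return True only if the first line of the judge's reply starts with PASS."""
    lines = reply.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    return first.startswith("PASS")


def run_eval(
    cases: List[EvalCase],
    generate: Callable[[str], str],  # assumption: wraps the model under test
    judge: Callable[[str], str],     # assumption: wraps the judge model
) -> List[bool]:
    """Generate an answer per case, have the judge grade it, collect verdicts."""
    verdicts = []
    for case in cases:
        candidate = generate(case.input)
        reply = judge(build_judge_prompt(case, candidate))
        verdicts.append(parse_verdict(reply))
    return verdicts
```

Forcing the verdict onto the first line keeps parsing trivial; anything that is not clearly a PASS counts as a failure, which errs on the strict side.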

That’s how you can turn “a feeling” that the model is not performing well into something automated, instead of relying entirely on human “which is better” comparisons (which take reading comprehension and knowledge of what is actually expected of the AI within the specialization).
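To put a number on that “feeling”, and to check the judge against a handful of human-graded samples, something like the following could work. It is only a sketch: `pass_rate` and `judge_agreement` are hypothetical helper names, and the verdict lists are invented example data, not real results.

```python
from typing import List


def pass_rate(verdicts: List[bool]) -> float:
    """Fraction of outputs the judge accepted (0.0 for an empty list)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def judge_agreement(judge_verdicts: List[bool], human_verdicts: List[bool]) -> float:
    """Fraction of samples where the judge and a human grader gave the same verdict."""
    if len(judge_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must be the same length")
    if not judge_verdicts:
        return 0.0
    matches = sum(j == h for j, h in zip(judge_verdicts, human_verdicts))
    return matches / len(judge_verdicts)


# Invented example data: four outputs graded by both the judge and a human.
judge_says = [True, True, False, True]
human_says = [True, False, False, True]

print(pass_rate(judge_says))                    # 0.75
print(judge_agreement(judge_says, human_says))  # 0.75
```

If agreement on the human-graded subset is low, the judge prompt needs work before its pass rate means anything.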

Your “messy data” could be something easily worked through linearly, or it could be something that requires seeing the whole input at once and high-quality language understanding, along with domain knowledge. The former, which amounts to fancy repeating, is where a less expensive model can work for you.
