Are you building, as you say, “a conversation between two people” here?
If you keep your ToT in the same thread, you’ll eventually start cross-contaminating your contexts. If your ToT instead consists of independent ideations (i.e., spread across separate contexts rather than looped in one thread), then that’s exactly what I’d suggest.
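To make the spreading concrete, here’s a minimal sketch assuming the OpenAI Python SDK; the model name, the prompts, and the `ideate` helper are placeholders of my own, not anything prescribed:

```python
# "Spread" ideation: every branch runs in its own fresh context,
# so no branch can contaminate another.
from openai import OpenAI

client = OpenAI()

def ideate(task: str, angle: str) -> str:
    """Run one ideation branch in an isolated, single-shot context."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[
            {"role": "system", "content": f"Explore the task strictly from this angle: {angle}"},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

task = "Design a caching strategy for a read-heavy API."
angles = ["simplicity", "scalability", "cost"]

# Each branch starts from a clean slate; a separate pass can then
# merge or rank the results.
branches = [ideate(task, angle) for angle in angles]
```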
And whether the ideation is framed as a conversation or not doesn’t matter all that much to the model, I think. I base this on the continued effectiveness of using low-frequency patterns to steer models: How to stop models returning "preachy" conclusions - #38 by Diet (the system-user-assistant conversation being the lowest-frequency pattern in this sense).
“Take input”, in my mind, is just a function, a resource the system can tap; in your case, I guess, a human. This would be realized as “ask sponsor” or “ask operator” (which could just as well be an AI system on its own, or another instance of itself). Instead of just injecting the response as a “user response”, I’d typically insert it as an ancillary context document that is probably required to continue the task.
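As a rough sketch of that flow (again assuming the OpenAI Python SDK; `ask_operator` and `continue_task` are hypothetical names of mine, not a real API):

```python
# "Ask operator" as a resource the system taps. The answer is folded in
# as an ancillary context document, not injected as another "user" turn.
from openai import OpenAI

client = OpenAI()

def ask_operator(question: str) -> str:
    """The operator could be a human, another AI system, or another
    instance of the same model; stubbed here as console input."""
    return input(f"[operator] {question}\n> ")

def continue_task(task: str, question: str) -> str:
    answer = ask_operator(question)
    # Wrap the answer as a context document required to continue the task.
    context_doc = f"<operator_input>\n{answer}\n</operator_input>"
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model
        messages=[
            {"role": "system", "content": "Complete the task using the attached context document."},
            {"role": "user", "content": f"{task}\n\n{context_doc}"},
        ],
    )
    return response.choices[0].message.content
```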
So I don’t really see LLMs as chatterboxes; I see them as document evolvers.
I’m not saying you guys are wrong, and I agree that these models are getting tuned and trained for conversation. I just think leaning on that framing is a mistake if you really want to put models to work.