I’m working on a project that generates mock data for a set of survey questions. As part of the requirements, I want to control the length of each text-based answer. For some reason, most LLMs generate answers far shorter than what I ask for (and, weirdly, every LLM gave me a similar length distribution).
So, for example:
Q: What are your current pain points in your HR team?
Can you give me 20 fake survey answers to that? Each answer should average 50 words in length. Distribution: xyz.
A: Here are the word counts for each of the survey answers:
- 25 words
- 19 words
- 19 words
- 23 words
- 21 words
- 22 words
- 21 words
- 20 words
- 21 words
- 25 words
…
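
Something like this is how the lengths can be checked locally rather than relying on the model’s self-reported counts (a minimal sketch; `answers` is just a placeholder for the parsed responses):

```python
from statistics import mean

# Placeholder strings standing in for the parsed survey responses.
answers = [
    "We spend too much time on manual onboarding paperwork every month.",
    "Our HRIS does not sync with payroll, so every cycle needs manual checks.",
]

# Count words per answer and compare the mean against the requested target.
word_counts = [len(a.split()) for a in answers]
print("per-answer:", word_counts)
print("mean: %.1f (target: 50)" % mean(word_counts))
```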
It fixes its answers easily enough if I ask again and point out that the lengths differ from what I requested. But I need a good answer set in one go, since this is only a small part of my prompt/code. It’s awkward to add another LLM layer/step (retrieving the answers and feeding them into a second call) just to fix this length issue.
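
For context, this is roughly the kind of extra layer I mean and would rather avoid: a post-hoc check that re-prompts only the answers whose word count is off target. A sketch only; `call_llm` is a hypothetical helper standing in for whatever client is actually in use.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical helper: wire this to whatever chat-completion client you use.
    raise NotImplementedError("replace with a real LLM call")

def fix_length(answer: str, target: int = 50, tolerance: int = 10) -> str:
    """Re-prompt an answer only if its word count falls outside the target range."""
    words = len(answer.split())
    if abs(words - target) <= tolerance:
        return answer
    prompt = (
        f"Rewrite this survey answer so it is about {target} words "
        f"(it is currently {words} words), keeping the meaning and tone:\n\n{answer}"
    )
    return call_llm(prompt)

# fixed = [fix_length(a) for a in answers]
```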
It’s suspicious that other LLMs show exactly the same issue and behaviour. Am I missing anything? (Is this a normal LLM tendency around word limits or something?)