Controlling Text Length in LLM-Generated Survey Responses: Challenges and Observations

I’m working on a project that creates mock data for a set of survey questions. As part of the requirements, I want to control the length of each text-based answer. For some reason, most LLMs generate answers far shorter than I ask for (and, oddly, all of them give me a similar length distribution).

So for example,
Q: What are your current pain points in your HR team?
Can you give me 20 fake survey answers to that? Each answer’s length should average 50 words. Distribution: xyz.

A: Here are the word counts for each of the survey answers:

  1. 25 words
  2. 19 words
  3. 19 words
  4. 23 words
  5. 21 words
  6. 22 words
  7. 21 words
  8. 20 words
  9. 21 words
  10. 25 words

It fixes its answers easily enough if I ask again and point out that the lengths differ from what I asked for. But I need to get a good answer set in one go, as this is only a small part of my prompt/code. It’s tricky to add another LLM layer/step (plus retrieving the answer and feeding it to the next LLM) just to fix this length issue.
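
One way to avoid a full second LLM pass is to do the length check locally and only regenerate the answers that miss the target band. Below is a minimal sketch of that idea; `call_llm` and the tolerance values are hypothetical placeholders for whatever client and thresholds you already use.

```python
# Minimal sketch: validate answer lengths locally instead of adding a second LLM step.
# `call_llm` is a hypothetical wrapper around whatever client/prompt you already have.

def word_count(text: str) -> int:
    """Approximate word count: whitespace-separated chunks."""
    return len(text.split())

def filter_by_length(answers: list[str], target: int = 50, tolerance: int = 10) -> tuple[list[str], list[str]]:
    """Split answers into those within the target word-count band and those outside it."""
    kept, rejected = [], []
    for answer in answers:
        n = word_count(answer)
        (kept if abs(n - target) <= tolerance else rejected).append(answer)
    return kept, rejected

# Usage idea: keep the in-band answers and only re-ask for replacements for the rest,
# so any extra LLM call stays small instead of re-running the whole prompt.
# kept, rejected = filter_by_length(call_llm(prompt), target=50, tolerance=10)
```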

It’s suspicious that other LLMs show exactly the same issue and behaviour. Am I missing anything? (Is this a normal LLM tendency around word limits or something?)

LLMs don’t understand “words,” as they deal in tokens. You’ll have better luck describing the length of output you want in terms of number of sentences or paragraphs.
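
If you want to see the word/token mismatch concretely, a quick check is to compare the word count you care about with the token count the model actually works over. This sketch assumes the `tiktoken` package (OpenAI’s tokenizer); other models use different tokenizers, but the gap looks broadly similar.

```python
# Compare a word count (what the prompt asks for) with a token count
# (what the model actually operates on). Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

answer = ("Our biggest pain point is onboarding: paperwork is still manual, "
          "approvals take days, and new hires wait a week for system access.")

words = len(answer.split())
tokens = len(enc.encode(answer))

print(f"{words} words, {tokens} tokens")
# The model only ever "sees" tokens, which is part of why instructions phrased
# as exact word counts or averages tend to land only approximately.
```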