I believe it is directly related to training: so much of the training data probably contains em dashes. I don’t think the bias comes from instruction tuning; it comes from the total canon of training data and its liberal use of em dashes. Remember, too, that the types of sources used for training tend to contain a ton of them!
- Em Dashes in Formal and Literary Writing
  - Many books, articles, and essays, especially formal or literary sources, favor em dashes for emphasis and parenthetical asides.
  - Since a significant portion of training data comes from well-edited writing (think published books, Wikipedia, journalism, and academic sources), the prevalence of em dashes is high.
- Style Patterns in Training Data
  - If the sources ChatGPT is trained on use em dashes frequently, the model learns that they are a common and valid punctuation choice.
  - Many professional writers and journalists use em dashes liberally, and this stylistic preference carries over into AI-generated text.
- Overuse Due to Pattern Recognition
  - Because ChatGPT generates text probabilistically, it sometimes “over-indexes” on high-frequency structures. This could be happening with em dashes, colons, and certain stylistic choices that appear disproportionately in well-written training data.
  - Since the model doesn’t have an innate sense of when variety is better, it sometimes leans on these structures more than a human would.
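The over-indexing point can be illustrated with a toy sketch. This is not how any real language model works; the punctuation options and their weights below are invented purely to show that sampling which mirrors training-data frequencies will keep reproducing the most common choice, with no built-in pressure toward variety:

```python
import random
from collections import Counter

# Hypothetical "learned" frequencies for punctuation choices,
# loosely mimicking the idea that well-edited training text makes
# the em dash a high-probability option. The numbers are invented.
PUNCTUATION = ["em dash", "comma", "parentheses", "semicolon"]
WEIGHTS = [0.40, 0.35, 0.15, 0.10]

def sample_punctuation(n, seed=0):
    """Sample n punctuation choices from the fixed distribution."""
    rng = random.Random(seed)
    return Counter(rng.choices(PUNCTUATION, weights=WEIGHTS, k=n))

counts = sample_punctuation(10_000)
# Because sampling mirrors the (invented) training frequencies, the
# highest-weighted option dominates the output every time we generate.
print(counts.most_common(1)[0][0])
```

The model never decides "I've used enough em dashes"; each draw just follows the same skewed distribution, which is all the bullet above is claiming.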