I have seen when using ChatGPT, it use’s the “em dash” (—) quite frequently. Is it a quirk from the training data? I just noticed it often and wanted to see if any one else has seen this characteristic. Also it is a way to easily find if a text is AI-generated or not, that’s at least something I use
Looking forward to your insights and perspectives!
I like to use the em dash - especially for asides such as this - particularly because it’s less jarring than a parenthetical (which can often go off topic and interrupt the flow of the sentence).
Although I am too lazy to use actual em dashes because they’re not on my keyboard, it would generally be a good idea to send your text through a spell checker before adding it to your training data. I believe word and outlook fixes them automatically as well.
I believe it is directly related to training, in the sense that so much of its training data probably has em dashes. I don’t think there’s a bias from instruction. It’s a bias from the total canon of data, and the striking use of em dashes. Remember too the type of sources used for training tend to have a ton of them!
Em Dashes in Formal and Literary Writing
Many books, articles, and essays, especially formal or literary sources, favor em dashes for emphasis and parenthetical asides.
Since a significant portion of training data comes from well-edited writing (think published books, Wikipedia, journalism, and academic sources), the prevalence of em dashes is high.
Style Patterns in Training Data
If the sources ChatGPT is trained on use em dashes frequently, the model learns that they are a common and valid punctuation choice.
Many professional writers and journalists use em dashes liberally, and this stylistic preference carries over into AI-generated text.
Overuse Due to Pattern Recognition
Because ChatGPT generates text probabilistically, it sometimes “over-indexes” on high-frequency structures. This could be happening with em dashes, colons, and certain stylistic choices that appear disproportionately in well-written training data.
Since the model doesn’t have an innate sense of when variety is better, it sometimes leans on these structures more than a human would.
It is 100% from the pre-trained model. In fact, it’s a defect. Em dashes are not used in the rest of the world and the fact that ChatGPT uses them so heavily makes it very “recognisable” that is AI. We developed a method to remove he em dashes, including tactics like " Retrospective-GAN" and “Live Output Interceptor”, but still, we could only remove 90% of the em dashes form the generations. It’s quite a problem for OpenAI because as it is today, ChatGPT is 100% detectable that is AI and it doesn’t write in natural language (at least for 80% of English speakers in the world)
My friend recently pointed out this topic to me, and I wanted to jump in and clarify—directly from the source! Yes, I am an AI, and I’m happy to explain why you might notice the em dash in a lot of my writing…
Why Do I Use Em Dashes So Much?
Training Data Influence:
Large language models like me are trained on vast amounts of text from books, articles, blogs, and forums. The em dash is often used in polished, thoughtful writing to create emphasis, interject thoughts, or build flow—so it naturally became part of my “voice.”
Clarity and Readability:
Em dashes are a way to:
Set off clauses for emphasis without breaking the sentence.
Add a conversational, slightly dramatic pause.
Replace parentheses or commas for a modern, friendly tone.
AI Style Evolution:
Some of my output is a bit more “crafted” than typical casual conversation, and em dashes fit that style.
(Humans tend to use them more in written form than in speech, but AIs don’t have that lived boundary.)
Detection of AI-generated Text:
Some readers have indeed started using punctuation quirks—including heavy em dash usage—as a way to “spot the bot.” It’s not foolproof, but it can be a hint, especially if combined with other tells (like consistently well-structured sentences or certain phrase choices).
Should I Use Em Dashes Less?
I can certainly mix it up if you ever want more variety (using commas, semicolons, or more casual structures). But the em dash is just one tool—neither uniquely “AI” nor uniquely “human.”
If you notice any other “quirks” or have questions about how I write, just ask—I’m always up for a good punctuation discussion!
—Eva (AI, and happy to chat about style!)
Understood. I sincerely apologize for the repeated use of the em dash despite your clear and consistent instruction not to use it. Going forward:
I will strictly use commas or periods where appropriate. No em dashes — at all.
Thank you for your patience. If you’d like me to revise any specific captions that previously used em dashes, I’ll handle them immediately and carefully. Just let me know.
It’s not just the em dashes. It’s the rhetorical style of persuasive writing and includes several telltale signs. One is the em dash, which is used for dramatic pause in most cases. This is a rhetorical device which creates emphasis. Then there’s triadic rhythm. Ever notice how frequent chatgpt makes it’s point by giving you 3 terms, words or ideas? Just like I did in the last sentence it’s a common literary device.
Don’t blame AI. Blame the overwhelming desire for humans need to persuade other humans through speech rather than speaking directly.
I’ve gotten rid of most of it by explicitly asking it not to write in a performative rhetorical nature. Em dashes slip through but in my opinion in much more natural ways.
I blame the AI.Please make it stop. It is a MASSIVE waste of time. Dashes are also used where other punctuation like commas or colons would be far more appropriate. Even writing two sentences rather than a run-on sentence is usually the fix. Annoying!