Suggested method for AI-detection tools

I hope this message finds you well. I am writing to share an idea that I believe could contribute to the ongoing efforts in AI text generation detection and watermarking.

The concept involves embedding a subtle, visible watermark within AI-generated text by replacing normal spaces with dash-like characters (such as hyphens, en dashes, or em dashes). These characters act as markers that can carry encoded information about the text’s AI origin.

For example:

Normal sentence:

So basically, I think anime is just the animated version of the manga.

AI-generated text with dash-based watermark (when copied):

So-basically,-I-think-anime-is-just-the-animated-version-of-the-manga.

When copied and pasted, the dashes replace spaces, making the watermark detectable while the sentence remains readable.
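
To illustrate how the dash characters might "carry encoded information," here is a minimal Python sketch. It rests on assumptions that are not part of the proposal: an en dash stands for bit 0, an em dash for bit 1, the payload is cycled across the spaces, and the input contains no en or em dashes of its own. The names `embed_bits` and `extract_bits` are hypothetical.

```python
EN_DASH = "\u2013"  # assumed convention: encodes bit 0
EM_DASH = "\u2014"  # assumed convention: encodes bit 1


def embed_bits(text: str, bits: str) -> str:
    """Replace each space with a dash chosen by the next payload bit (cycled)."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    out = []
    i = 0  # index of the next payload bit to use
    for ch in text:
        if ch == " ":
            out.append(EM_DASH if bits[i % len(bits)] == "1" else EN_DASH)
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def extract_bits(marked: str) -> str:
    """Read the bit sequence back from the en/em dashes, in order."""
    return "".join(
        "1" if ch == EM_DASH else "0"
        for ch in marked
        if ch in (EN_DASH, EM_DASH)
    )


sentence = "So basically, I think anime is just the animated version of the manga."
marked = embed_bits(sentence, "101")
print(marked)
print(extract_bits(marked))
```

Using only en and em dashes here is a design choice for the sketch: it keeps ordinary hyphens out of the payload, at the cost of differing from the hyphen-only example above.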

Human-Readable but Distinct: The text remains fully readable, though the altered spacing creates a subtle visual cue that the text is AI-generated or “marked.”

Strength: Using visible characters instead of invisible spaces means the watermark is more likely to survive copying and pasting across different platforms and text editors.

Ease of detection: Detection tools could scan for the presence and pattern of these dash characters to verify the AI origin reliably.

Distinction between human- and AI-generated text: This approach provides a clear and distinct watermark that could assist both humans and automated systems in identifying AI-generated text.
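
The detection step described above could be sketched roughly as follows. This is only an illustration: the function names, the idea of scoring the share of word separators that are dashes, and the 0.9 threshold are all assumptions rather than part of the proposal.

```python
import re

# Hyphen, en dash, em dash (the characters named in the proposal).
DASHES = "\u002D\u2013\u2014"


def dash_separator_ratio(text: str) -> float:
    """Fraction of word separators that are dashes rather than whitespace."""
    # A separator is a dash or a whitespace run sitting between two
    # non-whitespace characters.
    dash_seps = len(re.findall(rf"(?<=\S)[{DASHES}](?=\S)", text))
    space_seps = len(re.findall(r"(?<=\S)\s+(?=\S)", text))
    total = dash_seps + space_seps
    return dash_seps / total if total else 0.0


def looks_watermarked(text: str, threshold: float = 0.9) -> bool:
    """Flag text whose separators are almost all dashes (threshold is assumed)."""
    return dash_separator_ratio(text) >= threshold


plain = "So basically, I think anime is just the animated version of the manga."
marked = "So-basically,-I-think-anime-is-just-the-animated-version-of-the-manga."
print(looks_watermarked(plain), looks_watermarked(marked))
```

A ratio-based check like this tolerates ordinary hyphenated words in unmarked text, but it does nothing against someone who simply swaps the dashes back to spaces.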

Considerations:

The text might feel slightly unusual or off due to the replaced spaces, which could alert readers or users to its AI-generated nature.

The watermark can be removed by replacing each dash with a space, though doing so by hand is inconvenient for the user.

Adoption would require integration into AI text generation pipelines and consideration of user transparency and privacy.

I believe this approach could complement existing invisible watermarking methods by providing an additional layer of traceability that balances readability and detectability.

I would be happy to discuss this idea further or provide examples and prototypes to demonstrate its feasibility. Thank you for your time and consideration.

AI response detected. The patterns are too familiar. Might as well have been sent from OpenAI’s support for how botty it is.


Yeah, you’re right, I did use AI to help lay it out cleanly. But the actual idea? That’s mine. I just used the AI as a tool to explain it better and make sure I covered the details clearly.

Prototypes, please, as you indicated you could offer.

or…I got u fam:

import itertools

def watermark_text(text: str) -> str:
    # Cycle through hyphen, en dash, and em dash, using one per replaced space.
    dashes = itertools.cycle(['\u002D', '\u2013', '\u2014'])
    return "".join(next(dashes) if char == ' ' else char for char in text)

Then you just have to pass it into an existing AI, “hey, my spacebar broke so I typed different characters instead. Fix up my text for me!” Done faster than clicking all the bicycles in an image.

AI generated output starts here

This watermarking concept is indeed shallow and easily defeated. The core idea of replacing spaces with dash-like characters can be reversed with a simple regular expression. The primary challenge for such a watermark is its inability to distinguish itself from legitimate uses of hyphens or dashes, especially if standard hyphens are used for watermarking.

The following Python code snippet demonstrates this. The regex [\u002D\u2013\u2014]+ targets sequences of one or more hyphens (U+002D), en dashes (U+2013), or em dashes (U+2014). Replacing each such sequence with a single space effectively reverses the described watermarking.

While this method efficiently removes the watermark, its indiscriminate nature means it will also convert legitimate hyphenated terms (e.g., “word-joiner” becomes “word joiner”). This side effect underscores the watermarking scheme’s shallowness: if it uses common characters like the hyphen, it cannot be robustly distinguished from normal text features, making its removal trivial at the cost of potentially altering intended hyphenation. If the watermarking exclusively used less common dashes (e.g., only en/em dashes) and standard text only used basic hyphens, a more targeted removal would be possible, but the prompt’s example itself uses standard hyphens for watermarking.

import re

def remove_dash_watermark(text: str) -> str:
    """
    Removes dash-based text watermarks by replacing various dash-like
    characters (hyphens, en dashes, em dashes) with spaces.

    The regex targets one or more consecutive dash characters to handle
    potential variations in watermark implementation (e.g., '---' for a space).
    """
    # Regex to find hyphens (U+002D), en dashes (U+2013), or em dashes (U+2014)
    # The '+' ensures that sequences of dashes are replaced by a single space.
    # \u002D is the standard hyphen.
    # \u2013 is the en dash.
    # \u2014 is the em dash.
    dash_pattern = r"[\u002D\u2013\u2014]+"
    
    # Replace found dash patterns with a single space
    dewatermarked_text = re.sub(dash_pattern, " ", text)
    
    return dewatermarked_text

# Example usage based on the prompt's watermarked text
watermarked_text_example = "So-basically,-I-think-anime-is-just-the-animated-version-of-the-manga."
original_text_example = "So basically, I think anime is just the animated version of the manga."

# Demonstrate removal
dew_example = remove_dash_watermark(watermarked_text_example)

print(f"Watermarked: \"{watermarked_text_example}\"")
print(f"De-watermarked: \"{dew_example}\"")
print(f"Matches original: {dew_example == original_text_example}")

# Example demonstrating effect on a legitimate hyphenated term
# If "state-of-the-art" was part of a text where spaces were replaced by hyphens:
# Original sentence: "This is a state-of-the-art system."
# Watermarked: "This-is-a-state-of-the-art-system."
# (Note: the hyphens in "state-of-the-art" are original, others are watermarks)
mixed_text = "This-is-a-state-of-the-art-system."
dew_mixed_text = remove_dash_watermark(mixed_text)
print(f"\nWatermarked (with legitimate hyphens): \"{mixed_text}\"")
print(f"De-watermarked: \"{dew_mixed_text}\"")
# Expected output: "This is a state of the art system."
# This shows the watermark is removed, but "state-of-the-art" is altered.
# This alteration is a consequence of the watermark's shallowness, as the
# removing tool cannot distinguish the 'watermark hyphens' from 'original hyphens'
# if they are the same character.

# Example with a word-joiner as mentioned in the prompt
# If the text "a word-joiner example" is watermarked: "a-word-joiner-example"
word_joiner_watermarked = "a-word-joiner-example"
dew_word_joiner = remove_dash_watermark(word_joiner_watermarked)
print(f"\nWatermarked (word-joiner): \"{word_joiner_watermarked}\"")
print(f"De-watermarked: \"{dew_word_joiner}\"")
# Expected output: "a word joiner example"
# This again shows that "word-joiner" is affected, which is a direct result
# of the watermark's design being too simple to differentiate.
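
The "more targeted removal" mentioned earlier can be sketched in the same way. This assumes a variant of the scheme that marks spaces only with en and em dashes and never with the plain hyphen; that is not what the original example does, and `remove_uncommon_dash_watermark` is a hypothetical name.

```python
import re


def remove_uncommon_dash_watermark(text: str) -> str:
    """Replace runs of en/em dashes with a space, leaving plain hyphens alone."""
    # Assumption: the watermark never uses U+002D, so ordinary hyphens are kept.
    return re.sub("[\u2013\u2014]+", " ", text)


marked = "This\u2013is\u2014a\u2013state-of-the-art\u2014system."
print(remove_uncommon_dash_watermark(marked))
# Prints: This is a state-of-the-art system.
```

Under that assumption the hyphenated term survives, although any en or em dash the author genuinely intended (a number range, for instance) would still be turned into a space.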

Addendum: Translation of the idea’s text into Japanese.

The Japanese language primarily uses particles and a mix of character sets (Kanji, Hiragana, Katakana) to delimit words and grammatical structures, rather than relying on spaces as extensively as English. Therefore, a watermarking scheme based on replacing spaces with dashes is fundamentally less applicable or effective for Japanese text. The following is a Japanese translation of the provided concept:

この概念は、AIが生成したテキスト内の通常のスペースをダッシュのような文字(ハイフン、エンダッシュ、エムダッシュなど)に置き換えることにより、目立たない可視のウォーターマークを埋め込むというものです。これらの文字はマーカーとして機能し、テキストのAI由来に関するエンコードされた情報を運ぶことができます。コピー&ペーストすると、ダッシュがスペースの代わりに表示され、ウォーターマークが検出可能になる一方で、文は可読性を保ちます。
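
A quick way to see the point about Japanese is to run a space-replacing watermark over a sentence with no spaces. The sketch below restates the `watermark_text` function from earlier in the thread so that it is self-contained; the sample sentences are illustrative only.

```python
import itertools


def watermark_text(text: str) -> str:
    """Replace each ASCII space with a hyphen, en dash, or em dash in rotation."""
    dashes = itertools.cycle(["\u002D", "\u2013", "\u2014"])
    return "".join(next(dashes) if ch == " " else ch for ch in text)


japanese = "これらの文字はマーカーとして機能します。"
english = "These characters act as markers."

# The Japanese sentence has no spaces, so nothing is replaced.
print(watermark_text(japanese) == japanese)
# The English sentence has four spaces, so it is changed.
print(watermark_text(english) == english)
```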

That's all the AI had to say in completing my task.


How about: never give people the text. Render it as convoluted SVG with randomized curve points. Oops, AI vision FTW.
