Creating an AI detector, I think I have it

It’s not as easy as that.

I mean, sure, to a degree you can check whether a text looks like it always follows the highest-probability next token, but OpenAI samples with randomness (temperature, top-p), so the probabilities get shuffled quite a bit. Since you don't know exactly how each token was sampled, you can hardly extrapolate a reliable AI/not-AI verdict from the text alone.
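To make the "randomized probabilities" point concrete, here's a minimal sketch of temperature sampling in plain Python. This is not OpenAI's actual implementation, and the logits are made up; it just shows why the "most likely" token doesn't always win:

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=None):
    """Softmax over logits scaled by temperature, then sample one index.
    As temperature -> 0 this approaches greedy (always the top token);
    higher temperatures flatten the distribution, adding randomness."""
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Made-up logits for three candidate tokens:
logits = [2.0, 1.0, 0.1]
# At a low temperature the top token dominates; at a high one,
# the "wrong" tokens get picked a meaningful fraction of the time.
```

So two runs of the same prompt can legitimately produce different tokens, which is exactly what makes working backwards from the text so hard.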

On top of that, human text works in a pretty similar way: we also mostly pick high-probability words, and our texts wouldn't make sense if suddenly beach mango cat ouch.

In fact, OpenAI did want to make its texts traceable by injecting custom randomness, which it could then use to pinpoint whether a text came from its models or not. Source: OpenAI is developing a watermark to identify work from its GPT text AI | New Scientist
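For the curious, one published watermarking idea (a "green list" scheme; this is an assumption about how such a watermark might work, not necessarily what OpenAI is building) goes roughly like this sketch:

```python
import random

def green_list(prev_token_id, vocab_size, fraction=0.5):
    """Deterministically split the vocabulary into a 'green' set,
    seeded by the previous token (a real scheme would also mix in a
    secret key). A watermarking sampler biases generation toward
    green tokens; a detector re-derives the same sets and checks
    whether green tokens appear suspiciously often."""
    rng = random.Random(prev_token_id)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(vocab_size * fraction)])

def green_fraction(token_ids, vocab_size):
    """Fraction of tokens that land in their context's green list:
    around 0.5 for unwatermarked text, noticeably higher for text
    generated with the green-list bias enabled."""
    hits = sum(
        1 for prev, cur in zip(token_ids, token_ids[1:])
        if cur in green_list(prev, vocab_size)
    )
    return hits / max(len(token_ids) - 1, 1)
```

The key point: this only works because the *generator* cooperates by embedding the signal. Detecting unwatermarked text gives you none of that.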

The thing is, each model has its own writing style. You would need to train a model (or similar) specifically to identify a single model, since another, even similar, model could produce a wildly different style.
And even if you somehow mastered that, custom instructions can steer the same model to produce yet another writing style, throwing you off all over again.

AIs mimic human writing. If you ask a model to generate text in the style of Harry Potter, it'll do it, because it was trained on such text. This, in turn, means that if you flag the AI-generated Harry Potter-style novel, another book written in the same style by the original human author would be flagged as AI even though it wasn't.

Even OpenAI itself has sometimes failed to identify AI-written texts.

About the only thing you could reliably flag is imperfect phrasing, and even that falls apart as soon as you analyze text written by people who 'do it for a living,' basically.

Even perplexity isn't a good indicator. Granted, people write with bursts of perplexity and leave unique "marks", but remember: AI is trained on human text, and if you steer it well enough toward a unique, directed style, it will eventually reproduce those marks and apparent bursts of perplexity too.
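For reference, perplexity is just the exponential of the average negative log-probability a scoring model assigns to the text. A toy sketch (the per-token probabilities are made up; a real detector would get them from a language model):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of the observed tokens.
    Low perplexity = the text was 'unsurprising' to the scoring model,
    which detectors often treat as a (weak) signal of machine text."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities from some scoring model:
predictable = [0.9, 0.8, 0.85, 0.9]   # text the model finds likely
surprising  = [0.2, 0.05, 0.1, 0.3]   # text full of unexpected tokens
```

The catch is exactly what's described above: a skilled human writer can produce low-perplexity text, and a well-steered model can produce high-perplexity text, so the two distributions overlap badly.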

Also, there's one more thing: even if you get the model to be 97% accurate, imagine the havoc. E.g., 10,000 student papers are graded: 9,700 are classified correctly, but 300 students are falsely accused of using AI, even though they did their hard work and did nothing wrong. They just wrote a little "too well". See the problem? Such a tool would need to be practically 100% accurate, because teachers are cracking down hard on AI-generated work and don't understand that such a tool can fail.
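The arithmetic behind that example, spelled out:

```python
# A "97% accurate" detector applied to 10,000 genuinely
# human-written papers still mislabels 3% of them.
papers = 10_000
accuracy = 0.97
false_accusations = round(papers * (1 - accuracy))
print(false_accusations)  # 300 students wrongly flagged
```

And that's the optimistic case: if teachers run the tool on every submission, every term, those 300 false accusations repeat over and over.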

It's not that people "hate" on the idea. It's simply that you're trying to detect something that is extremely hard to detect.
