Creating an AI detector, I think I have it

saigatatarica · November 21, 2023, 5:26am

I get your point, bro. I was thinking about that days ago…

Selder · November 21, 2023, 7:52am

I’ve done a lot of testing on a wide range of detectors. Most simply establish a basic structure of the most likely responses and try to build a probability on top of that, but even the most sophisticated ones have their problems, especially when human and AI text are mixed and you ask them to separate which is which. I’ve seen human texts pass as AI and AI texts pass as human, not to mention that determining the probability of the next word over a large set of information is computationally infeasible. A silly but useful example: ask the best-known detectors to estimate whether a text was generated with AI when that text is in a language other than English. On top of that, out of the box, you can make the model actively try to bypass the detection algorithm during output generation. The friend may have been ironic, but you are being childish by assuming that he doesn’t WANT a tool like this to exist. This is a forum precisely to discuss possibilities.

I also think it’s unlikely. GPT-4 output can no longer be reliably detected, and a possible GPT-5 or even an AGI will make it even more difficult. Now, nothing stops you from proving this theoretically, even if creating such a detector is unfeasible with current technology.

aubrey_ghose · November 21, 2023, 8:03am

Moving forward, OpenAI might consider creating a high-value subscription model for institutions, organisations and individuals by building TRUST and proving it via an NDA for users. This would mean that non-programmers, like me, and organisations could benefit from co-creating with an AI that they can trust with precious IP. Game changer?

Selder · November 21, 2023, 8:18am

That is the trend, and also what I believe. Universities and lines of research would benefit from the process. I also believe in broader models, in which a researcher’s knowledge would be added as relevant data within that context. Of course this requires the researcher’s consent, but it is possible, and it could boost academic creation well beyond current levels, since we are, in a certain way, limited to our own knowledge and field of action, which unequivocally harms cross-research between different subject areas.

supershaneski · November 21, 2023, 8:51am

Currently, there is a way for organizations and institutions to partner with OpenAI to create open-source and private data sets.

Check this link:
OpenAI Data Partnerships

aubrey_ghose · November 21, 2023, 9:12am

Wow, perfect, thank you. Good to see the lights are on and people are home at OpenAI.

vb · November 21, 2023, 9:23am

As others have pointed out, this is a highly challenging field of research, and even the best sometimes have to return to the whiteboard to deliver high-quality results when it comes to telling human from AI-generated content.

Below is a link to the previous serious attempt from OpenAI to tackle this issue. But this approach has been retired (for now):

As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.

The good news is that, if your idea has merit, you can follow the advice given by the “hostile” commenters and contact OpenAI with your idea. Because, why not?

mail.reknew · November 21, 2023, 9:23am

I think the least bit of copy-editing by a human with decent skill will confound any detector. Heck, you could even prompt an LLM on how to do the editing and it would probably be enough to evade detection. You read the piece of AI-generated output, ask yourself “How does this writing still sound automated?”, and fix the words and phrases that seem unnatural to you. Basically what any good copy-editor does for any author, human or not.

aubrey_ghose · November 21, 2023, 9:29am

Convergence of the sciences, thinkers and subject matter experts: it’s the holy grail for AI. Love the way you think.

callum.bradbury · November 21, 2023, 10:13am

The more we interact with AIs, the more we will begin to speak like them, rather than the other way around. Imagine a child raised around AI: their language will likely be indistinguishable from a bot’s in terms of structure. I already find myself doing it to a certain extent.

generalbadwolf · November 21, 2023, 11:39am

I agree with this sentiment.

AI-generated text is easy for me to recognize (or so I think),
because I’ve stared at it so much that the pattern kinda melted into my brain.

generalbadwolf · November 21, 2023, 11:40am

This… is an interesting thought.

But my experience suggests that some things that cannot be done with AI might be the reason we don’t adopt AI mannerisms.

merefield · November 21, 2023, 12:15pm

I think I have a foolproof method, but this will only work in interactive sessions:

If the response is >= 5 paragraphs, makes sense, and was returned in < 10 seconds, I think you can be pretty sure it was AI that responded.

Now can I please collect my reward?
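For what it’s worth, the rule above can be sketched in a few lines. This is only a rough sketch: the function name is made up, paragraphs are assumed to be separated by blank lines, and the “makes sense” part is left to the human reading the reply, since nothing here checks it.

```python
def looks_like_ai_reply(text: str, seconds_to_reply: float,
                        min_paragraphs: int = 5,
                        max_seconds: float = 10.0) -> bool:
    """Flag a reply that is long but arrived too fast to have been typed by hand."""
    # Assumption: a paragraph is any non-empty block separated by a blank line.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return len(paragraphs) >= min_paragraphs and seconds_to_reply < max_seconds


# A five-paragraph reply that came back in 3 seconds gets flagged.
reply = "\n\n".join(f"Paragraph {i}." for i in range(1, 6))
print(looks_like_ai_reply(reply, seconds_to_reply=3.0))
```

It only works in a live session, of course, because it needs the reply time and not just the text.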

Fusseldieb · November 21, 2023, 12:39pm

It’s not as easy as that.

I mean, sure, to a degree you can count on it following simple things like picking the highest-probability next token, but OpenAI does randomize these probabilities quite a bit. Since you don’t know where the randomness comes from, you cannot (or can hardly) extrapolate the result (AI/not AI).

For instance, human text works in pretty similar ways: Our texts wouldn’t make sense if suddenly beach mango cat ouch.

In fact, OpenAI did want to make texts traceable with custom randomness which they could then use to pinpoint if a text was coming from them, or not. Source: OpenAI is developing a watermark to identify work from its GPT text AI | New Scientist
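To make the “custom randomness” idea a bit more concrete, here is a toy sketch. It is not OpenAI’s scheme, whose details are not in this thread; it is a simplified keyed “green list” watermark, and `SECRET_KEY`, `is_green`, `watermark_choice` and `green_z_score` are all hypothetical names. The generator quietly prefers words that a secret key marks as “green”, and the detector, knowing the key, checks whether a text contains far more green word pairs than the roughly 50% expected by chance.

```python
import hashlib
import math

SECRET_KEY = b"demo-key"  # hypothetical; a real provider would keep this private


def is_green(prev_word: str, word: str) -> bool:
    """Keyed pseudo-random coin flip: about half of all word pairs come out green."""
    data = SECRET_KEY + prev_word.encode() + b"\x00" + word.encode()
    return hashlib.sha256(data).digest()[0] % 2 == 0


def watermark_choice(prev_word: str, candidates: list[str]) -> str:
    """Generation side: take the first green candidate, else fall back to the first one."""
    for word in candidates:
        if is_green(prev_word, word):
            return word
    return candidates[0]


def green_z_score(words: list[str]) -> float:
    """Detection side: how far the green count sits above the 50% expected by chance."""
    pairs = list(zip(words, words[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    greens = sum(is_green(a, b) for a, b in pairs)
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)


# Toy "model": at every step any of 50 made-up words is an acceptable continuation.
vocab = [f"w{i}" for i in range(50)]
words = ["w0"]
for _ in range(200):
    words.append(watermark_choice(words[-1], vocab))
print(green_z_score(words))  # large for watermarked text, near 0 on average otherwise
```

Note that this only works for whoever holds the key, and paraphrasing the text would wash the signal out again.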

The thing is, each model has its own writing style. You would need to train a model (or similar) specifically to identify a single model, since another one, even a similar one, could produce a wildly different writing style.
And even if you somehow mastered it, custom instructions can steer the same model to produce another writing style, throwing you off all over again.

AIs mimic human writing. If you ask one to generate text in the style of Harry Potter, it’ll do it because it was trained on it. This, in turn, means that if you flag the AI Harry Potter novel, another book written in the same style by the original author would be flagged as AI even though it wasn’t.

Even OpenAI itself sometimes failed to identify AI-written texts.

The single thing that you could identify is non-perfect phrases or texts, which would soon fall apart when you try to analyze text written by people who ‘do it for a living,’ basically.

Even perplexity isn’t a good indicator. Granted, people have bursts of perplexity and unique “marks” that they leave, but remember, AI is trained on human text and will eventually grasp these marks and apparent bursts of perplexity if you steer it well enough to write in a unique, directed style.

Also, there’s one more thing: even if you get the model to be 97% accurate, imagine the havoc. Say 10,000 student papers are graded: 9,700 are judged correctly and 300 are falsely accused of using AI, even though those students did the hard work and did nothing wrong. They just wrote a little “too well”. See the problem? Such a tool would need to be practically 100% accurate, because teachers are cracking down hard on AI-generated work, and many don’t understand that such a tool can fail.
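To put rough numbers on the false-accusation problem, here is a minimal sketch. It assumes “97% accurate” means both a 97% detection rate and a 3% false positive rate, and the 10,000 papers and the 10% share of AI-written ones are made-up figures for illustration, not data.

```python
def expected_false_accusations(honest_papers: int, false_positive_rate: float) -> float:
    """Expected number of honest papers wrongly flagged as AI."""
    return honest_papers * false_positive_rate


def share_of_flags_that_are_right(total_papers: int, ai_fraction: float,
                                  detection_rate: float,
                                  false_positive_rate: float) -> float:
    """Of all flagged papers, the fraction that really were AI-written."""
    ai_papers = total_papers * ai_fraction
    honest_papers = total_papers - ai_papers
    true_flags = ai_papers * detection_rate
    false_flags = honest_papers * false_positive_rate
    return true_flags / (true_flags + false_flags)


# Hypothetical class sizes and rates, chosen only to illustrate the arithmetic.
wrongly_accused = expected_false_accusations(10_000, 0.03)
precision = share_of_flags_that_are_right(10_000, 0.10, 0.97, 0.03)
print(f"honest papers flagged: {wrongly_accused:.0f}")
print(f"flags that are correct: {precision:.1%}")
```

Under these assumptions, a bit over a fifth of the flagged papers belong to students who did nothing wrong, and the rarer AI use actually is, the worse that ratio gets.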

It’s not that people “hate” on the idea. It is simply because you’re trying to detect something that is extremely hard to do.

_j · November 21, 2023, 4:10pm

And even worse, whatever the particular corpus and resulting weighting, the next-token (n+1) nature of a transformer language model means all assumptions about its “style” can be thrown out the window, as it can in effect be trained by the earlier part of the prompt context itself to evade detection.

Example:

anderskkehlet · November 22, 2023, 1:42pm

It’s my non-expert opinion that reliable detection is impossible. Not just in practice, but in principle. I don’t know the proper terminology and stuff, but it’s to do with the amount of information available in the text. (barring intentional watermarking)

generalbadwolf · November 22, 2023, 10:21pm

Well… the entire science behind large language models and how they generate text in response to your prompt is entirely based on a nest of patterns.

Patterns beget patterns, and when you train an LLM on a sum of data, what it’s doing is learning from the sum-average of those patterns.
Thus the patterns of generation will have generic qualities.

One could think of it as the [average] [way] that someone might [respond] to [a message] in whatever form it may exist

Since the model is the average of that data, there is absolutely a way to use the model itself, in stages, to calculate the probability of how any given body of text was composed.

A probability built up this way, with a stated margin of error, would give a prediction of whether or not a given body of text was generated with AI.

That body of text could be a single instance or the sum of multiple instances, while automatically accounting for deviation (such as organic posts mixed in).

anderskkehlet · November 22, 2023, 10:47pm

As has been pointed out, it’s easy to get an LLM to radically change its writing style. I just don’t see how you can do that analysis without knowing the prompt.

jwatte · November 22, 2023, 11:51pm

This doesn’t solve for false positives at all.
Here’s the crux:

GPT was trained on the web, on books, and other things.
Take some piece of text from the web, or from a book in the training set.
Run the algorithm on this (obviously human-sourced) text, and it will tell you “very likely GPT would generate this text” (very low perplexity).

This is because, well, GPT is very likely to generate the training data!
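A tiny stand-in makes the point. This is a sketch, not how any real detector is built: a toy bigram model with add-one smoothing plays the role of GPT, and the two sentences are made up. Text the model was trained on scores a lower perplexity than text it has never seen, even though both were written by a human.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count word pairs and their left-hand context words."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return pair_counts, context_counts, set(tokens)


def perplexity(tokens, model):
    """exp of the average negative log-probability per word pair (lower = more expected)."""
    pair_counts, context_counts, vocab = model
    v = len(vocab) + 1  # one extra slot so unseen words still get some probability
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing: unseen pairs get a small, non-zero probability.
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))


# Both texts are human-written; only the first one was in the "training set".
training_text = "the cat sat on the mat and the dog sat on the rug".split()
novel_text = "my neighbour quietly repaired an old bicycle last weekend indeed".split()

model = train_bigram(training_text)
ppl_train = perplexity(training_text, model)
ppl_novel = perplexity(novel_text, model)
print(ppl_train < ppl_novel)  # the training text looks "more model-like"
```

So a perplexity threshold flags exactly the human text the model has seen before, which is the false-positive problem in a nutshell.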

Selder · November 22, 2023, 11:57pm

It would quickly be outdated. Maybe it would work well for 3.5, but with current prompt engineering it’s difficult, and it doesn’t solve the main problem: it doesn’t give you guarantees, just an estimate.

Not to mention the parameters, black box and other models.

It’s basically a theoretical question that sheds little light on practice. My guess is that an ethical resolution on what can be taken seriously in prompt-based texts will be established before detection becomes viable (if it ever does).
