Solidarity Protest for the Stack Overflow Community

Conversely, if they cite properly and give credit in an effective manner, it could encourage people to continue posting. For example, if I were to post an answer and, every time it was used to answer a question in ChatGPT, I was credited with a citation and a mention of my username, plus I got an upvote, I might be encouraged to answer more.

But let’s talk about the elephant in the room - Stack Overflow is just one small part of what ChatGPT does.

It should be doing this for all of its content, not just SO.

It shouldn’t require having to do some biz deal just to get OAI to do the right thing.

I don’t think citations were part of the original thought process and design of OpenAI, and that’s not just OpenAI: as far as I know, code fragments or any other pieces of information don’t get cited on any other AI platform either. I could be wrong.

Yeah, and this could be a large part of general anxiety. OpenAI was/is a bull in the china shop of content usage etiquette. It’s no wonder so many people are pissed.

Hopefully once the SO thing happens they’ll show it’s possible to do it right and people will start forcing them to do it for everything else.

So… Does OpenAI Developer Community Forum have new leaders and moderators now?

Seriously, though, I haven’t seen GitHub users protesting.

Besides, as a Stack Overflow and GitHub user I don’t really give a flying care. Do we have a list of protesters?

Nah, it’s the same old bunch of people as always; we thought this could be a good way of having a discussion about how AI companies interact with online communities.

I agree that there’s a lot of general anxiety, but I think most of it is because people don’t have a clear understanding of how AI is trained and used, and I honestly think OpenAI has been fairly forward about what they’re doing.

I’m all in for citing and attributing the original creators of content you use, but I don’t expect you to constantly cite all the books you went through while learning how to read. Imagine the extreme end of where this will lead:

I’m suing everyone in this topic for using words straight out of the dictionary in your posts. Yeah, you heard me! Every noun, verb, and adjective you’ve dropped has been meticulously noted and added to my lawsuit. That’s right, every time you speak or write, you owe me $1. I accept cash, checks, IOU’s, Amazon gift cards and your eternal silence!

At some point this just becomes ridiculous. One could argue that you don’t need to read the entire internet just to learn how to read, but language models are not smarter than you on that front; they need a lot of training data to accomplish very little.

I think it makes sense to cite the original creators when you’re actually retrieving knowledge from a specific source, but there needs to be a better understanding of how data is used during training.

I vigorously refute this. Getting the exact training data used by OpenAI is impossible - afaik.

Do correct me if I am wrong!

Again, I refute. If the question is about the definition of a word, absolutely GPT-4 should cite the dictionary.

No? They’re obviously not publishing their entire exact dataset, just telling you what’s in it. But it’s basically “the pile” + curated data from ChatGPT and OpenAI’s various partners.

I think that’s a lot more forward than some of the other AI companies, who don’t ask or tell us what they’re doing. Have a look at this:

You’ve deliberately misquoted my dumb joke in order to twist it towards your own argument, so let me help you quote it properly:

See? It was about using language and not the definition of words, it’s truly a great discussion we’re having here! Nothing like a good ol’ flame war to remind us all why the internet is a breeding ground for well-reasoned and respectful dialogue. Can’t wait to see who wins the gold medal in mental gymnastics :rofl:

Citation please! Heh.

You’ve deliberately misquoted my dumb joke in order to twist it towards your own argument, so let me help you quote it properly

Perhaps, but I also felt your argument wasn’t very good faith either. I think it’s implied that citations should be relevant.

Seems like we’re just going in circles again, if you have anything to add that I can’t answer by just quoting myself, please do tell! :rofl:

You obviously didn’t cite the dictionary, and I’m not expecting you to, so I’m curious to hear your opinion on when citations become relevant :thinking:

And that is why OpenAI is not sustainable: people will be forced to create their own corpus silos to defend their IP and rewrite the law to optimize royalty recovery.

You realize that our entire reality changes with AI attribution systems that facilitate the royalty accounting.

Attribution platforms will save humanity from destruction. Every utterance, every event published can be tagged and tracked for monetization. Every contributor can be compensated via direct deposit in commodity of choice by contracting an autonomous observer subscription service to chronicle and vectorize intelligence for brokered reuse.

Right, but we’re not arguing over the definition of a word so I didn’t cite the dictionary.

I had an argument - that OAI is not forthcoming on their training data. I backed up this central thesis with a citation.

I agree GPT shouldn’t spuriously cite, but it should cite resources related to the core questions that users enter into it.

Indeed, irrelevant citations devalue the relevant ones, so it would in fact be just as destructive as not citing at all.

Click-through rates! OAI should A/B test citation selection algorithms and keep whichever achieves the highest click-through rate. I am assuming, of course, that they have effectively filtered spam / copycat sites out of their training data.

I think it’s reasonable to put a cap on the number of cites per query. Maybe 3 or so? Depends on length of response of course, and number of core ideas.

In general though, take a look at an academic paper and think about what the bar is for citation there. AI should respect that bar.
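Purely as an illustration of the cap-plus-click-through idea above (every name and number here is made up by me; this is not any real OpenAI system), a minimal sketch could look like:

```python
# Hypothetical sketch: cap citations per response by ranking candidate
# sources on their observed click-through rate (clicks / impressions).

def select_citations(candidates, max_cites=3):
    """Return up to `max_cites` source URLs, highest estimated CTR first."""
    def ctr(source):
        impressions = source.get("impressions", 0)
        # Sources never shown before get a CTR of 0 (avoids division by zero).
        return source.get("clicks", 0) / impressions if impressions else 0.0

    ranked = sorted(candidates, key=ctr, reverse=True)
    return [s["url"] for s in ranked[:max_cites]]

# Fabricated example data for illustration only.
sources = [
    {"url": "stackoverflow.com/q/1", "clicks": 40, "impressions": 100},
    {"url": "docs.python.org/3", "clicks": 90, "impressions": 120},
    {"url": "example.com/blog", "clicks": 1, "impressions": 50},
    {"url": "github.com/some/repo", "clicks": 30, "impressions": 60},
]
print(select_citations(sources))
# → ['docs.python.org/3', 'github.com/some/repo', 'stackoverflow.com/q/1']
```

A real system would presumably also weight relevance, not just clicks, or you’d end up citing whatever is most clickable rather than what actually backs the answer.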

I think it’s important to note what we are unified in our support of.

I, for instance, am a longtime user of and contributor to SO, and I personally do not have any problems with the OpenAI/StackOverflow deal, but I recognize that some people have different thoughts, opinions, and feelings.

I think they raise good and important questions which need to be addressed as we are forcibly hurtled into unfamiliar territory.

I, personally, do not agree with the particular manner in which they have decided to protest but, and this is the critically important point I want to make, *I don’t need to*.

Protests need to do one thing above all else: bring awareness to the issue you care about.

Unfortunately, that generally means making a nuisance of yourself in some way, otherwise no one will pay attention.

I understand why they are protesting by defacing their answers… Given the hugely asymmetric power dynamic between the individuals and the corporations there’s not a lot else they could do.

I just don’t think it is going to be effective.

Even so, I support the individuals standing up for something which is clearly important to them.

What I staunchly oppose is StackOverflow permanently banning users for participating in this protest.

First, because I see it as punching down and disproportionately harsh. SO can revert all of these changes with a script. It may be a nuisance, but it’s also just some of the most important users of the community trying to get the attention of the powers that be.

Second, because I see it as a foolish and short-sighted decision. All of this will pass eventually, one way or another, and alienating some of your most important users is a bad idea.

Other leaders and moderators here may fall elsewhere in their support of the protest, but I feel we’re all pretty solidly aligned in our support of the protestors.


I fully agree with @elmstedt here, and I think this situation is a great opportunity to have a discussion about what we actually expect from both the AI models and their creators.

I think @qrdl has some interesting ideas, which I’ll get back to later after having a good night’s sleep, but in the meantime, we should all remind ourselves that all of these AI models are created to be consumed by us. If we’re not happy with the output, then OpenAI won’t be either.

I for one am not a supporter of vandalism or defacement, but they certainly have a right to be and plenty of reasons to be upset.

If you want to get the SO folks’ perspective, this is good: Our Partnership with OpenAI - Meta Stack Exchange

The top voted post is not surprising to me. I just wish people wouldn’t narrowly reference SO.

I’m just a mug user, not a lawyer. I have some issues trying to separate the outrage from what I believe is reality. I have not made a single contribution to SO but it is a resource that I use frequently. I learn, I pilfer. From what I understand, the stereotypical developer spends a lot of time googling. So, money aside, I don’t understand the outrage at the fact that information knowingly posted to public forums (used to train people for years) may now be used to train models.

I didn’t get it when artists complained that their art (which they posted) could have been used for training. I have a hard time separating training me from training a model. If I went to art school, I would get “inspiration” from other artists and maybe existing artworks. If I learn engineering, I use engineering textbooks, etc. I can also borrow art books or engineering textbooks from a library and train myself at home, or find resources online to learn coding or cross-stitch at zero cost, because users have provided that information for others to use. It’s not behind a paywall or protected by Fort Knox-like security, because it was intended to be publicly available.

All of civilization is built brick by brick on top of work done by others. If I do work for free, IMHO, I should not be trying to “reclaim” it later. It’s in the public domain (and I’m using that literally).

The genie is not going back in the bottle, and there certainly should not be a re-negotiation of terms because the goalposts moved years later. Wow, where would that leave us all? All the startups that tried to capitalize on early versions of the OpenAI API, only to have their exact functionality superseded by later versions, would be licking their lips, I suppose.