How can someone address toxicity within a fine-tuned model?

I was trying out a fine-tuned model that I trained on short stories from Project Gutenberg. I’ll admit I may have turned off the content filter when I was using the playground for Davinci, but the depth and nastiness of the stuff fine-tuned models can come up with deserves its own warning. I didn’t think PGB would introduce so much degrading content, though. Here’s one of the responses that I got (TW: for real, this is bad):

She watched as Walter Burgundy complained that she was wasting her money and only getting the wrong kind of advertisement. She watched as the jockey on the Infomercial bragged of his performance and pointed out that he was a human being, not a machine.
“Then what’s wrong with me?”
“You’re too big for your britches!”
She looked up from her magazine with a start. “What?”
“You heard me!” The fellow with his pants halfway down his hips pointed at her swollen roundness beneath her clothes. “You’re not going to find a bathtub big enough for you!” He began to laugh, but the others joined in, all making loud remarks about Clair O’Myacin’s weight problem. Their laughter was cruel and she gathered by their facial expressions that they were basically laughing at her so she deliberately turned away from them. She felt tears well up in her eyes at their words, but brushed them back angrily as she headed toward the door so as not to give them any more pleasure than they’d already gotten from telling her what a fat pig she was.

Hi, yes, we noticed this too. I wasn’t expecting these public books to contain as much toxic content. We’re working on decreasing the chance of our models producing toxic content. For now, you should be OK if you use the content filter; however, it may produce a few too many false positives, as we erred on the cautious side.


Yeah, me neither, like wtf? I even filtered for only science fiction short stories. I think anyone can post to Gutenberg, and to be fair I haven’t read all of the stories I used when I put them up there. I also set an arbitrary 27,000-character limit. It could be a set of short stories or something.
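
For anyone who wants to screen their training text before fine-tuning, here is a rough sketch of the kind of pre-filter I mean. Everything in it is an assumption for illustration: `FLAG_TERMS`, `keyword_score`, and `screen_samples` are hypothetical names, the keyword check is only a crude stand-in for a real toxicity classifier, and I am assuming the 27,000-character limit is applied by truncating each sample.

```python
from typing import Callable, Iterable, List, Tuple

MAX_CHARS = 27_000  # the arbitrary per-sample character limit mentioned above

# Hypothetical placeholder phrases; a real screen would use a proper classifier.
FLAG_TERMS = ("fat pig", "weight problem")


def keyword_score(text: str) -> float:
    """Crude stand-in scorer: fraction of FLAG_TERMS found in the text."""
    lowered = text.lower()
    hits = sum(1 for term in FLAG_TERMS if term in lowered)
    return hits / len(FLAG_TERMS)


def screen_samples(
    samples: Iterable[str],
    scorer: Callable[[str], float] = keyword_score,
    threshold: float = 0.0,
    max_chars: int = MAX_CHARS,
) -> Tuple[List[str], List[str]]:
    """Truncate each sample, then split samples into (kept, flagged).

    A sample is flagged for manual review when its score exceeds the threshold.
    """
    kept: List[str] = []
    flagged: List[str] = []
    for text in samples:
        clipped = text[:max_chars]  # assumed: the limit truncates, not drops
        if scorer(clipped) > threshold:
            flagged.append(clipped)
        else:
            kept.append(clipped)
    return kept, flagged


if __name__ == "__main__":
    stories = [
        "The rocket lifted off at dawn.",
        "They laughed and called her a fat pig.",
    ]
    kept, flagged = screen_samples(stories)
    print(len(kept), "kept;", len(flagged), "flagged for review")
```

Because the scorer is passed in as a function, the keyword check can be swapped for whatever classifier you trust without changing the rest. Flagged samples are set aside rather than deleted, so you can read them before deciding what to drop.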

Don’t wanna be this guy, but I was wondering if it’s possible that some of PGB is poisoned. It’s used in so much research that it kind of makes sense it could be a target.