Advanced spelling/grammar check with fine-tuned GPT models - potential moderation issue?

I’m trying to integrate a spelling and grammar checking tool into a desktop application. I’m using TipTap as my text editor; it can output the document as a JSON object and supports custom marks for formatting. That lets me, for example, wrap a corrected piece of text in a ‘corrected’ mark and wrap the replacement in a ‘correction’ mark, so it behaves like the tracked-changes feature in a word processor.
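
For illustration, here’s roughly what a paragraph node looks like in TipTap’s JSON output once those marks are applied (a minimal sketch; the mark names are the ones above, but the exact schema depends on your custom extensions):

```ts
// Rough shape of a TipTap/ProseMirror paragraph after a correction pass,
// assuming custom marks registered as 'corrected' (original text) and
// 'correction' (replacement text). Attrs and node structure depend on
// how the extensions are defined.
const paragraph = {
  type: 'paragraph',
  content: [
    { type: 'text', text: 'She ' },
    { type: 'text', text: 'dont', marks: [{ type: 'corrected' }] },
    { type: 'text', text: "doesn't", marks: [{ type: 'correction' }] },
    { type: 'text', text: ' care.' },
  ],
};
```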

In test cases with the chat version of GPT-xx (so far they all seem to handle this task fine), it works great.

However.

The application is for writers, and they might be writing explicit or violent content (a fight scene in a novel, a sex scene in an erotic short story or longer romance novel; there are a billion possible scenarios). My concern is that even if the API endpoint is explicitly instructed not to generate original text that violates its moderation guidelines, the language in a paragraph object might trip moderation anyway, regardless of the overall context.

Obviously, there are good reasons why GPT, davinci, etc., won’t generate explicit or harmful text. I get it; there’s huge potential for abuse. But do these guidelines become any more flexible when fine-tuning a model for a specific use case like this? The GPT models, even 3.5/turbo, are miles above tools like Grammarly or ProWritingAid at intelligently identifying these things, especially with context that can be included, like any tagged proper names in the document object.

GPT-4o and I have been experimenting with options like identifying ‘harmful’ words and replacing them with placeholders, with instructions to skip over those, which I can parse back into the text afterward. But of course I can’t account for the thousands of possible words, variations, combinations, misspellings, etc., to run the replacements in the first place.
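
For what it’s worth, the masking approach we’ve been testing looks roughly like this (a sketch only; the word list is hypothetical, and as noted, it can never be exhaustive):

```ts
// Sketch of the placeholder workaround: swap listed words for tokens before
// sending text to the model, then restore them afterward. The word list is
// hypothetical and the whole approach is fragile -- it can't cover variants,
// combinations, misspellings, or even differing capitalization.
const flagged = ['someword', 'otherword']; // hypothetical, hand-maintained list

function mask(text: string): { masked: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let masked = text;
  flagged.forEach((word, i) => {
    const token = `[[W${i}]]`;
    const next = masked.replace(new RegExp(`\\b${word}\\b`, 'g'), token);
    if (next !== masked) map.set(token, word); // remember what the token stands for
    masked = next;
  });
  return { masked, map };
}

function unmask(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [token, word] of map) out = out.split(token).join(word);
  return out;
}
```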

So what’s the score with fine-tuning? Is it possible to fine-tune a model to get around at least some of that? I honestly don’t love the idea of users, I don’t know, writing the propaganda of some fourth reich or whatever on my application, but I’d also rather not hamstring authors who are just cussing or writing a sex scene.

Hi!
The terms of service do not change when using a fine-tuned model. If generated outputs are found to be in breach of the terms of service, your account can and will be suspended.

You can use the moderation endpoint to protect your developer account, but you cannot offer the creative writing service based on an OpenAI model for inputs that get flagged.
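
A minimal sketch of that pre-screening step with the Node SDK might look like the following (the model name is an assumption; check the current docs for the latest moderation model):

```ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Pre-screen user text with the moderation endpoint before passing it on to
// a completion model. 'omni-moderation-latest' is assumed here; verify the
// current model name in the API documentation.
async function isFlagged(text: string): Promise<boolean> {
  const res = await openai.moderations.create({
    model: 'omni-moderation-latest',
    input: text,
  });
  return res.results[0].flagged;
}
```

If `isFlagged` returns true, you would skip the grammar-check call for that paragraph (or handle it some other way) rather than send flagged input through the model.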