Fine-Tuning GPT-3.5 for Content Moderation with Complex Rule Sets

I have an extensive set of moderation rules that are simply too cumbersome to fit into a prompt, even when simplified. They would exceed 30,000 tokens, akin to a localized legal code, and consume a significant amount of the token limit. To circumvent this issue, I’m contemplating injecting these moderation rules directly into the GPT-3.5 model via fine-tuning to reduce prompt length. Is such an approach feasible, and if so, how should I go about injecting this knowledge? Since the use case revolves around content moderation and requires comprehensive coverage, using a vector database is not a viable option. Are there alternative solutions that might be more effective than knowledge injection?

Welcome to the forum.

Even with fine-tuning, there’s a chance it will hallucinate and not give you perfect answers. A better solution may be to look into embeddings. We’ve got a lot of great posts on them here at the forum.

Could you elaborate on what you mean by “embedding”? Is it akin to a vector database? Our rules operate on logical assessments, such as identifying an argument within a conversation. Contextual understanding, which requires evaluating the entire conversation, is crucial for distinguishing whether the exchange indicates a genuine argument or just playful banter. This kind of nuanced, non-factual criteria performs poorly with vector databases, rendering them ill-suited for our specific needs.

Embeddings (plural) refers to an AI model that returns a vector of multi-dimensional values capturing the semantics of the input text, built up from a large language model’s understanding of what it has read. It is broadly the same kind of internal representation that would let such a model continue a completion from that point.

The returned vector isn’t human-interpretable on its own, but a database of many embeddings of many types of content or text can be used for similarity matching, either between elements within the set or against a new input. Hence “vector database”.
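As a minimal sketch of what that looks like in code (assuming the Python `openai` client and the `text-embedding-ada-002` embedding model; the example rule and message are made up):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for one piece of text."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=[text])
    return np.asarray(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Higher means the two texts are closer in meaning."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Texts that share no keywords can still score as related,
# because the vectors encode meaning rather than exact wording.
rule = embed("Participants must not harass, insult, or demean other users.")
message = embed("lol nobody here wants you around, just leave already")
print(cosine_similarity(rule, message))
```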

There are definitely applications for this that extend capability in the way you describe:

  • augment rules the AI knows with those most applicable
  • augment classifier behavior teaching with examples most applicable

Because the AI is smart, you can embed the entire conversation that you will also send to the AI language model that answers.

Because you are smart, you can come up with systems that produce the necessary similarity and augmentation.
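A rough sketch of that retrieval-and-augmentation loop, reusing the helpers above; the top-k cutoff and the prompt wording are placeholders, not recommendations:

```python
# Reuses client, embed() and cosine_similarity() from the sketch above.
TOP_K = 5  # how many rule chunks to inject; tune for your rule set

def moderate(conversation, rule_chunks):
    """rule_chunks: list of (rule_text, rule_vector) pairs prepared offline."""
    # Embed the whole conversation, not just the last message, so the model
    # can tell playful banter from a genuine argument.
    convo_vec = embed(conversation)

    # Rank rule chunks by similarity and keep only the most applicable ones.
    ranked = sorted(rule_chunks,
                    key=lambda rc: cosine_similarity(convo_vec, rc[1]),
                    reverse=True)
    selected_rules = "\n\n".join(text for text, _ in ranked[:TOP_K])

    # Send only those rules plus the conversation to the answering model.
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a content moderator. Apply only these rules:\n\n"
                        + selected_rules},
            {"role": "user",
             "content": "Conversation to moderate:\n\n" + conversation},
        ],
    )
    return resp.choices[0].message.content
```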


Thank you for your thorough response. While you do sound a bit like an AI, I am well-versed in the concept of embeddings and have even conducted relevant development work in this area. However, this approach does not solve my issue. As I’ve tried to convey, there’s no semantic similarity between the rules and the content. Perhaps my previous explanation wasn’t clear enough, so let me use another example: the relationship between laws and legal cases. You can search for relevant cases using key information, and you can also find corresponding laws through legal text. However, you can’t directly associate a legal case with a law. This is precisely the problem I’m looking to solve.

If there is no similarity — you make that similarity.

Embedding chunk 630:

Citizens United v. Federal Election Commission (2010):

  • Background: Citizens United v. FEC is a significant case related to campaign finance and political spending in the United States. The organization Citizens United produced a documentary critical of Hillary Clinton during her 2008 presidential campaign, and the Federal Election Commission (FEC) attempted to regulate its distribution.

  • Role of Previous Case Law: The Supreme Court’s decision in Citizens United built upon previous campaign finance cases, including Buckley v. Valeo (1976) and McConnell v. FEC (2003). The Court extended the principle of corporate personhood established in earlier cases, ruling that corporations and unions have the right to spend money on political campaigns as a form of protected free speech.

Key statutes and laws on record were the following (AI additions):

Bipartisan Campaign Reform Act (BCRA) of 2002:

  • Also known as the McCain-Feingold Act, this law amended the Federal Election Campaign Act (FECA) and aimed to regulate the financing of political campaigns. One of its provisions, Section 203, prohibited corporations and unions from using their general treasury funds for “electioneering communications” within 30 days of a primary or 60 days of a general election.

Federal Election Campaign Act (FECA):

  • This is the primary federal law regulating political campaign spending and fundraising. BCRA amended FECA, and it established rules governing the financing of federal elections, including limitations on contributions and disclosures of campaign finance information.

Example questions (use these only to augment the embedding vectors for matching; don’t confuse the AI by including them in the returned text):

  1. What were the key statutes and laws at the center of the Citizens United v. FEC case, and how did they impact campaign finance regulation in the United States?
  2. Can you explain the provisions of the Bipartisan Campaign Reform Act (BCRA) of 2002 and how they led to the legal dispute in Citizens United?
  3. How did the Federal Election Campaign Act (FECA) play a role in the development of campaign finance law leading up to Citizens United, and what changes did the BCRA bring to FECA?
  4. What was the legal framework established by the Supreme Court’s decision in Buckley v. Valeo (1976), and how did Citizens United modify or build upon the precedent set in that case regarding campaign finance regulation?

You obviously don’t have a case with an easy solution (that’s why lawyers cost money), but hopefully I’ve sparked some ideas for AI-automated synthesis of your rules, data, and examples to improve the embeddings, so that you might go “eureka”.
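One possible shape for that synthesis, sketched under the assumption that you keep the original rule text as the retrieval payload and only use the generated examples for matching (the prompt wording and the `rule_chunks_texts` list are placeholders for your own data):

```python
import json

def synthesize_queries(rule_text, n=5):
    """Ask the chat model for example conversations a rule should match."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Write short example conversations that the following "
                        "moderation rule should apply to. Reply with a JSON "
                        "array of strings and nothing else."},
            {"role": "user", "content": rule_text},
        ],
    )
    return json.loads(resp.choices[0].message.content)[:n]

# Build the index: embed the synthetic examples, but store the ORIGINAL rule
# as the payload, so a retrieval hit returns the rule text, never the example.
index = []
for rule_text in rule_chunks_texts:  # your own list of rule-chunk strings
    for example in synthesize_queries(rule_text):
        index.append((rule_text, embed(example)))
```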

Thanks for your thoughtful response. I’m in the content moderation field, which in many ways is more complex than legal work—it’s harder to categorize. I’ve tried the simpler approaches and found they’re not sufficient for my needs. That’s why I’m here, inquiring whether knowledge can be embedded into the model via fine-tuning. I’m also curious to know if there might be options for continued pre-training in the future.

OpenAI built their own moderation endpoint by fine-tuning. And there are different levels of moderation being done: for image generation, for pre-screening playground inputs, for denying completed API output with a finish reason, for unsharing undesired conversations, for banning API accounts over jailbreaks, etc.

So there is no reason you cannot use the same fine-tuning techniques to make your own moderation model for categories like “baiting/trolling”, “leading off-topic”, “unreliable advice”, “disallowed jurisdiction”, “needs more context”, etc.
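As a hedged illustration of what that training data could look like for gpt-3.5-turbo fine-tuning (one chat-format JSON object per line; the category labels and conversations here are invented):

```python
import json

# One training example per line, in the chat fine-tuning format.
# The labels ("baiting/trolling", "ok") and conversations are invented.
examples = [
    {"messages": [
        {"role": "system", "content": "Classify the conversation under the moderation policy."},
        {"role": "user", "content": "A: nice weather today\nB: go away, nobody asked you, loser"},
        {"role": "assistant", "content": "baiting/trolling"},
    ]},
    {"messages": [
        {"role": "system", "content": "Classify the conversation under the moderation policy."},
        {"role": "user", "content": "A: haha you're such a dork\nB: takes one to know one :D"},
        {"role": "assistant", "content": "ok"},
    ]},
]

with open("moderation_train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```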

Injecting rules on top of your fine-tuned model can then fill in the gaps, covering possible circumvention where training alone didn’t convey the actual rules being applied to a particular type of input.

I still need help. Does anyone have better practical processes and experiences to share?

You could fine-tune gpt-3.5-turbo if you have lots of example text and the resulting moderation decision for each piece of text. These examples would have to be human-curated to start with, but you could then begin using the model to make some determinations, correct any incorrect ones, and use that additional data to train a more accurate version. Rinse and repeat until you reach an accuracy you are happy with.
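A minimal sketch of that rinse-and-repeat loop with the Python client; the file name, system prompt, and review workflow are placeholders for your own tooling:

```python
from openai import OpenAI

client = OpenAI()

# 1. Upload the human-curated examples and start a fine-tuning job.
training_file = client.files.create(
    file=open("moderation_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until it finishes

# 2. Once the job reports a fine_tuned_model name, run it on fresh text.
def classify(fine_tuned_model, text):
    resp = client.chat.completions.create(
        model=fine_tuned_model,
        messages=[
            {"role": "system", "content": "Classify the conversation under the moderation policy."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content

# 3. Have humans correct the model's mistakes, append the corrected examples to
#    the JSONL, and repeat steps 1-2 until held-out accuracy is acceptable.
```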