I'm developing an advanced content moderation tool aimed at streamlining the work of moderators who manually review user-submitted posts on a classifieds website. My ultimate goal is to lighten their workload and make their day-to-day operations more efficient. The backbone of the tool is the GPT-4 model.
To provide some context, the input texts range from a few hundred tokens up to a couple of thousand, while the prompt itself is approximately 1500 tokens. At the moment, the project is in the proof-of-concept stage.
There’s a robust infrastructure underpinning the system. When users submit text content, it’s automatically funneled, along with the prompt, to GPT-4 for processing. However, I am encountering several hurdles that I would greatly appreciate some guidance on:
- Complex Prompt Instructions: The prompt encompasses several general rules, such as eliminating any form of discrimination and rectifying spelling and grammatical mistakes. Additionally, there are about 20 specific rules, along with exceptions to the general rules, detailing how GPT-4 should behave under certain circumstances. For instance, in the case of a job advertisement, GPT-4 should modify the text so that providing a photo is optional rather than obligatory.
- Metadata Generation: Metadata is generated and structured in JSON format, which administrators use to populate the metadata for each post.
- Response Formatting: The entire response (apart from the metadata) is formatted with HTML tags and embedded in the JSON array.
One of the initial challenges was the lack of an effective way to highlight the differences between the original text and GPT-4's output, which forced moderators to compare both texts manually. I addressed this by displaying the discrepancies in two distinct colors. However, I have now run into an issue where GPT-4 cuts out a substantial amount of text that doesn't violate any of the rules I have set (discrimination, grammar/spelling errors, or the specific rules). On the positive side, the responses are consistently well-structured, the JSON is valid, and the generated metadata is accurate; it is only the HTML response that gets cut down (mostly not at random: the overall structure is completed, but a lot of relevant content is dropped).
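In case it helps anyone with the same problem, the two-color highlighting can be built on Python's standard difflib; a minimal sketch (the `<del>`/`<ins>` tag names are just illustrative hooks for whatever styling the review page uses):

```python
import difflib


def highlight_diff(original: str, revised: str) -> str:
    """Wrap words removed by the model in <del> and words it added in <ins>,
    so the review page can style the two in distinct colors."""
    a, b = original.split(), revised.split()
    parts = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            parts.append(" ".join(a[i1:i2]))
        if op in ("delete", "replace"):
            parts.append("<del>" + " ".join(a[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            parts.append("<ins>" + " ".join(b[j1:j2]) + "</ins>")
    return " ".join(parts)
```

For a job-ad edit like the photo example, `highlight_diff("a photo is required", "a photo is optional")` marks `required` as deleted and `optional` as inserted, word by word.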
At this juncture, I feel as though I’ve hit a wall and would be immensely grateful for any suggestions or insights on how I can enhance and refine this feature to ensure that the tool is both effective and reliable.
First, a terminology note: “truncation” means cut off prematurely, whereas what you are seeing (a completed structure with content omitted) probably calls for a different description.
There are two mechanisms you can use:
Moderations endpoint. This returns scores across various harm categories. While free, it is intended to be used on GPT-generated content, so using it solely on your own text is against the terms.
Embeddings: You can create a vector database of your own that contains passing content, along with content that violates different areas. You can then score unseen content to see where on the spectrum between squeaky-clean and rule-breaking it falls by comparing topical similarity.
GPT-4 pricing is certainly not required for what you are doing, unless you need very complex instruction-following for the text processing - apparently partially censoring the passage?
Thank you very much for the answer, I really appreciate it.
The moderation endpoint won’t help in this case, since the input texts don’t contain any of the kinds of discrimination listed there. For example, I am removing discrimination based on age (a job ad requiring that only candidates under a certain age can apply) or based on gender (only males can apply). We are removing that content more or less successfully.
Embeddings - I will definitely give this solution more thought.
One thing I have done in the past via prompting is to have content such as jokes analyzed, or particular passages corrected for grammar without reproducing the whole thing, or to have the model simply output scores for “cultural insensitivity” and “sexual situations” in a table for each section.
This might be a case for just lots of prompting: telling the AI exactly what its function and goal is, and then having it produce a table with the offending phrase and a score for age/race/gender in separate columns. You can then match the phrases back up with the original text without needing a full “rewrite” as generated output, since your action upon detection is likely to be rejection rather than passing the text through an AI rewrite. Give each score a 0-4 range and assign the three categories RGB colors, etc.
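A minimal sketch of consuming such a table, assuming the model is prompted to emit a markdown table with those exact column names and 0-4 scores (the column names, scale, and the RGB mapping here are all assumptions for illustration):

```python
def parse_score_table(markdown_table: str):
    """Parse a markdown table of the form
    | phrase | age | race | gender |
    into a list of dicts with integer 0-4 scores."""
    lines = [ln.strip() for ln in markdown_table.strip().splitlines()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the |---| separator row
        cells = [c.strip() for c in line.strip("|").split("|")]
        row = dict(zip(header, cells))
        for col in header[1:]:  # score columns after the phrase column
            row[col] = int(row[col])
        rows.append(row)
    return rows


def severity_color(row):
    """Map the three 0-4 category scores onto one RGB triple for display."""
    scale = 255 // 4
    return (row["age"] * scale, row["race"] * scale, row["gender"] * scale)
```

Each parsed phrase can then be located in the original post with a plain substring search for highlighting, so no generated rewrite is needed at all.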
It is also more useful to put an “examples” section in the system prompt instructions describing what the model is looking for, as multi-shot examples are often confused with the input in question.
The model is also more predisposed to markdown (which you can parse in your code) than to other output formats you would need to describe or train on while hoping the language of the content doesn’t distract it.
Embeddings will likely capture tone more than actual logical decisions: phrases like “looking for young enthusiastic applicants” or “recent HBCU college graduates” that might need review may not distinguish themselves very clearly in vector-math results from other calls to action. Embeddings will show far more separation between a job listing and a real estate listing, for example.