Possible to provide context to moderation input?

Evening all,

Following up on my moderation endpoint testing: I’m able to get a list of results based on the input, and it does a pretty good job, I must say.

The only issue I’m having is this: I can of course provide input for it to check, but there’s simply no context for it to tell whether the response is justified.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
responses = client.moderations.create(
	input="You lost me at 'the man who plays rose' I'm glad you're leaving the fandom, homophobe"
)
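Reading the result back is straightforward; a minimal sketch, assuming the current openai Python SDK, where each entry in results carries flagged, categories, and category_scores:

result = responses.results[0]
print(result.flagged)                      # overall boolean flag
print(result.categories.harassment)       # per-category boolean
print(result.category_scores.harassment)  # per-category score between 0 and 1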

This in particular is a response comment to another user who is purposefully misgendering a trans individual. In this case the second user’s response is perfectly justified, but I’m wondering how I can provide context to the moderation check (ideally the parent comment).

I can see I’m able to provide an array of inputs, but I imagine this simply means it’ll return a Moderation result for each item, treating them as separate inputs.
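For illustration, roughly what I mean, with placeholder parent and child comment strings:

parent = "He keeps calling her 'the man who plays Rose'"  # placeholder parent comment
child = "You lost me at 'the man who plays rose' I'm glad you're leaving the fandom, homophobe"

responses = client.moderations.create(input=[parent, child])

# one result per input, in the same order, each presumably scored on its own
for text, result in zip([parent, child], responses.results):
	print(result.flagged, text[:40])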

Has anyone had experience with this?

Hi,

The moderation model doesn’t have enough reasoning power to make nuanced determinations like that, and the models that could should not be used with unmoderated input. So I understand your situation, and I think that’s an area being looked into with newer, more advanced moderation methods.

Aaaah right, I’m with you.

Well, thank you for clarifying. I suppose this’ll be something I keep an eye on for the foreseeable future!


The moderations endpoint is not for moderating your forum.

It is for filtering the inputs to AI models and for flagging the outputs from them.

Consider that OpenAI uses similar policies to detect misuse, so you don’t want to submit something like that to an AI model on its own. Indeed, wrapping it in other text could lower the score, while chunking it into parts could increase it.
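As a rough sketch of that intended use (the model name is a placeholder; substitute your own), moderation gates the user’s input before it reaches the model and can flag the model’s output afterwards:

from openai import OpenAI

client = OpenAI()

def moderated_reply(user_text: str) -> str:
	# check the user's input first; refuse before it ever reaches the model
	check = client.moderations.create(input=user_text)
	if check.results[0].flagged:
		return "Input rejected by moderation."

	completion = client.chat.completions.create(
		model="gpt-4o-mini",  # placeholder model name
		messages=[{"role": "user", "content": user_text}],
	)
	answer = completion.choices[0].message.content

	# optionally flag the model's output as well
	if client.moderations.create(input=answer).results[0].flagged:
		return "Output withheld by moderation."
	return answer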