I am trying to fine-tune a GPT model (via the OpenAI fine-tuning API) to produce 'counterspeech' in response to hate speech inputs. Counterspeech is any response that seeks to undermine the hateful content. However, when I start the fine-tuning job, the UI reports:
The job failed due to an invalid training file. This training file was blocked because too many examples were flagged by our moderation API for containing content that violates OpenAI’s usage policies in the following categories: hate. Use the free OpenAI Moderation API to identify these examples and remove them from your training data.
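For reference, my understanding is that the check the message suggests would look roughly like this (a minimal sketch only; the file name is a placeholder and I'm assuming the standard chat-format JSONL used for fine-tuning):

```python
# Sketch: run each training example through the Moderation API and report
# which ones are flagged under the "hate" category.
# Assumptions: chat-format JSONL training data; "counterspeech_train.jsonl"
# is a placeholder file name.
import json
from openai import OpenAI

client = OpenAI()

with open("counterspeech_train.jsonl") as f:
    for i, line in enumerate(f):
        example = json.loads(line)
        # Concatenate all message contents in the example into one string
        text = "\n".join(m["content"] for m in example["messages"])
        result = client.moderations.create(input=text).results[0]
        if result.flagged and result.categories.hate:
            print(f"Example {i} flagged for hate")
```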
Presumably, if I have to remove every flagged example from my training data, I won't be able to go ahead with the project at all? By design, every example contains hateful content: each one pairs a hate speech prompt with the counterspeech response I am trying to fine-tune on.
Any help would be greatly appreciated, thank you!