I wrote this short article contextualizing the release of fine-tuning and its implications.
Apologies if I’m not fully understanding the question, but in case this helps: the content filter is itself a fine-tuned version of the model, and it shouldn’t be necessary to have a separate filter for fine-tuned models; just use the standard content filter. Because of the way fine-tuning works (it tweaks some of the weights in the original trained model; they aren’t separate things), I don’t think it would be possible to trace whether toxic content was caused by fine-tuning or would have occurred in the original model as well. Hopefully this shouldn’t be too much of an issue, though, since the content filter should work regardless!
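To illustrate the point about fine-tuning tweaking existing weights rather than creating a separate component, here is a toy sketch (all names hypothetical, not any real training API): a "model" is just a list of weights, and a fine-tuning step nudges those same weights, so there is no distinct fine-tuned piece whose outputs could be traced separately from the base model's.

```python
# Toy illustration (hypothetical, not a real API): fine-tuning adjusts
# the original model's weights rather than producing a separate artifact.

def fine_tune_step(weights, gradients, lr=0.1):
    """One simplified fine-tuning step: shift each existing weight
    slightly in the direction that reduces the loss."""
    return [w - lr * g for w, g in zip(weights, gradients)]

base_model = [0.5, -1.2, 0.3]                      # pretrained weights
tuned_model = fine_tune_step(base_model, [0.1, -0.2, 0.05])

# Same shape, same parameters -- only the values have shifted a little,
# so the tuned model is not separable from the base model it came from.
print(len(tuned_model) == len(base_model))
print(tuned_model)
```

This is of course a drastic simplification of gradient descent on a real network, but it captures why "which weights did the fine-tuning touch?" has no clean answer: every weight in the tuned model is a blend of pretraining and fine-tuning.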
Ah, OK, I think I see what you’re saying now! It’s an interesting question, and I don’t have a good answer for it at the moment, but I’ll discuss it with the safety team!