OpenAI has published Privacy Filter, a small model for detecting and masking PII in text. It can run locally, supports a 128k context window, and comes with tools for redaction, evaluation, and fine-tuning. It looks especially useful for teams that need fast, on-prem privacy filtering with control over precision and recall.
Highlights:
- Permissive Apache 2.0 license: ideal for experimentation, customization, and commercial deployment.
- Small size: runs in a web browser or on a laptop, with 1.5B total parameters and 50M active parameters.
- Fine-tunable: adapt the model to specific data distributions through easy, data-efficient fine-tuning.
- Long-context: 128,000-token context window enables processing long text with high throughput and no chunking.
- Runtime control: configure precision/recall tradeoffs and detected span lengths through preset operating points.
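
To make the precision/recall tradeoff concrete, here is a minimal sketch of how an operating point could act as a confidence threshold over detected spans. This is not Privacy Filter's actual API: the `Span` type, the `OPERATING_POINTS` names and threshold values, and the `redact` helper are all hypothetical, and the span scores are made up for illustration. It also assumes the detected spans do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected PII span (hypothetical structure, not the model's output format)."""
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    label: str    # e.g. "EMAIL"
    score: float  # detector confidence in [0, 1]


# Hypothetical operating points: names and thresholds are illustrative only.
OPERATING_POINTS = {
    "high_recall": 0.3,     # mask more, accept more false positives
    "balanced": 0.5,
    "high_precision": 0.8,  # mask less, accept more misses
}


def redact(text: str, spans: list[Span], operating_point: str = "balanced") -> str:
    """Mask every span whose score meets the threshold of the chosen operating point."""
    threshold = OPERATING_POINTS[operating_point]
    kept = [s for s in spans if s.score >= threshold]
    # Replace from right to left so earlier character offsets stay valid.
    for s in sorted(kept, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"[{s.label}]" + text[s.end:]
    return text


text = "Contact Ana at ana@example.com or 555-0100."
spans = [
    Span(8, 11, "NAME", 0.45),
    Span(15, 30, "EMAIL", 0.97),
    Span(34, 42, "PHONE", 0.62),
]

for point in OPERATING_POINTS:
    print(f"{point}: {redact(text, spans, point)}")
```

Lowering the threshold masks the low-confidence name as well (higher recall), while raising it leaves only the high-confidence email masked (higher precision).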