OpenAI Privacy Filter: How Open-Weight PII Detection Works and Why It Matters for Enterprise AI

OpenAI quietly shipped its first open-weight PII detection model under the Apache 2.0 license. That is not a small thing. A company built on proprietary APIs releasing a model you can download, fine-tune, and run anywhere without per-token fees is a signal. It is also competitive: OpenAI reports 96% F1 on its benchmarks, the model runs in a browser via WebGPU, and it ships with a CLI tool called `opf` that masks PII in files before they ever touch an LLM.

This is a practical guide to how it works, what it can and cannot do, and what it means for enterprise AI compliance pipelines.

What It Is: Bidirectional Token Classification

The Privacy Filter is a named entity recognition (NER) model built for PII detection. It classifies tokens as belonging to one of eight PII categories or no category at all.

The model does not rely on regex patterns. It uses bidirectional token classification: a Transformer encoder whose attention covers the tokens on both sides of each position, so every prediction draws on left and right context. That context is what lets it distinguish a genuine email address from a string that only looks like one (e.g., an address a character uses in a short story versus a real customer's address in a support ticket).

It outputs BIOES tags: Begin, Inside, Outside, End, Single. An entity that fits in a single token gets a Single tag. Multi-token entities like street addresses get a Begin tag on the first token, Inside tags on any middle tokens, and an End tag on the last token. This scheme lets the decoder recover precise token spans for masking or redaction.
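Turning a BIOES sequence back into spans is a small state machine. The sketch below assumes labels are written as `PREFIX-category` (for example `B-address`), a common convention that is not confirmed for this model, and that the tag sequence is already valid.

```python
from typing import List, Optional, Tuple

# (category, first_token_index, last_token_index), both inclusive
Span = Tuple[str, int, int]

def bioes_to_spans(tags: List[str]) -> List[Span]:
    """Turn a valid BIOES tag sequence into entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            start = None
            continue
        prefix, category = tag.split("-", 1)
        if prefix == "S":          # single-token entity
            spans.append((category, i, i))
        elif prefix == "B":        # open a multi-token entity
            start = i
        elif prefix == "E" and start is not None:  # close the open entity
            spans.append((category, start, i))
            start = None
        # "I" tags only extend the span that is already open
    return spans

tags = ["O", "B-address", "I-address", "E-address", "O", "S-email"]
print(bioes_to_spans(tags))  # [('address', 1, 3), ('email', 5, 5)]
```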

Three-Stage Training

OpenAI describes a three-stage training pipeline:

Pretraining on large-scale text with token-level annotations.
Supervised fine-tuning on annotated PII datasets with BIOES labels.
Constrained decoding at inference time via a Viterbi decoder that enforces valid tag sequences. The decoder rules out impossible transitions (e.g., an Inside tag directly after an Outside tag, or a Begin tag followed by Outside before any End tag), which reduces false positives on malformed or ambiguous spans.

The constrained Viterbi decoder is a meaningful detail. Unconstrained models can emit structurally invalid tag sequences that produce phantom entities. Constraining the decoder during inference tightens precision without additional training.
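OpenAI has not published its decoder, so what follows is a generic constrained Viterbi over BIOES tags, not the Privacy Filter's implementation. The label names and per-token scores are made up for illustration: the unconstrained per-token argmax produces an Inside tag straight after Outside, while the constrained search returns the best valid path.

```python
import math
from typing import Dict, List, Tuple

def parse(tag: str) -> Tuple[str, str]:
    """Split 'B-address' into ('B', 'address'); 'O' has no category."""
    if tag == "O":
        return "O", ""
    prefix, category = tag.split("-", 1)
    return prefix, category

def allowed(prev: str, curr: str) -> bool:
    """BIOES transition rule."""
    p, p_cat = parse(prev)
    c, c_cat = parse(curr)
    if p in ("B", "I"):  # an entity is open: it must continue or close, same category
        return c in ("I", "E") and c_cat == p_cat
    return c in ("O", "B", "S")  # after O, E, or S: stay outside or start a new entity

def constrained_viterbi(scores: List[Dict[str, float]], labels: List[str]) -> List[str]:
    """Highest-scoring tag path that obeys the BIOES rules.
    scores[t][label] is a per-token log-probability (any additive score works)."""
    # A sequence may only start with O, B, or S
    best = [{l: scores[0][l] if parse(l)[0] in ("O", "B", "S") else -math.inf for l in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(scores)):
        row: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for curr in labels:
            # Best predecessor among the transitions the rules permit
            score, prev = max((best[-1][p], p) for p in labels if allowed(p, curr))
            row[curr] = score + scores[t][curr]
            ptr[curr] = prev
        best.append(row)
        back.append(ptr)
    # A sequence may only end with O, E, or S (no entity left open)
    last = max((l for l in labels if parse(l)[0] in ("O", "E", "S")), key=lambda l: best[-1][l])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

LABELS = ["O", "B-address", "I-address", "E-address", "S-address"]

def make_row(overrides: Dict[str, float], default: float = -3.0) -> Dict[str, float]:
    return {l: overrides.get(l, default) for l in LABELS}

# Made-up scores where the per-token argmax puts an Inside tag right after Outside
scores = [
    make_row({"O": -0.1}),
    make_row({"I-address": -0.5, "B-address": -1.0, "O": -1.2}),
    make_row({"E-address": -0.3, "O": -2.0}),
]
naive = [max(row, key=row.get) for row in scores]
print(naive)                                # ['O', 'I-address', 'E-address'] (invalid)
print(constrained_viterbi(scores, LABELS))  # ['O', 'B-address', 'E-address']
```

Because the constraints only prune transitions, they add no trainable parameters, which is why this step needs no extra training.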

Architecture

The model is a sparse mixture-of-experts (MoE) Transformer.

Parameter	Value
Total parameters	1.5B
Active parameters per token	50M
MoE experts	128
Context window	128K tokens
License	Apache 2.0

Only 50M parameters activate for each token. The routing mechanism selects 2 experts per token from the 128-expert pool. This is the sparse MoE trick: compute cost scales with active parameters, not total parameters. Memory does not: all 1.5B parameters must stay loaded, about 3 GB at 16-bit precision. That is still small enough to run inference on a consumer GPU or, via Transformers.js, in a browser using WebGPU.
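A top-2 router can be sketched in a few lines. This follows the common recipe of a softmax over the selected experts' router logits; whether the Privacy Filter normalizes its weights exactly this way is an assumption, and the scalar "experts" stand in for feed-forward networks.

```python
import math
from typing import Callable, List

Expert = Callable[[float], float]

def top2_moe(router_logits: List[float], experts: List[Expert], x: float) -> float:
    """Route one token to its two highest-scoring experts and mix their outputs."""
    # Pick the two experts with the largest router logits
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over just the selected logits gives mixing weights that sum to 1
    m = max(router_logits[i] for i in top)
    weights = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(weights.values())
    # Only the selected experts are evaluated; the others cost nothing for this token
    return sum(weights[i] / total * experts[i](x) for i in top)

calls: List[int] = []

def make_expert(k: int) -> Expert:
    def expert(x: float) -> float:
        calls.append(k)     # record which experts actually ran
        return (k + 1) * x  # stand-in for an expert feed-forward network
    return expert

experts = [make_expert(k) for k in range(4)]  # 4 experts instead of 128, for readability
out = top2_moe([0.1, 2.0, -1.0, 1.0], experts, x=1.0)
print(sorted(calls), round(out, 3))           # [1, 3] 2.538
```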

The 128K context window is large enough to handle long documents, chat histories, and email threads in a single pass.

Eight PII Categories

The model detects eight categories:

Category	Examples
`private_person`	Names, usernames, driver license numbers
`address`	Street addresses, city/state/zip
`email`	Email addresses
`phone`	Phone numbers
`url`	Web URLs
`date`	Birth dates, dates of death, dates of hire
`account_number`	Bank account numbers, credit card numbers
`secret`	API keys, passwords, tokens, private keys

Each category maps to a specific mask token when masking is applied after decoding. The model can be fine-tuned to add custom categories, and mask tokens can be remapped to match downstream pipeline formats.
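Once spans are decoded, masking is plain string surgery. The sketch assumes non-overlapping character-offset spans and a hypothetical `[CATEGORY]` mask format; the `fmt` parameter is where a pipeline-specific format would go.

```python
from typing import List, Tuple

# (category, start_char, end_char_exclusive), e.g. produced by the detector
CharSpan = Tuple[str, int, int]

def mask_text(text: str, spans: List[CharSpan], fmt: str = "[{}]") -> str:
    """Replace each span with a category mask token such as [EMAIL].
    Spans are assumed not to overlap."""
    out: List[str] = []
    cursor = 0
    for category, start, end in sorted(spans, key=lambda s: s[1]):
        out.append(text[cursor:start])          # keep the clean text before the span
        out.append(fmt.format(category.upper()))
        cursor = end
    out.append(text[cursor:])                   # keep the tail after the last span
    return "".join(out)

text = "Contact jane.doe@example.com or call 555-123-4567."
spans = [("phone", 37, 49), ("email", 8, 28)]
print(mask_text(text, spans))  # Contact [EMAIL] or call [PHONE].
```

Passing `fmt="<{}>"` yields `Contact <EMAIL> or call <PHONE>.` instead.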

Performance

OpenAI reports 96% F1 on internal benchmarks. Independent evaluation against the WIDER dataset and CoNLL03 NER benchmarks places it above most open-source alternatives.

Method	Typical F1 Range
Regex patterns	60–80%
spaCy v3 NER	78–85%
Microsoft Presidio	82–88%
OpenAI Privacy Filter	94–96%

Regex is brittle. It fires on strings that match a pattern regardless of context: an email address in a piece of fiction passes a regex email detector just as a real one does. The Privacy Filter's bidirectional context catches these false positives. The tradeoff is compute: regex runs in microseconds; a 50M-active-parameter model runs in milliseconds. For bulk processing, batch inference on GPU is the practical path.

Code Examples

Python (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "openai/privacy-filter"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Placeholder PII for illustration
text = "Contact jane.doe@example.com or call 555-123-4567."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# Per-token argmax over the label dimension. This skips constrained
# Viterbi decoding, so it can produce invalid tag sequences.
predictions = torch.argmax(outputs.logits, dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]

# Print only the tokens tagged with a PII label
for token, label in zip(tokens, labels):
    if label != "O":
        print(f"{token}: {label}")
```

JavaScript (Transformers.js / WebGPU)

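```javascript
import { pipeline, env } from '@huggingface/transformers';

// Fetch the model from the Hugging Face Hub and cache it in the browser
env.allowLocalModels = false;
env.useBrowserCache = true;

// Top-level await requires an ES module context
const classifier = await pipeline(
  'token-classification',
  'openai/privacy-filter',
  { device: 'webgpu' }
);

// Placeholder PII for illustration
const text = "Contact jane.doe@example.com or call 555-123-4567.";
const results = await classifier(text);

// Defensive filter: the pipeline already drops 'O' labels by default
const piiEntities = results.filter(r => r.entity !== 'O');
piiEntities.forEach(entity => {
  console.log(`${entity.word} -> ${entity.entity}`);
});
```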

CLI Tool (`opf`)

```bash
# Install
npm install -g @openai/opf

# Mask PII in a file
opf mask --input ./data/support_tickets.txt --output ./data/masked_tickets.txt

# Dry run (see what would be masked)
opf mask --input ./data/support_tickets.txt --dry-run

# Stream mask (stdin to stdout)
cat ./data/support_tickets.txt | opf mask --stream
```

The CLI is the most immediate use case for teams that want to sanitize data before feeding it to any LLM API call.

Deployment Options

Local GPU inference. Run the 50M-active-parameter model on a single A100 or RTX 4090. Use vLLM or Text Generation Inference for batch serving. Throughput is sufficient for mid-volume pipelines.

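On-premise server. Deploy via Ray Serve or Triton Inference Server. Sparsity does not shrink memory, since every expert must stay loaded, but at 1.5B total parameters the footprint is modest: weights take about 3 GB at 16-bit precision, and a quantized INT8 version (about 1.5 GB of weights) fits comfortably in 8 GB of VRAM.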
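The memory arithmetic behind these sizing claims is easy to check. The sketch uses the parameter counts from the architecture table and standard bytes-per-parameter figures; it counts weights only, so real deployments need headroom for activations and runtime overhead.

```python
TOTAL_PARAMS = 1.5e9   # from the architecture table
ACTIVE_PARAMS = 50e6   # parameters used per token

BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params: float, dtype: str) -> float:
    """Weights only, in decimal GB; activations and runtime overhead come on top."""
    return params * BYTES_PER_PARAM[dtype] / 1e9

# Every expert must be resident, so memory follows TOTAL_PARAMS...
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: {weight_gb(TOTAL_PARAMS, dtype):.2f} GB")

# ...while per-token compute follows ACTIVE_PARAMS
# (rule of thumb: ~2 FLOPs per active parameter, ignoring attention over long inputs)
print(f"~{2 * ACTIVE_PARAMS / 1e6:.0f} MFLOPs per token")
```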

Browser (WebGPU). Transformers.js supports WebGPU acceleration in Chrome and Edge. This enables client-side PII detection with no data leaving the device. Useful for privacy-first browser extensions.

Serverless. At 1.5B total parameters (about 3 GB of 16-bit weights), the model is heavy for most Lambda-tier functions; its 50M active parameters reduce compute, not memory. It fits in SageMaker Serverless or Modal with cold-start caveats. Expect 2–5 second cold starts.

Known Limits

Context-dependent false negatives. The model is strong on explicit PII but can miss context-dependent disclosures. "My mother's maiden name is Smith" is not a named entity in the traditional sense, yet it reveals a common security-question answer. Health details are another gap: a sentence like "I was diagnosed with diabetes" contains no named entity but is special category data under GDPR Article 9. Rules-based post-processing or fine-tuning is needed for these cases.
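A rules-based post-processor can run alongside the model and flag phrasings it is likely to miss. The two patterns below are hypothetical examples, not a vetted rule set; in practice their spans would be merged with the model's output before masking.

```python
import re
from typing import List, Tuple

# Hypothetical rules for disclosures a NER model can miss. The named group
# "value" marks the sensitive part of each match.
CONTEXT_RULES = [
    ("security_answer", re.compile(r"maiden name (?:is|was)\s+(?P<value>[A-Za-z'-]+)", re.IGNORECASE)),
    ("health", re.compile(r"\bdiagnosed with\s+(?P<value>[^.,;]+)", re.IGNORECASE)),
]

def context_flags(text: str) -> List[Tuple[str, int, int]]:
    """(label, start, end) character spans matched by the context rules."""
    flags: List[Tuple[str, int, int]] = []
    for label, pattern in CONTEXT_RULES:
        for m in pattern.finditer(text):
            flags.append((label, m.start("value"), m.end("value")))
    return flags

text = "My mother's maiden name is smith. I was diagnosed with type 2 diabetes."
for label, start, end in context_flags(text):
    print(label, repr(text[start:end]))
```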

Multilingual performance. Benchmarks focus on English. Non-English text, especially in non-Latin scripts (Cyrillic, Chinese, Arabic), shows lower recall. Fine-tuning on target-language data is recommended before production deployment in non-English pipelines.

No built-in audit trail. The model detects and can mask, but it does not log detection events by default. Enterprise compliance teams need to layer their own audit logging on top.
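A minimal audit layer only needs one record per detection. This sketch assumes a JSON Lines log with entity type, span, timestamp, and source document ID; the class name and document ID are hypothetical. It stores offsets instead of the detected text so the log itself does not hold PII.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Iterable

@dataclass(frozen=True)
class DetectionEvent:
    """One audit record per detected entity. Offsets, not the matched text,
    so the log does not become a second copy of the PII."""
    source_document_id: str
    entity_type: str
    start: int
    end: int
    timestamp: str

def make_event(doc_id: str, entity_type: str, start: int, end: int) -> DetectionEvent:
    return DetectionEvent(doc_id, entity_type, start, end,
                          datetime.now(timezone.utc).isoformat())

def to_jsonl(events: Iterable[DetectionEvent]) -> str:
    """Serialize events as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in events)

events = [make_event("ticket-0042", "email", 8, 28),
          make_event("ticket-0042", "phone", 37, 49)]
print(to_jsonl(events))
```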

Custom entity categories require fine-tuning. The eight supported categories cover common PII but not domain-specific entities (e.g., medical record numbers, insurance policy IDs). Adding these requires fine-tuning, which the Apache 2.0 license permits.

Latency at scale. High accuracy does not make inference cheaper: every token still flows through the model. For real-time streaming use cases, consider a lightweight regex pre-filter to reduce the volume of text routed to the model, and measure what it costs in recall, because names and many addresses match no simple pattern.
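A pre-filter along these lines is one way to cut traffic to the model. The signals are hypothetical heuristics: any chunk with an `@`, a URL, a run of three or more digits, or a capitalized word in mid-sentence goes to the model, and everything else is skipped.

```python
import re
from typing import Iterable, List

# Hypothetical cheap signals; tune them against labeled data before relying on them.
SIGNALS = [
    re.compile(r"@"),                # email addresses, handles
    re.compile(r"https?://|www\."),  # URLs
    re.compile(r"\d{3,}"),           # phone, account, and ID-like digit runs
]

def has_mid_sentence_capital(chunk: str) -> bool:
    """Capitalized word that is not the first word of a sentence: a crude name signal."""
    for sentence in re.split(r"(?<=[.!?])\s+", chunk):
        if any(word[0].isupper() for word in sentence.split()[1:]):
            return True
    return False

def needs_model(chunk: str) -> bool:
    return any(p.search(chunk) for p in SIGNALS) or has_mid_sentence_capital(chunk)

def prefilter(chunks: Iterable[str]) -> List[str]:
    """Keep only chunks worth sending to the model; the rest are skipped unchecked."""
    return [c for c in chunks if needs_model(c)]

chunks = [
    "the build finished without errors.",
    "please email support@example.com today.",
    "ask Maria about the rollout.",
    "Order 88412 shipped.",
]
print(prefilter(chunks))
```

The heuristics deliberately err toward sending text to the model, yet they still miss cases such as a date written as `5/3/90`, so measure recall on labeled data before trusting the skip path.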

Compliance Relevance

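GDPR Article 4 defines personal data broadly. Names, addresses, email addresses, phone numbers, and identifiers like account numbers are all personal data under GDPR. In the United States, HIPAA protects health information linked to identifiers; its Safe Harbor de-identification method lists 18 identifiers to remove, including names, addresses, dates (except year), phone numbers, email addresses, URLs, and account numbers.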

The challenge is not identifying PII. It is doing so at pipeline scale without strangling throughput. Regex-based pipelines are fast but produce false positives that require human review and false negatives that create compliance gaps. The Privacy Filter changes the precision-recall tradeoff meaningfully.

For GDPR compliance, the model can feed into a data minimization pipeline: identify PII, apply masking or pseudonymization, and route masked data to downstream LLM processing. This supports data minimization principles in Article 5(1)(c).
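For the pseudonymization step, a keyed hash gives each value a stable token without storing a lookup table. The sketch assumes non-overlapping character-offset spans from the detector; the key shown is a placeholder.

```python
import hashlib
import hmac
from typing import List, Tuple

CharSpan = Tuple[str, int, int]  # (category, start_char, end_char_exclusive)

def pseudonymize(text: str, spans: List[CharSpan], key: bytes) -> str:
    """Replace each span with a keyed, deterministic token such as <EMAIL_1a2b3c4d>.
    The same value always maps to the same token, so downstream joins and
    de-duplication keep working. Spans are assumed not to overlap."""
    out: List[str] = []
    cursor = 0
    for category, start, end in sorted(spans, key=lambda s: s[1]):
        # HMAC-SHA256 truncated to 8 hex chars; keep more for large datasets to avoid collisions
        tag = hmac.new(key, text[start:end].encode("utf-8"), hashlib.sha256).hexdigest()[:8]
        out.append(text[cursor:start])
        out.append(f"<{category.upper()}_{tag}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

KEY = b"load-this-from-a-secrets-manager"  # placeholder; never hard-code a real key
t1 = "Contact jane.doe@example.com or call 555-123-4567."
t2 = "Refund sent to jane.doe@example.com."
out1 = pseudonymize(t1, [("email", 8, 28), ("phone", 37, 49)], KEY)
out2 = pseudonymize(t2, [("email", 15, 35)], KEY)
print(out1)
print(out2)
```

Keep the key separate from the pseudonymized data: anyone holding it can hash candidate values and re-link tokens, and GDPR still treats pseudonymized data as personal data.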

For HIPAA, a Business Associate Agreement (BAA) covering the model hosting infrastructure is still required. The model itself does not solve the BAA problem.

vs Microsoft Presidio

Microsoft Presidio is the most direct competitor. It combines rule-based pattern matching with an NER model (spaCy by default, with Transformers-based models as an option) and ships as a Python library or Docker container.

Dimension	Microsoft Presidio	OpenAI Privacy Filter
License	MIT	Apache 2.0
Architecture	Regex recognizers + NER (spaCy by default)	Sparse MoE (1.5B total / 50M active)
F1 performance	82–88%	94–96%
Deployment	Docker, on-prem	On-prem, browser, serverless
Browser support	No	Yes (Transformers.js + WebGPU)
Custom entity fine-tuning	Yes	Yes
CLI tool	No	Yes (`opf`)
Monthly HuggingFace downloads	~40K	~99K

Presidio is mature and well-documented. The Privacy Filter is newer but shows meaningfully better F1 and ships with browser support, which Presidio does not. Licensing is not a differentiator: Presidio is MIT-licensed, and both licenses permit commercial use, though Apache 2.0 adds an explicit patent grant.

Recommendations

Evaluate the CLI first. Run `opf mask --dry-run` with `--input` pointing at a sample of your data. It takes five minutes to install and run. Compare its output against a hand-labeled sample to get a baseline for false positive and false negative rates on your actual data distributions.
Layer audit logging before production. The model does not log detections by default. Build a detection event log (entity type, span, timestamp, source document ID) before going to production. This is non-negotiable for GDPR/HIPAA compliance evidence.
Run a domain-specific fine-tuning pass. If you have medical records, insurance documents, or legal contracts, fine-tune on 500–2000 annotated examples from your domain. The base model is strong on general English; domain-specific spans (Latin medical terms that look like names, insurance policy numbers with unusual formats) benefit from adaptation.
Use the MoE sparsity to your advantage. Sparse routing keeps per-token compute low, so a single moderately sized GPU can serve many concurrent requests. Do not over-provision. Benchmark your actual concurrency needs and size instances accordingly.
Combine with regex pre-filter for high-volume streaming. For real-time applications, route text through a lightweight regex pre-filter to quickly discard obviously clean text, then send uncertain spans to the model. This reduces compute cost, but validate recall on labeled data: names and addresses rarely match regex patterns, and anything the pre-filter discards is never checked.
Evaluate Transformers.js for client-side use cases. If you are building a browser extension or a client-side tool where data never leaves the device, the WebGPU deployment is compelling. Test on your target browser population; WebGPU adoption is still uneven across browser versions.

FAQ

Is the Apache 2.0 license truly commercial-friendly?

Yes. Apache 2.0 is a permissive license with no field-of-use restrictions. Its obligations are limited: keep copyright and license notices, include any NOTICE file, and mark files you modify. It also grants an explicit patent license, which terminates for anyone who sues claiming the work infringes a patent. It is the same license behind many foundational open-source projects (Apache Kafka, TensorFlow). You can embed this model in commercial products without paying royalties or negotiating licenses.

How does it handle non-English PII?

Performance degrades on non-English text. The model was trained primarily on English data. For production use in German, French, Spanish, or non-Latin scripts, plan a fine-tuning pass on target-language annotated data. Community fine-tunes for major languages may appear on HuggingFace; check the model card for updates.

Can I add custom PII categories?

Yes, via fine-tuning. The Apache 2.0 license permits this. You will need annotated training data for your custom categories and a fine-tuning pipeline (LoRA or full fine-tune depending on your scale). In a token-classification setup, a new category means extending the classification head's label set with BIOES tags for that category.
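Adding a category mostly means growing the label set. The sketch builds BIOES labels for the eight base categories plus a hypothetical `medical_record_number`; the `PREFIX-category` naming is an assumption about how the model's labels are written.

```python
from typing import Dict, List, Tuple

BASE_CATEGORIES = ["private_person", "address", "email", "phone",
                   "url", "date", "account_number", "secret"]

def bioes_labels(categories: List[str]) -> List[str]:
    """'O' plus B/I/E/S labels for every category."""
    return ["O"] + [f"{p}-{c}" for c in categories for p in ("B", "I", "E", "S")]

def label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """id2label / label2id dictionaries in the shape model configs expect."""
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id

# "medical_record_number" is a hypothetical domain category
labels = bioes_labels(BASE_CATEGORIES + ["medical_record_number"])
id2label, label2id = label_maps(labels)
print(len(labels))  # 1 + 4 * 9 = 37
```

With Hugging Face Transformers, maps like these can be passed to `AutoModelForTokenClassification.from_pretrained` together with `num_labels=len(labels)` and `ignore_mismatched_sizes=True`. That reinitializes the classification head, so the fine-tuning data should cover the original categories as well as the new one.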

What is the false positive rate?

OpenAI reports precision around 95–97% on internal benchmarks. In practice, expect 3–8% false positives on clean editorial text and higher rates on text with unusual formatting, code snippets, or fictional content. The constrained Viterbi decoder reduces phantom entities but does not eliminate them.
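To measure false positives on your own data rather than rely on reported figures, compare predicted spans with a hand-labeled sample. This sketch uses strict exact-match scoring, an assumption: a span with the right category but shifted boundaries counts as both a false positive and a false negative.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (category, start_char, end_char)

def span_prf(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("email", 8, 28), ("phone", 37, 49)}
pred = {("email", 8, 28), ("private_person", 0, 7)}  # one hit, one false positive, one miss
p, r, f = span_prf(pred, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.50 recall=0.50 f1=0.50
```

The share of flagged spans that are false positives is `1 - precision`.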

How does it perform on hand-written text?

Poorly. Hand-written text (scanned documents, handwriting recognition output) has irregular spacing, spelling variations, and non-standard capitalization that degrades tokenization quality. For OCR output, run a preprocessing normalization step before feeding text to the model.
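A normalization pass for OCR output can be a handful of regular expressions. The rules below are illustrative assumptions: NFKC folds full-width digits and ligatures, line-break hyphens are rejoined, hard wraps are unwrapped, and repeated spaces collapse.

```python
import re
import unicodedata

def normalize_ocr(text: str) -> str:
    """Light cleanup for OCR output before PII detection."""
    text = unicodedata.normalize("NFKC", text)    # fold ligatures and full-width forms
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)  # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)   # unwrap hard line breaks
    text = re.sub(r"[ \t]{2,}", " ", text)        # collapse runs of spaces
    return text.strip()

raw = "Patient: Jane  Doe\nemail: jane.doe@exam-\nple.com\nPhone: ５５５-１２３-４５６７"
print(normalize_ocr(raw))
```

The hyphen rule can also glue genuinely hyphenated values split across lines (a phone number broken after `555-`), so check it against samples of your scans.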

Does it support batch inference?

Yes. In Python, call the tokenizer on a list of texts with `padding=True, truncation=True` and pass the batched inputs to the model. For high-throughput batch processing, use vLLM or Text Generation Inference with appropriate batching configuration.
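Padding cost grows with the length spread inside a batch, so sorting inputs by length before batching is a common trick. The helper below is a generic sketch, not part of any library mentioned here.

```python
from typing import List, Sequence, Tuple

def length_sorted_batches(texts: Sequence[str], batch_size: int) -> List[List[Tuple[int, str]]]:
    """Group texts of similar length so padding=True adds fewer pad tokens per batch.
    Character length is a cheap proxy for token length. Each item keeps its
    original index so predictions can be restored to input order."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    return [[(i, texts[i]) for i in order[k:k + batch_size]]
            for k in range(0, len(order), batch_size)]

texts = ["a" * 50, "b", "c" * 10, "d" * 49]
batches = length_sorted_batches(texts, batch_size=2)
print([[i for i, _ in batch] for batch in batches])  # [[1, 2], [3, 0]]

# Each batch would then go through the tokenizer and model, e.g.:
#   indices, chunk = zip(*batch)
#   inputs = tokenizer(list(chunk), padding=True, truncation=True, return_tensors="pt")
```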

Is there a hosted API?

Not at launch. The Privacy Filter is an open-weight model. OpenAI has not announced a hosted API version. If you need API access without self-hosting, you will need to operate your own inference endpoint or wait for a future API release.

For a broader view of enterprise AI security infrastructure, see the full landscape analysis: Enterprise AI Security Landscape 2026: Guardrails to Trusted Access.

Model card and source: Introducing OpenAI Privacy Filter
