Privacy firewall for LLM apps
Intercept and anonymize PII before it reaches OpenAI, Anthropic, or any other LLM, then rehydrate it in the response. Domain-aware, 55+ languages, 3 lines of code.
```python
from privacy_firewall import create_firewall

# Domain-aware: keeps diagnoses, strips PII
firewall = create_firewall("healthcare")
result = firewall.process(
    text="Patient John Doe, SSN 123-45-6789, "
         "diagnosed with hypertension.",
    context={...},
)
# -> "Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
# Medical terms preserved. PII stripped.
```
Detect · Anonymize · Rehydrate
A transparent privacy layer between your app and any LLM. Zero changes to your existing prompt logic.
Input: "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."
Raw text containing PII arrives from a user or an upstream service.
Detect: PERSON: Ana Garcia · NATIONAL_ID: 12345678A · DIAGNOSIS: hypertension (keep)
One or more backends (regex, Presidio, GLiNER, Transformers) detect entities. Domain rules decide what to keep.
Anonymize: "Patient [PERSON_001], [REDACTED], diagnosed with hypertension."
Entities are replaced according to their disposition: keep, pseudonymize, redact, generalize, mask, or hash. Profile rules decide which action applies to each entity type.
Call the LLM: the model processes the sanitized prompt, so real PII is never transmitted.
The sanitized prompt is forwarded to any provider (OpenAI, Anthropic, Mistral, local models) with zero changes to your prompt logic.
Rehydrate: "Patient Ana Garcia, [REDACTED], diagnosed with hypertension."
The vault restores the original values behind pseudonymized tokens in the model's response. End users see real names; the LLM never did. Redacted values are never stored, so they stay redacted.
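The detect, anonymize, call, rehydrate loop can be sketched end to end in plain Python. This is a toy under stated assumptions, not the library's implementation: a single regex for DNI-style IDs and a fixed name list stand in for the real detection backends, and `anonymize`, `rehydrate` and `fake_llm` are hypothetical names. It shows why only pseudonymized values come back: redacted values never enter the vault.

```python
import re

# Toy stand-ins for the real detection backends (assumptions for this sketch):
# one regex for Spanish DNI-style IDs and a fixed list of known names.
DNI_RE = re.compile(r"\b\d{8}[A-Z]\b")
KNOWN_NAMES = ["Ana Garcia"]


def anonymize(text: str, vault: dict) -> str:
    """Pseudonymize names (reversible) and redact IDs (irreversible)."""
    count = 0
    for name in KNOWN_NAMES:
        if name in text:
            count += 1
            token = f"[PERSON_{count:03d}]"
            vault[token] = name  # only pseudonyms are kept for rehydration
            text = text.replace(name, token)
    # Redacted values are never stored, so nothing can bring them back
    return DNI_RE.sub("[REDACTED]", text)


def rehydrate(text: str, vault: dict) -> str:
    """Swap vault tokens in the model's response back to original values."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text


def fake_llm(prompt: str) -> str:
    # Stand-in for a provider call: it only ever sees the sanitized prompt
    return f"Summary: {prompt}"


vault = {}
prompt = anonymize("Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension.", vault)
print(prompt)  # Patient [PERSON_001], DNI [REDACTED], diagnosed with hypertension.
print(rehydrate(fake_llm(prompt), vault))  # Summary: Patient Ana Garcia, DNI [REDACTED], diagnosed with hypertension.
```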
Built-in presets for your industry
Each domain profile decides what's sensitive and what the LLM must see to do its job. Fully customizable.
Healthcare Profile
Keep clinical context. Anonymize patient identifiers and account data.
- Diagnoses (hipertensión, diabetes)
- Medications (enalapril, lisinopril)
- Procedures & observations
| Action | Entity | Example |
|---|---|---|
| PSEUDONYMIZE | PERSON | Ana García → [PERSON_001] |
| REDACT | NATIONAL_ID | 12345678A → [REDACTED] |
| GENERALIZE | AGE | 43 años → 40-49 |
| GENERALIZE | DATE | 15/03/2024 → 2024 |
| REDACT | EMAIL | ana@clinic.es → [REDACTED] |
| REDACT | IBAN | ES12345678 → [REDACTED] |
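The healthcare table reads naturally as a lookup from entity type to action. A minimal sketch of applying such a profile, assuming a plain dict as the profile format and that unlisted types are redacted (fail closed); `HEALTHCARE_PROFILE`, `generalize` and `apply_profile` are hypothetical names, not the library's API. It reproduces the AGE and DATE generalizations from the table.

```python
import re
from datetime import datetime

# Hypothetical encoding of the healthcare table: entity type -> action.
HEALTHCARE_PROFILE = {
    "PERSON": "pseudonymize",
    "NATIONAL_ID": "redact",
    "AGE": "generalize",
    "DATE": "generalize",
    "EMAIL": "redact",
    "IBAN": "redact",
    "DIAGNOSIS": "keep",
    "MEDICATION": "keep",
}


def generalize(entity_type: str, value: str) -> str:
    if entity_type == "AGE":
        years = int(re.match(r"\d+", value).group())  # "43 años" -> 43
        low = years // 10 * 10
        return f"{low}-{low + 9}"  # decade band
    if entity_type == "DATE":
        return str(datetime.strptime(value, "%d/%m/%Y").year)  # keep the year only
    raise ValueError(f"no generalization rule for {entity_type}")


def apply_profile(entity_type: str, value: str, counters: dict) -> str:
    # Types missing from the profile are redacted (fail closed); an assumption
    action = HEALTHCARE_PROFILE.get(entity_type, "redact")
    if action == "keep":
        return value
    if action == "redact":
        return "[REDACTED]"
    if action == "generalize":
        return generalize(entity_type, value)
    if action == "pseudonymize":
        counters[entity_type] = counters.get(entity_type, 0) + 1
        return f"[{entity_type}_{counters[entity_type]:03d}]"
    raise ValueError(f"unknown action {action!r}")


counters: dict = {}
for etype, value in [("PERSON", "Ana García"), ("AGE", "43 años"),
                     ("DATE", "15/03/2024"), ("DIAGNOSIS", "hipertensión")]:
    print(etype, "->", apply_profile(etype, value, counters))
```

Failing closed on unknown types trades some recall of useful context for safety: a new entity type never leaks just because the profile predates it.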
"Paciente Ana García, DNI 12345678A, 43 años, hipertensión. Consulta: 15/03/2024. Email: ana@clinic.es. Prescripción: enalapril 10mg."
"Paciente [PERSON_001], [REDACTED], 40-49, hipertensión. Consulta: 2024. Email: [REDACTED]. Prescripción: enalapril 10mg."
```python
firewall = create_firewall("healthcare")
```
Mix and match detection engines
Start with a preset, then swap in the detection engine that fits your data. Each option lists its strengths, the install command, and the firewall call.
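Combining engines, as the hybrid option does, means reconciling spans that different detectors report for the same text. A minimal sketch of one possible merge policy, assuming each detector returns character spans with a score: overlapping spans keep the higher score, ties go to the longer span. `Span` and `merge_spans` are hypothetical, and the library may resolve conflicts differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    start: int
    end: int  # exclusive
    entity_type: str
    score: float


def merge_spans(*detector_outputs: List[Span]) -> List[Span]:
    """Union spans from several detectors, dropping overlaps.

    Higher score wins; on equal scores the longer span wins.
    """
    candidates = sorted(
        (s for spans in detector_outputs for s in spans),
        key=lambda s: (-s.score, -(s.end - s.start), s.start),
    )
    kept: List[Span] = []
    for span in candidates:
        # Keep the span only if it does not overlap anything already kept
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


# "Patient Ana Garcia, DNI 12345678A"
regex_spans = [Span(24, 33, "NATIONAL_ID", 0.95)]  # structured ID caught by regex
ner_spans = [Span(8, 18, "PERSON", 0.85), Span(24, 32, "LOCATION", 0.30)]  # NER misfire
for s in merge_spans(regex_spans, ner_spans):
    print(s)
```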
Regex
- Structured IDs
- Emails & phones
- Credit cards
- Zero ML deps
```shell
pip install "pii-firewall"
```
```python
firewall = create_firewall("healthcare", detector_backend="regex")
```
Customize: add_custom_regex(...)
Presidio
- Named entities (persons, orgs)
- Multi-language NER
- Best speed/accuracy balance
- Extensible
```shell
pip install "pii-firewall[presidio,langdetect]"
```
```python
firewall = create_firewall("healthcare", detector_backend="presidio")
```
Customize: custom_recognizers=[...]
GLiNER
- Zero-shot NER
- No fine-tuning needed
- Custom entity types on the fly
```shell
pip install "pii-firewall[gliner]"
```
```python
firewall = create_firewall("healthcare", detector_backend="gliner")
```
Customize: define your own entity labels
Transformers
- E.g. biomedical NER (d4data, BC5CDR)
- Highest accuracy on specific domains
- GPU acceleration
- HuggingFace catalog
```shell
pip install "pii-firewall[transformers]"
```
```python
firewall = create_firewall(
    "healthcare",
    detector_backend="transformers",
    transformer_model_id="d4data/biomedical-ner-all",
)
```
Customize: transformer_model_id="..."
OPF
- Token-level PII classifier
- Language-agnostic
- Structured output with spans
- Viterbi decoding
```shell
pip install "pii-firewall[opf]"
```
```python
firewall = create_firewall("healthcare", detector_backend="opf")
```
Customize: swap in a different token classifier
Hybrid
- Regex + Presidio combined
- Maximum entity coverage
- Catches structured IDs NER misses
```shell
pip install "pii-firewall[presidio,langdetect]"
```
```python
firewall = create_firewall("healthcare", detector_backend="hybrid")
```
Customize: regex rules + custom_recognizers=[...]
Custom
- Custom regex rules
- Custom Presidio recognizers
- Custom HF / Transformer models
```shell
pip install "pii-firewall[presidio,transformers]"
```
```python
firewall = create_firewall("generic", detector_backend="presidio", custom_recognizers=[...])
```
Customize: add_custom_regex(...) · custom_recognizers=[...] · transformer_model_id="..."
6 ways to handle PII
Each entity type in a domain profile gets a disposition: the precise transformation applied when that entity is detected.
| Action | Result | Reversible |
|---|---|---|
| KEEP | Pass through unchanged | N/A |
| PSEUDONYMIZE | Replace with reversible token | Yes |
| REDACT | Irreversible deletion/redaction | No |
| GENERALIZE | Keep category, drop precision | No |
| MASK | Partial reveal | No |
| HASH | SHA-256 one-way hash (analytics only) | No |
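To make the table concrete, here is a minimal sketch of each disposition as a plain function. The function names, the mask width and the token format are assumptions for illustration, not the library's API. Note that only the pseudonymizer keeps a vault, which is exactly what makes it the one reversible action.

```python
import hashlib
from typing import Dict, Tuple


def keep(value: str) -> str:
    return value


def redact(value: str) -> str:
    return "[REDACTED]"  # the original value is discarded


def generalize_age(value: str) -> str:
    low = int(value) // 10 * 10
    return f"{low}-{low + 9}"  # keep the category (decade), drop precision


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    # Partial reveal: show only the last `visible` characters
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def hash_value(value: str, salt: str = "") -> str:
    # One-way SHA-256: equal inputs give equal digests, so counts and joins still work
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


class Pseudonymizer:
    """Reversible: each distinct value gets a stable token recorded in a vault."""

    def __init__(self) -> None:
        self.vault: Dict[str, str] = {}               # token -> original value
        self._tokens: Dict[Tuple[str, str], str] = {}  # (type, value) -> token
        self._counts: Dict[str, int] = {}             # per-type counter

    def __call__(self, entity_type: str, value: str) -> str:
        key = (entity_type, value)
        if key not in self._tokens:
            self._counts[entity_type] = self._counts.get(entity_type, 0) + 1
            token = f"[{entity_type}_{self._counts[entity_type]:03d}]"
            self._tokens[key] = token
            self.vault[token] = value
        return self._tokens[key]


pseudo = Pseudonymizer()
print(pseudo("PERSON", "Ana García"), mask("4111111111111111"), generalize_age("43"))
```

Low-entropy identifiers such as national IDs can be brute-forced from an unsalted hash, which is why the sketch exposes a `salt` parameter; a secret key (for example HMAC-SHA-256) is the sturdier variant of the same idea.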
Drop into your stack in minutes
Wrap your LLM call once, then ship. The OpenAI example is written for copy-paste; other providers, frameworks, and transports follow the same pattern.
```python
from openai import OpenAI
from privacy_firewall import create_firewall

client = OpenAI()  # reads OPENAI_API_KEY from the environment
firewall = create_firewall("healthcare", detector_backend="presidio")
context = {"tenant_id": "acme", "case_id": "c1", "thread_id": "t1", "actor_id": "u1"}


def openai_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    # message.content can be None (e.g. tool-call replies); normalize to str
    return resp.choices[0].message.content or ""


user_input = "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."

# PII is stripped before reaching OpenAI and restored in the reply
result = firewall.secure_call(text=user_input, context=context, llm_client=openai_llm)
print(result.final_text)  # real names restored
```
No SDK lock-in: wrap any HTTP LLM endpoint, a local model served by Ollama, or a LiteLLM proxy the same way.
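Because the wrapped client is just a `prompt -> text` function, the same wrapping works for any transport. A provider-agnostic sketch, assuming an anonymizer that returns the sanitized text plus a token vault; `wrap_llm`, `toy_anonymize` and `echo_llm` are hypothetical names and not the library's `secure_call`.

```python
from typing import Callable, Dict, Tuple

LLM = Callable[[str], str]
Anonymizer = Callable[[str], Tuple[str, Dict[str, str]]]


def wrap_llm(llm: LLM, anonymize: Anonymizer) -> LLM:
    """Return a callable with the same shape as `llm` that never sends raw PII."""

    def secured(prompt: str) -> str:
        sanitized, vault = anonymize(prompt)  # vault: token -> original value
        reply = llm(sanitized)
        for token, original in vault.items():
            reply = reply.replace(token, original)
        return reply

    return secured


# Usage with toy stand-ins (both hypothetical):
def toy_anonymize(text: str) -> Tuple[str, Dict[str, str]]:
    vault = {"[PERSON_001]": "Ana Garcia"}
    return text.replace("Ana Garcia", "[PERSON_001]"), vault


sent_to_model = []


def echo_llm(prompt: str) -> str:
    # Replace with an HTTP call, an Ollama client, or a LiteLLM proxy request
    sent_to_model.append(prompt)
    return f"Noted: {prompt}"


safe_llm = wrap_llm(echo_llm, toy_anonymize)
print(safe_llm("Follow up with Ana Garcia next week."))
print(sent_to_model)
```

The design choice is that the wrapper returns something with the same signature as the function it wraps, so calling code does not change when the firewall is added or removed.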
55+ languages, zero configuration
Language is detected automatically and cached per thread, so only the first message in a thread pays the detection cost. Locale-specific patterns ensure that country-specific IDs and formats are recognized correctly.
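Thread-level caching amounts to memoizing the detector by `thread_id`. A minimal sketch, assuming detection is a `text -> language code` function; `ThreadLanguageCache` and the marker-based `naive_detect` are hypothetical, with the latter standing in for a real detector.

```python
from typing import Callable, Dict


class ThreadLanguageCache:
    """Run language detection once per thread and reuse the result."""

    def __init__(self, detect: Callable[[str], str]) -> None:
        self._detect = detect
        self._by_thread: Dict[str, str] = {}

    def language_for(self, thread_id: str, text: str) -> str:
        if thread_id not in self._by_thread:
            # Only the first message of a thread triggers detection
            self._by_thread[thread_id] = self._detect(text)
        return self._by_thread[thread_id]


def naive_detect(text: str) -> str:
    # Toy heuristic standing in for a real detector (e.g. langdetect)
    spanish_markers = ("paciente", "años", "ñ", "ó", "á", "é", "í")
    return "es" if any(m in text.lower() for m in spanish_markers) else "en"


cache = ThreadLanguageCache(naive_detect)
print(cache.language_for("consultation-1", "Paciente Ana García, 43 años"))  # es
print(cache.language_for("consultation-1", "ok"))  # es (cached, detector not called)
```

The trade-off is that a thread switching languages mid-conversation keeps its first detected language; forcing `language=` sidesteps detection entirely.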
```python
# Auto-detect (default)
firewall = create_firewall("healthcare")

# Force Spanish
firewall = create_firewall("healthcare", language="es")
```
Built for regulatory compliance
GDPR Art. 17 right to erasure, tenant isolation, and full audit traces, out of the box.
Right to Erasure (Art. 17)
firewall.forget() wipes all vault mappings for a thread or case. After deletion, rehydration will not restore any original values for that scope.
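One way to picture what `forget()` removes: a vault partitioned by (tenant, case, thread) scope, where erasing a thread drops its mappings and erasing a case drops every thread under it. A minimal sketch under that assumption; `ScopedVault` and its methods are hypothetical and mirror only the keyword arguments shown in the example that follows.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Scope = Tuple[str, str, str]  # (tenant_id, case_id, thread_id)


class ScopedVault:
    """Token -> original mappings, partitioned by tenant/case/thread."""

    def __init__(self) -> None:
        self._data: Dict[Scope, Dict[str, str]] = defaultdict(dict)

    def put(self, scope: Scope, token: str, original: str) -> None:
        self._data[scope][token] = original

    def get(self, scope: Scope, token: str) -> Optional[str]:
        # .get on the outer dict avoids creating empty scopes as a side effect
        return self._data.get(scope, {}).get(token)

    def forget(self, tenant_id: str, case_id: str, thread_id: Optional[str] = None) -> int:
        """Delete one thread's mappings, or a whole case's if thread_id is None.

        Returns the number of mappings removed.
        """
        doomed = [
            s for s in self._data
            if s[0] == tenant_id and s[1] == case_id and (thread_id is None or s[2] == thread_id)
        ]
        return sum(len(self._data.pop(s)) for s in doomed)


vault = ScopedVault()
vault.put(("hospital-001", "patient-123", "consultation-1"), "[PERSON_001]", "Ana García")
print(vault.forget("hospital-001", "patient-123", "consultation-1"))  # 1
```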
```python
deleted = firewall.forget(
    tenant_id="hospital-001",
    case_id="patient-123",
    thread_id="consultation-1",
)
# -> 14 mappings deleted
```
Tenant Isolation
Token mappings are scoped by tenant_id. The same [PERSON_001] token in different tenants never shares a mapping. Hard isolation at the data layer.
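At minimum, tenant scoping means the tenant ID is part of every vault key, so the same token text resolves independently per tenant. A minimal sketch of that idea; `TenantVault` is hypothetical, and a production setup may go further (separate stores or encryption keys per tenant).

```python
from typing import Dict, Tuple


class TenantVault:
    """Mappings keyed by (tenant_id, token), so no lookup can cross tenants."""

    def __init__(self) -> None:
        self._mappings: Dict[Tuple[str, str], str] = {}
        self._counters: Dict[str, int] = {}  # token numbering is per tenant

    def pseudonymize(self, tenant_id: str, value: str) -> str:
        n = self._counters.get(tenant_id, 0) + 1
        self._counters[tenant_id] = n
        token = f"[PERSON_{n:03d}]"
        self._mappings[(tenant_id, token)] = value
        return token

    def rehydrate(self, tenant_id: str, token: str) -> str:
        # Tokens from another tenant (or unknown tokens) come back unchanged
        return self._mappings.get((tenant_id, token), token)


tv = TenantVault()
a = tv.pseudonymize("hospital-001", "Ana García")
b = tv.pseudonymize("hospital-002", "Juan Pérez")
print(a, b)  # [PERSON_001] [PERSON_001]
print(tv.rehydrate("hospital-002", a))  # Juan Pérez: same token text, different tenant mapping
```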
```python
context = {
    "tenant_id": "hospital-001",  # hard boundary
    "case_id": "patient-123",
    "thread_id": "consultation-1",
    "actor_id": "doctor-456",
}
```
Audit Trail
Every call produces a TraceRecord with entity types, confidence scores, and applied replacements — ready for your compliance dashboard or SIEM.
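A trace that records original values is itself sensitive, so what leaves the service for a SIEM is usually a reduced view. A minimal sketch, assuming a dataclass shape for the record; `DetectedEntity`, `TraceRecord` and `to_siem_json` are hypothetical stand-ins, not the library's `TraceRecord`.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DetectedEntity:
    type: str
    text: str
    confidence: float


@dataclass
class TraceRecord:
    """Hypothetical per-call audit record: entities, scores, replacements."""

    tenant_id: str
    thread_id: str
    detected_entities: List[DetectedEntity] = field(default_factory=list)
    replacements: List[Dict[str, str]] = field(default_factory=list)

    def to_siem_json(self) -> str:
        # Export types, scores and counts only: the matched text and the
        # "original" values are themselves PII and stay out of the SIEM
        return json.dumps({
            "tenant_id": self.tenant_id,
            "thread_id": self.thread_id,
            "entity_types": sorted({e.type for e in self.detected_entities}),
            "min_confidence": min((e.confidence for e in self.detected_entities), default=None),
            "replacement_count": len(self.replacements),
        })


record = TraceRecord(
    "hospital-001", "consultation-1",
    [DetectedEntity("PERSON", "Ana Garcia", 0.97)],
    [{"original": "Ana Garcia", "token": "[PERSON_001]"}],
)
print(record.to_siem_json())
```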
```python
result.trace.detected_entities
# [{"type": "PERSON", "text": "Ana Garcia",
#   "confidence": 0.97}, ...]
result.trace.replacements
# [{"original": "Ana Garcia",
#   "token": "[PERSON_001]"}, ...]
```
Up and running in minutes
Three steps from install to your first anonymized LLM call.
Install the package
```shell
pip install "pii-firewall[presidio,langdetect]"
# Download a spaCy language model
python -m spacy download en_core_web_sm   # English
python -m spacy download es_core_news_sm  # Spanish
```
Create a firewall
```python
from privacy_firewall import create_firewall

# Pick a domain profile
firewall = create_firewall("healthcare")  # or "finance", "legal", "generic"
```
Anonymize & rehydrate
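```python
# Use the same context for both calls so rehydration finds the right mappings
context = {
    "tenant_id": "hospital-001",
    "case_id": "patient-123",
    "thread_id": "consultation-1",
    "actor_id": "doctor-456",
}
user_input = "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension."

# Anonymize before sending to the LLM
anon = firewall.anonymize(text=user_input, context=context)
# my_llm is your own function: takes the prompt string, returns the model's text
llm_response = my_llm(anon.sanitized_text)

# Rehydrate: restore real names in the response
final = firewall.rehydrate(text=llm_response, context=context)
print(final)  # end user sees real values
```
More examples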
```python
from fastapi import FastAPI
from privacy_firewall import PrivacyFirewallSDK

app = FastAPI()
sdk = PrivacyFirewallSDK.create(domain="healthcare", detector_backend="presidio")


@app.post("/privacy/sanitize")
def sanitize(req: dict) -> dict:
    # Plain `def`: FastAPI runs it in a worker thread, so the blocking
    # anonymize_text call does not stall the event loop
    result = sdk.anonymize_text(text=req["text"], context=req["context"])
    return {"sanitized_text": result.sanitized_text}
```
```python
# Register a custom entity type detected by regex
firewall.add_custom_regex(
    entity_type="EMPLOYEE_ID",
    regex=r"\bEMP-\d{6}\b",
    locales=["GLOBAL"],
    confidence=0.95,
    context_words=["employee", "staff"],
    disposition_action="redact",
)
```
```python
deleted = firewall.forget(
    tenant_id="hospital-001",
    case_id="patient-123",
    thread_id="consultation-1",
)
print(f"Deleted {deleted} mappings")
# Vault mappings for this scope are gone (supports GDPR Art. 17 erasure requests)
```
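How `confidence` and `context_words` interact in `add_custom_regex` is not spelled out here. One common approach, similar in spirit to Presidio's context enhancement, is to raise a match's score when a context word appears near it. A minimal sketch under that assumption; `RegexRule`, the 30-character window and the 0.05 boost are all hypothetical choices.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegexRule:
    entity_type: str
    pattern: str
    confidence: float
    context_words: Tuple[str, ...] = ()
    window: int = 30             # characters inspected on each side (assumption)
    context_boost: float = 0.05  # score bonus when a context word is nearby (assumption)

    def find(self, text: str) -> List[Tuple[str, int, int, float]]:
        """Return (entity_type, start, end, score) for every match."""
        hits = []
        for m in re.finditer(self.pattern, text):
            nearby = text[max(0, m.start() - self.window): m.end() + self.window].lower()
            score = self.confidence
            if any(w.lower() in nearby for w in self.context_words):
                score = min(1.0, score + self.context_boost)
            hits.append((self.entity_type, m.start(), m.end(), score))
        return hits


rule = RegexRule("EMPLOYEE_ID", r"\bEMP-\d{6}\b", 0.95, ("employee", "staff"))
for hit in rule.find("Staff member EMP-123456 called. Also EMP-654321."):
    print(hit)
```

Both IDs are still reported; the context words only separate confident matches from bare pattern hits, which a downstream threshold can then act on.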