Open Source · Apache 2.0

Privacy firewall
for LLM apps

Intercept and anonymize PII before it reaches OpenAI, Anthropic, or any LLM — then rehydrate it in the response. Domain-aware, 55+ languages, 3 lines of code.

55+ languages · 6 dispositions · pluggable detection backends · built-in domain profiles
pii_demo.py
from privacy_firewall import create_firewall

# Domain-aware -- keeps diagnoses, strips PII
firewall = create_firewall("healthcare")

result = firewall.process(
    # Adjacent string literals are joined, so the text can span two lines
    text=(
        "Patient John Doe, SSN 123-45-6789, "
        "diagnosed with hypertension."
    ),
    context={...},
)

# -> "Patient [PERSON_001], [REDACTED], diagnosed
#    with hypertension."
# Medical terms preserved. PII stripped.
How it works

Detect · Anonymize · Rehydrate

A transparent privacy layer between your app and any LLM. Zero changes to your existing prompt logic.

01 Input
"Patient Ana Garcia, DNI 12345678A,
diagnosed with hypertension."

Raw text containing PII arrives from a user or an upstream service.

02 Detect
PERSON -- Ana Garcia
NATIONAL_ID -- 12345678A
DIAGNOSIS -- hypertension (keep)

One or more backends (regex, Presidio, GLiNER, Transformers) detect entities. Domain rules decide what to keep.

03 Anonymize
"Patient [PERSON_001], [REDACTED],
diagnosed with hypertension."

Entities replaced per their disposition: keep, pseudonymize, redact, generalize, mask, or hash. Profile rules decide which action applies per entity type.

04 -> LLM
LLM processes sanitized prompt.
Real PII never transmitted.

Sanitized prompt forwarded to any provider: OpenAI, Anthropic, Mistral, local models. Zero changes to prompt logic.

05 Rehydrate
"Patient Ana Garcia, [REDACTED],
diagnosed with hypertension."

The vault restores the original values in the model's response. End users see the real names; the LLM never does.
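
The five steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `InMemoryVault`, `anonymize`, and the hard-coded `detected` list are hypothetical stand-ins for the real vault and detection backends, and the LLM reply is a fixed string.

```python
import re

TOKEN_PATTERN = re.compile(r"\[[A-Z_]+_\d{3}\]")


class InMemoryVault:
    """Hypothetical token store for illustration; not the library's vault."""

    def __init__(self) -> None:
        self._value_by_token: dict[str, str] = {}
        self._token_by_value: dict[tuple[str, str], str] = {}
        self._counters: dict[str, int] = {}

    def token_for(self, entity_type: str, value: str) -> str:
        """Return a stable placeholder such as [PERSON_001] for a value."""
        key = (entity_type, value)
        if key not in self._token_by_value:
            count = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = count
            token = f"[{entity_type}_{count:03d}]"
            self._token_by_value[key] = token
            self._value_by_token[token] = value
        return self._token_by_value[key]

    def rehydrate(self, text: str) -> str:
        """Swap known placeholders back; unknown tokens are left as-is."""
        return TOKEN_PATTERN.sub(
            lambda m: self._value_by_token.get(m.group(0), m.group(0)), text
        )


def anonymize(text, entities, vault):
    """Apply a disposition to each detected (type, value, action) triple."""
    for entity_type, value, action in entities:
        if action == "pseudonymize":
            text = text.replace(value, vault.token_for(entity_type, value))
        elif action == "redact":
            text = text.replace(value, "[REDACTED]")  # nothing is stored
        # "keep" passes the value through unchanged
    return text


vault = InMemoryVault()
# Assumed detector output; a real backend would produce this list.
detected = [
    ("PERSON", "Ana Garcia", "pseudonymize"),
    ("NATIONAL_ID", "DNI 12345678A", "redact"),
    ("DIAGNOSIS", "hypertension", "keep"),
]
sanitized = anonymize(
    "Patient Ana Garcia, DNI 12345678A, diagnosed with hypertension.",
    detected,
    vault,
)
print(sanitized)

# Stand-in for the model's reply, which only ever saw the placeholder.
llm_reply = "[PERSON_001] should have blood pressure checked monthly."
print(vault.rehydrate(llm_reply))
```

Only pseudonymized values enter the vault in this sketch, so a redacted value cannot come back during rehydration.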

Domain Profiles

Built-in presets for your industry

Each domain profile decides what's sensitive and what the LLM must see to do its job. Fully customizable.

Healthcare Profile

Keep clinical context. Anonymize patient identifiers and account data.

✓ Keeps (pass-through)
  • Diagnoses (hipertensión, diabetes)
  • Medications (enalapril, lisinopril)
  • Procedures & observations
Transforms
Action        Entity       Example
PSEUDONYMIZE  PERSON       Ana García → [PERSON_001]
REDACT        NATIONAL ID  12345678A → [REDACTED]
GENERALIZE    AGE          43 años → 40-49
GENERALIZE    DATE         15/03/2024 → 2024
REDACT        EMAIL        ana@clinic.es → [REDACTED]
REDACT        IBAN         ES12345678 → [REDACTED]
Live example
Input
"Paciente Ana García, DNI 12345678A, 43 años,
hipertensión. Consulta: 15/03/2024.
Email: ana@clinic.es. Prescripción: enalapril 10mg."
↓ PII Firewall
Output (sanitized)
"Paciente [PERSON_001], [REDACTED], 40-49,
hipertensión. Consulta: 2024.
Email: [REDACTED]. Prescripción: enalapril 10mg."
firewall = create_firewall("healthcare")
Detection Backends

Mix and match detection engines

Start with a preset, then swap in the engine that fits your data. Each card shows the exact install and firewall call.

base
Regex
< 1 ms
  • Structured IDs
  • Emails & phones
  • Credit cards
  • Zero ML deps
Best for: Zero-dependency environments or fast structured-data pipelines.
Create firewall (Regex)
pip install "pii-firewall"
firewall = create_firewall("healthcare", detector_backend="regex")
Customize: add_custom_regex(...)
recommended
Presidio
50–200 ms
  • Named entities (persons, orgs)
  • Multi-language NER
  • Best speed/accuracy balance
  • Extensible
Best for: General-purpose production workloads with NER requirements.
Create firewall (Presidio)
pip install "pii-firewall[presidio,langdetect]"
firewall = create_firewall("healthcare", detector_backend="presidio")
Customize: custom_recognizers=[...]
zero-shot
GLiNER
100–400 ms
  • Zero-shot NER
  • No fine-tuning needed
  • Custom entity types on the fly
Best for: Custom entity types without labeled training data.
Create firewall (GLiNER)
pip install "pii-firewall[gliner]"
firewall = create_firewall("healthcare", detector_backend="gliner")
Customize: define your own entity labels
sector-specific
Transformers
100–500 ms
  • E.g., biomedical NER (d4data, BC5CDR)
  • Highest accuracy on specific domains
  • GPU acceleration
  • HuggingFace catalog
Best for: Clinical NLP where biomedical entity accuracy is critical.
Create firewall (Transformers)
pip install "pii-firewall[transformers]"
firewall = create_firewall("healthcare", detector_backend="transformers", transformer_model_id="d4data/biomedical-ner-all")
Customize: transformer_model_id="..."
token-level
OpenAI Privacy Filter
50–200 ms
  • Token-level PII classifier
  • Language-agnostic
  • Structured output with spans
  • Viterbi decoding
Best for: Environments needing a lightweight yet precise token classifier without Presidio.
Create firewall (OpenAI Privacy Filter)
pip install "pii-firewall[opf]"
firewall = create_firewall("healthcare", detector_backend="opf")
Customize: swap in a different token classifier
max coverage
Hybrid
50–250 ms
  • Regex + Presidio combined
  • Maximum entity coverage
  • Catches structured IDs NER misses
Best for: High-risk domains where maximum recall matters more than latency.
Create firewall (Hybrid)
pip install "pii-firewall[presidio,langdetect]"
firewall = create_firewall("healthcare", detector_backend="hybrid")
Customize: regex rules + custom_recognizers=[...]
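
To illustrate why combining regex and NER can raise coverage, here is a small sketch of one way to merge overlapping detections from two sources. It is an assumption about how such a merge could work, not the hybrid backend's actual logic: the `Span` type, the `merge_spans` helper, the scores, and the hard-coded NER hits are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int        # inclusive character offset
    end: int          # exclusive character offset
    entity_type: str
    score: float
    source: str       # which detector produced the span


def merge_spans(*groups: list[Span]) -> list[Span]:
    """Keep the best-scoring span wherever detections overlap."""
    candidates = sorted(
        (span for group in groups for span in group),
        key=lambda s: (-s.score, -(s.end - s.start), s.start),
    )
    kept: list[Span] = []
    for cand in candidates:
        # Accept a candidate only if it touches no span already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


text = "Ana Garcia, DNI 12345678A"

# Structured ID found by a pattern (eight digits plus a letter).
regex_hits = [
    Span(m.start(), m.end(), "NATIONAL_ID", 0.95, "regex")
    for m in re.finditer(r"\b\d{8}[A-Z]\b", text)
]

# Hard-coded stand-ins for what an NER model might return.
name_start = text.index("Ana Garcia")
ner_hits = [
    Span(name_start, name_start + len("Ana Garcia"), "PERSON", 0.97, "ner"),
    Span(16, 24, "PHONE", 0.40, "ner"),  # weak guess overlapping the ID
]

merged = merge_spans(regex_hits, ner_hits)
for span in merged:
    print(span.entity_type, repr(text[span.start:span.end]), span.source)
```

In this toy input the regex hit wins over the weaker NER guess on the same characters, while the person name, which no pattern could catch, still comes from the NER side.
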
bring your own
Custom
your choice
  • Custom regex rules
  • Custom Presidio recognizers
  • Custom HF / Transformer models
Best for: Teams that want to extend a preset instead of starting from zero.
Create firewall (Custom)
pip install "pii-firewall[presidio,transformers]"
firewall = create_firewall("generic", detector_backend="presidio", custom_recognizers=[...])
Customize: add_custom_regex(...) · custom_recognizers=[...] · transformer_model_id="..."
Disposition Actions

6 ways to handle PII

Each entity type in a domain profile gets a disposition — the precise transformation applied when that entity is detected.

Action        Result                                 Reversible
KEEP          Pass through unchanged                 N/A
PSEUDONYMIZE  Replace with reversible token          Yes
REDACT        Irreversible deletion/redaction        No
GENERALIZE    Keep category, drop precision          No
MASK          Partial reveal                         No
HASH          SHA-256 one-way hash (analytics only)  No
Showcase convention: PSEUDONYMIZE emits reversible placeholders (scope follows profile.token_scope), REDACT is irreversible deletion ([REDACTED]), GENERALIZE uses category buckets (for example, [AGE_40-49]), and HASH is shown for analytics workflows (not used in built-in presets).
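
The irreversible transformations in the table can be sketched with the standard library alone. The helper names below are hypothetical, and details such as decade-wide age buckets, keeping only the year of a date, revealing the last four characters, and the optional salt are assumptions chosen to match the examples on this page, not the library's documented behavior.

```python
import hashlib


def generalize_age(age: int) -> str:
    """Bucket an age into its decade, e.g. 43 falls in 40-49."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalize_date(date_ddmmyyyy: str) -> str:
    """Keep only the year of a DD/MM/YYYY date."""
    return date_ddmmyyyy.split("/")[-1]


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


def hash_value(value: str, salt: str = "") -> str:
    """One-way SHA-256 digest; equal inputs give equal digests."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


print(generalize_age(43))
print(generalize_date("15/03/2024"))
print(mask("ES12345678"))
print(hash_value("ana@clinic.es")[:16], "...")
```

Because equal inputs hash to equal digests, a hashed column can still be grouped or joined for analytics. Low-entropy values such as phone numbers can be recovered by brute force, so a secret salt is worth considering in that case.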
Integrations

Drop into your stack in minutes

Pick your ecosystem, paste the wrapper, and ship. Provider, framework, and transport examples are tuned for fast copy-paste.

OpenAI
Chat Completions · Python
from openai import OpenAI
from privacy_firewall import create_firewall

client = OpenAI()
firewall = create_firewall("healthcare", detector_backend="presidio")
context = {"tenant_id": "acme", "case_id": "c1", "thread_id": "t1", "actor_id": "u1"}

def openai_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""  # content can be None

# PII is stripped before reaching OpenAI, restored in the reply
result = firewall.secure_call(text=user_input, context=context, llm_client=openai_llm)
print(result.final_text)  # real names restored

No SDK lock-in: wrap any HTTP LLM endpoint, a local model served through Ollama, or a LiteLLM proxy the same way.

Language Support

55+ languages, zero configuration

Language is detected automatically and cached per thread, so detection adds no extra latency after the first call in a thread. Locale-specific patterns ensure country IDs and formats are recognized correctly.

Auto-detect or force a language:
# Auto-detect (default)
firewall = create_firewall("healthcare")

# Force Spanish
firewall = create_firewall("healthcare", language="es")
Code  Language    Recognized patterns
ES    Spanish     DNI, NIE, IBAN-ES, phone
EN    English     SSN, EIN, ZIP, US phone
FR    French      INSEE, SIREN, phone
DE    German      Steuernummer, IBAN-DE
IT    Italian     Codice Fiscale, phone
PT    Portuguese  NIF, NIS, phone
55+   Others      Global patterns & auto-detect
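
Thread-level language caching, as described at the top of this section, might look roughly like the sketch below. `ThreadLanguageCache` and `toy_detect` are hypothetical; the toy detector only checks a few Spanish marker words so the example runs without a language-identification dependency.

```python
from typing import Callable, Optional


class ThreadLanguageCache:
    """Hypothetical cache: detect once per thread, then reuse the result."""

    def __init__(self, detect: Callable[[str], str]) -> None:
        self._detect = detect
        self._by_thread: dict[tuple[str, str], str] = {}
        self.detector_calls = 0

    def language_for(
        self,
        tenant_id: str,
        thread_id: str,
        text: str,
        forced: Optional[str] = None,
    ) -> str:
        if forced is not None:      # an explicit language always wins
            return forced
        key = (tenant_id, thread_id)
        if key not in self._by_thread:
            self.detector_calls += 1
            self._by_thread[key] = self._detect(text)
        return self._by_thread[key]


def toy_detect(text: str) -> str:
    """Stand-in detector; a real setup would call a language-ID library."""
    spanish_markers = {"paciente", "años", "consulta"}
    words = {w.strip(".,:").lower() for w in text.split()}
    return "es" if words & spanish_markers else "en"


cache = ThreadLanguageCache(toy_detect)
first = cache.language_for("acme", "t1", "Paciente Ana García, 43 años")
second = cache.language_for("acme", "t1", "Follow-up question")
print(first, second, cache.detector_calls)
```

The trade-off of caching by thread is visible in the second call: a thread that switches language keeps the first result, which is one reason a caller might force the language explicitly.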
GDPR Compliance

Built for regulatory compliance

GDPR Art. 17 right to erasure, tenant isolation, and full audit traces — out of the box.

Right to Erasure (Art. 17)

firewall.forget() wipes all vault mappings for a thread or case. After deletion, rehydration will not restore any original values for that scope.

deleted = firewall.forget(
    tenant_id="hospital-001",
    case_id="patient-123",
    thread_id="consultation-1",
)
# -> 14 mappings deleted

Tenant Isolation

Token mappings are scoped by tenant_id. The same [PERSON_001] token in different tenants never shares a mapping. Hard isolation at the data layer.

context = {
    "tenant_id": "hospital-001",  # hard boundary
    "case_id":   "patient-123",
    "thread_id": "consultation-1",
    "actor_id":  "doctor-456",
}
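
As a rough sketch of what scoping and erasure imply for storage, the toy vault below keys every mapping by `(tenant_id, case_id, thread_id)`. `ScopedVault` and `scope_of` are hypothetical names, and scoping down to the thread is an assumption made for the example; the library's real scope may differ by profile.

```python
Scope = tuple[str, str, str]  # (tenant_id, case_id, thread_id)


def scope_of(context: dict[str, str]) -> Scope:
    """Derive the storage key from a request context."""
    return (context["tenant_id"], context["case_id"], context["thread_id"])


class ScopedVault:
    """Hypothetical vault that never looks outside the caller's scope."""

    def __init__(self) -> None:
        self._maps: dict[Scope, dict[str, str]] = {}

    def store(self, context: dict[str, str], token: str, original: str) -> None:
        self._maps.setdefault(scope_of(context), {})[token] = original

    def lookup(self, context: dict[str, str], token: str):
        return self._maps.get(scope_of(context), {}).get(token)

    def forget(self, tenant_id: str, case_id: str, thread_id: str) -> int:
        """Drop every mapping in one scope and report how many were removed."""
        return len(self._maps.pop((tenant_id, case_id, thread_id), {}))


vault = ScopedVault()
ctx_a = {"tenant_id": "hospital-001", "case_id": "patient-123",
         "thread_id": "consultation-1", "actor_id": "doctor-456"}
ctx_b = {"tenant_id": "hospital-002", "case_id": "patient-123",
         "thread_id": "consultation-1", "actor_id": "doctor-789"}

# The same placeholder maps to different people in different tenants.
vault.store(ctx_a, "[PERSON_001]", "Ana Garcia")
vault.store(ctx_b, "[PERSON_001]", "John Doe")
print(vault.lookup(ctx_a, "[PERSON_001]"), "|", vault.lookup(ctx_b, "[PERSON_001]"))

deleted = vault.forget("hospital-001", "patient-123", "consultation-1")
print(deleted, vault.lookup(ctx_a, "[PERSON_001]"))
```

After `forget`, a lookup in the erased scope finds nothing, so placeholders in later responses for that thread would stay as placeholders, while the other tenant's mapping is untouched.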

Audit Trail

Every call produces a TraceRecord with entity types, confidence scores, and applied replacements — ready for your compliance dashboard or SIEM.

result.trace.detected_entities
# [{"type": "PERSON", "text": "Ana Garcia",
#   "confidence": 0.97}, ...]

result.trace.replacements
# [{"original": "Ana Garcia",
#   "token": "[PERSON_001]"}, ...]
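
When forwarding traces to a dashboard or SIEM, one option is to export only entity types and confidence scores so raw values do not leak into logs. The sketch below assumes the entity dictionaries have the shape shown in the comments above; `export_trace` is a hypothetical helper, not part of the library.

```python
import json


def export_trace(detected_entities: list[dict], tenant_id: str) -> str:
    """Build a JSON log line that omits the raw entity text."""
    summary = {
        "tenant_id": tenant_id,
        "entities": [
            {"type": e["type"], "confidence": e["confidence"]}
            for e in detected_entities
        ],
    }
    return json.dumps(summary, sort_keys=True)


line = export_trace(
    [{"type": "PERSON", "text": "Ana Garcia", "confidence": 0.97}],
    tenant_id="hospital-001",
)
print(line)
```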
Quick Start

Up and running in minutes

Three steps from install to your first anonymized LLM call.

1. Install the package

pip install "pii-firewall[presidio,langdetect]"

# Download a spaCy language model
python -m spacy download en_core_web_sm  # English
python -m spacy download es_core_news_sm # Spanish
💡 Tip: Use the [all] extra to install every backend, or plain pip install pii-firewall for regex-only detection (no ML dependencies).
2. Create a firewall

from privacy_firewall import create_firewall

# Pick a domain profile
firewall = create_firewall("healthcare")  # or "finance", "legal", "generic"
3. Anonymize & rehydrate

context = {
    "tenant_id": "hospital-001",
    "case_id":   "patient-123",
    "thread_id": "consultation-1",
    "actor_id":  "doctor-456",
}

# Anonymize before sending to LLM
anon = firewall.anonymize(text=user_input, context=context)
llm_response = my_llm(anon.sanitized_text)

# Rehydrate — restore real names in the response
final = firewall.rehydrate(text=llm_response, context=context)
print(final)  # End-user sees real values
💡 Tip: Or use firewall.process() for a single-call anonymize→LLM→rehydrate round-trip.

More examples

As a FastAPI microservice
from fastapi import FastAPI
from privacy_firewall import PrivacyFirewallSDK

app = FastAPI()
sdk = PrivacyFirewallSDK.create(domain="healthcare", detector_backend="presidio")

@app.post("/privacy/sanitize")
async def sanitize(req: dict):
    result = sdk.anonymize_text(text=req["text"], context=req["context"])
    return {"sanitized_text": result.sanitized_text}
Custom regex entity at runtime
firewall.add_custom_regex(
    entity_type="EMPLOYEE_ID",
    regex=r"\bEMP-\d{6}\b",
    locales=["GLOBAL"],
    confidence=0.95,
    context_words=["employee", "staff"],
    disposition_action="redact",
)
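
Before registering a pattern like the one above, it can help to check what it matches using only the standard `re` module. In this sketch, `find_employee_ids` and the 30-character context window are hypothetical; how the library actually uses `context_words` is not shown on this page, so the context check here only reports whether a hint word appears nearby.

```python
import re

EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")
CONTEXT_WORDS = ("employee", "staff")


def find_employee_ids(text: str, window: int = 30) -> list[tuple[str, bool]]:
    """Return (match, has_nearby_context_word) for each pattern hit."""
    hits = []
    for m in EMPLOYEE_ID.finditer(text):
        nearby = text[max(0, m.start() - window): m.end() + window].lower()
        has_context = any(word in nearby for word in CONTEXT_WORDS)
        hits.append((m.group(0), has_context))
    return hits


print(find_employee_ids("Employee EMP-123456 filed a ticket."))
print(find_employee_ids("Order EMP-654321 shipped"))
# Word boundaries reject a longer prefix and a seventh digit.
print(find_employee_ids("Ref TEMP-123456 and EMP-1234567"))
```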
GDPR — forget a thread
deleted = firewall.forget(
    tenant_id="hospital-001",
    case_id="patient-123",
    thread_id="consultation-1",
)
print(f"Deleted {deleted} mappings")
# Vault mappings for this thread are erased, supporting a GDPR Art. 17 erasure request