
Frequently Asked Questions

40 questions about the legal document redaction suite, answered with data.

Zero-Knowledge Authentication

How do I verify a SaaS vendor uses true zero-knowledge encryption and cannot access my data?

Argon2id key derivation runs entirely in the browser/app (64MB memory, 3 iterations). AES-256-GCM encryption happens before any data leaves the device. The server never receives the plaintext password or the derived encryption key. Even a full anonym.legal server breach would yield only encrypted blobs without the keys to decrypt them. Example: A compliance officer at a German health insurer needs to process patient complaint logs using a cloud anonymization tool. GDPR Article 32 requires appropriate technical measures. The insurer's DPO will not approve any tool that transmits unencrypted PII or holds encryption keys server-side. Zero-knowledge architecture removes this blocker from the vendor assessment process entirely.
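The client-side flow above can be sketched in a few lines. This is an illustrative model only: it uses `hashlib.scrypt` from the Python standard library as a stand-in for Argon2id (which has no stdlib implementation), and all names and parameters are assumptions, not anonym.legal's actual code.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    # Memory-hard KDF runs on the client. The product described above uses
    # Argon2id (64MB memory, 3 iterations); stdlib scrypt stands in here.
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)

salt = os.urandom(16)   # random per-account salt; safe to store server-side
key = derive_key("correct horse battery staple", salt)

# The server only ever sees the salt and ciphertext produced with `key`;
# the password and the derived key never leave the device.
assert derive_key("correct horse battery staple", salt) == key  # deterministic
assert derive_key("wrong password", salt) != key
```

Because the derivation is deterministic, the same password and salt always reproduce the same key on the device, so the server can store ciphertext without ever being able to derive the key itself.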

My company processes PHI. Can we use cloud anonymization tools, or do we need on-premise only?

Zero-knowledge design means original text is never stored on anonym.legal servers. European data storage (Hetzner EU data centers). The tool processes anonymization logic without retaining the source documents. This removes the primary blocker for HIPAA-covered entity adoption. Example: A hospital system's IT security team is evaluating tools for clinical documentation anonymization before sharing with a research partner. The HIPAA Privacy Officer needs to demonstrate compliance under 45 CFR 164.514. anonym.legal's zero-knowledge architecture means the BAA covers a tool that provably cannot expose PHI.

SaaS breaches are up 300%. How can I trust any cloud tool with PII?

Zero-knowledge architecture means a full anonym.legal server compromise provides attackers with AES-256-GCM ciphertext without the keys to decrypt it. Combined with EU-based data storage and ISO 27001 controls, this provides the strongest possible breach impact minimization. Example: A CISO at a German insurance company is reviewing their 2025 vendor risk posture after the industry-wide SaaS breach surge. They require all PII-handling vendors to demonstrate cryptographic data isolation. anonym.legal's zero-knowledge design is included in the approved vendor list specifically because a server breach cannot expose policyholder data.

How do I know the PII anonymization tool I'm using isn't storing my sensitive data on their servers where it could be breached?

Argon2id (64MB memory, 3 iterations) key derivation runs entirely in the browser/desktop client. The derived AES-256-GCM key never leaves the device. anonym.legal servers receive only encrypted ciphertext and cannot decrypt it even with full database access. 24-word BIP39 recovery phrase enables key recovery without server involvement. Example: A CISO at a German health insurer is evaluating anonymization tools for GDPR compliance. Their procurement checklist requires proof that the vendor cannot access patient data. anonym.legal's zero-knowledge architecture satisfies Article 25 (Privacy by Design) and allows the CISO to tell the DPA: "even if the vendor is breached, our data is cryptographically inaccessible."

After the LastPass breach, can I trust any cloud service with my company's sensitive data?

Zero-knowledge authentication with open architecture documentation. The 24-word BIP39 recovery phrase is the only way to restore access, meaning even anonym.legal staff cannot reset accounts or access user data. Session management with remote logout prevents persistent access after device loss. Example: A CISO at a 500-person law firm is reviewing vendor security after their password manager vendor suffered a breach. They need to demonstrate to their malpractice insurer that all tools handling client data use verified zero-knowledge architecture. anonym.legal's client-side encryption approach allows the CISO to demonstrate that even a complete server compromise would not expose client communication data.
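The BIP39 mechanism referenced above is a published standard: the mnemonic phrase is stretched into a 64-byte seed with PBKDF2-HMAC-SHA512, with no server round-trip. A minimal stdlib sketch follows; the phrase here is a placeholder for illustration, not a checksum-valid 24-word mnemonic.

```python
import hashlib
import unicodedata

def bip39_seed(mnemonic: str, passphrase: str = "") -> bytes:
    # Per the BIP39 spec: seed = PBKDF2-HMAC-SHA512(mnemonic,
    # "mnemonic" + passphrase, 2048 iterations, 64 bytes).
    norm = lambda s: unicodedata.normalize("NFKD", s).encode()
    return hashlib.pbkdf2_hmac("sha512", norm(mnemonic),
                               norm("mnemonic" + passphrase), 2048, dklen=64)

# Placeholder phrase (a real recovery phrase is 24 checksum-valid words).
phrase = ("abandon ability able about above absent absorb abstract "
          "absurd abuse access accident")
seed = bip39_seed(phrase)

assert len(seed) == 64
assert bip39_seed(phrase) == seed       # same phrase always restores the same seed
assert bip39_seed(phrase, "x") != seed  # an optional passphrase changes the key
```

This is why staff cannot reset an account: the seed, and every key derived from it, exists only where the phrase is entered.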

How do I pass a security questionnaire for a vendor that handles our sensitive documents?

Zero-knowledge authentication + ISO 27001 certification provides the strongest possible answer to vendor security questionnaire (VSQ) encryption questions. anonym.legal can truthfully state that server compromise yields no usable plaintext data. Example: A Fortune 500 financial services company is adding anonym.legal to their approved vendor list. Their vendor risk team sends a 150-question security questionnaire. The zero-knowledge architecture allows the anonym.legal team to answer encryption, key management, and data access questions definitively, shortening the approval cycle from months to weeks.

How do we pass vendor security assessments faster without sharing our encryption architecture documentation every time?

ISO 27001 certification provides the baseline framework. Zero-knowledge architecture documentation answers the specific question of server-side data access. DPIA completion satisfies GDPR Article 35 requirements. The combination dramatically shortens procurement cycles for regulated industries. Example: A procurement officer at a Fortune 500 financial services firm needs to onboard an anonymization tool for their data science team within Q4. anonym.legal's ISO 27001 certificate + zero-knowledge architecture documentation + completed security questionnaire template allows the CISO to approve the vendor without a full custom assessment, saving 6-8 weeks.

Office Add-in (Word & Excel)

The DOJ's Epstein files showed that PDF black-box redaction can be reversed with copy-paste. Are Word documents safer?

Office Add-in performs true PII replacement within the Word document itself. Text is permanently replaced with tokens, redacted marks, or anonymized placeholders. The original text is not hidden; it is gone from the document. Formatting (fonts, styles, bold, italic) is preserved. Headers, footers, and comments are processed. Full undo support for iterative review. Example: A government agency's legal team must produce 3,000 documents in response to a litigation hold. Previous productions using PDF black-highlighting were challenged when opposing counsel discovered the highlighting was reversible. anonym.legal's Word Add-in is deployed for the document review team. True text replacement ensures no underlying data remains. The production withstands forensic examination.

Our legal team spends 2-3 days manually redacting Word documents for each discovery production. Is there a faster way?

Word Add-in works natively inside Microsoft Word, with no conversion required. Preserves all formatting: fonts, styles, bold, italics, tables, headers, footers, footnotes, and comments. Supports per-entity operator configuration (different handling for names vs. SSNs vs. dates). Full undo support for iterative review. Reduces 2-3 days of manual work to hours. Example: A litigation boutique law firm handles 15 major matters annually, each requiring productions of 5,000-50,000 documents. Manual redaction was costing $400,000/year in paralegal and associate time. anonym.legal's Word Add-in reduces redaction time by 85%, saving $340,000 annually. The attorneys retain control through the review and approval workflow.

We need to anonymize Excel spreadsheets with 100,000 rows of employee data. Does existing redaction software handle structured data?

Excel Add-in processes spreadsheets natively. Cell-level PII detection across all visible and hidden sheets. Handles up to 100,000 rows, depending on plan. Preserves spreadsheet structure and formulas. Per-entity configuration allows different handling for names (replace with pseudonym) vs. SSNs (replace with X's) vs. phone numbers (mask with partial display). Example: A German manufacturing company's HR department must share 50,000 employee records with an external compensation consultant. GDPR requires anonymization before sharing with third parties. The Excel file contains 37 columns including names, salaries, addresses, and performance ratings. anonym.legal's Excel Add-in processes the full dataset in minutes, anonymizing all PII fields while preserving the spreadsheet structure for analysis.
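Per-entity operator configuration can be pictured as a table mapping each entity type to its own transformation. The sketch below uses toy regex detectors in place of the real NLP engine (name detection, for instance, needs more than a regex); the operator functions mirror the replace-with-X's and partial-mask behaviors described above.

```python
import re

# Per-entity operator table: each entity type gets its own handling.
# The regexes are illustrative stand-ins for the real detection engine.
OPERATORS = {
    "SSN":   (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), lambda v: "XXX-XX-XXXX"),
    "PHONE": (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), lambda v: "***-***-" + v[-4:]),
}

def anonymize_cell(value: str) -> str:
    # Apply every configured operator to a single cell value.
    for rx, op in OPERATORS.values():
        value = rx.sub(lambda m, op=op: op(m.group(0)), value)
    return value

row = ["Jane Doe", "123-45-6789", "555-867-5309", "Engineering"]
assert [anonymize_cell(c) for c in row] == \
    ["Jane Doe", "XXX-XX-XXXX", "***-***-5309", "Engineering"]
```

Non-PII cells pass through unchanged, which is what preserves the spreadsheet's analytical structure.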

How do I redact sensitive data in Word documents without destroying the formatting?

Word Add-in works natively inside Microsoft Office. No export or conversion. Formatting is preserved at the paragraph, character, and style level. Bold names remain bold after anonymization. Table structures are preserved. Headers and footers are processed without disrupting page layout. The result is a properly formatted document ready for immediate use. Example: A UK law firm specializing in employment tribunals must produce witness statements with names and identifying information anonymized per court order. Previous attempts using PDF redaction tools destroyed the document formatting, requiring manual reconstruction. anonym.legal's Word Add-in preserves formatting exactly: the anonymized statement looks professionally formatted and is court-ready without additional work.

FOIA requests requiring redaction of thousands of Word documents are creating backlogs. What automation tools help?

Office Add-in processes Word documents natively with automation support. Batch processing (1-5,000 files via Desktop App) enables volume handling. Per-entity configuration allows agency-specific redaction rules (FOIA exemption B6 for personal information, B7 for law enforcement). Presets allow FOIA staff to apply consistent configurations across the entire request. Example: A federal agency's FOIA office receives a request for 8,000 Word documents related to a policy decision. With 5,638 FOIA staff processing 1.5 million requests annually (about 266 requests per staff member per year), each staff member has roughly one day per request. anonym.legal's batch-capable Word Add-in processes all 8,000 documents in hours, with human review focused on edge cases rather than every document.

What Word redaction tools preserve styles, tables, and tracked changes during PII removal?

The Office Add-in operates directly within the Word document object model; no conversion to intermediate format. PII entities are detected in text runs, paragraphs, headers, footers, footnotes, and comments. Anonymization is applied in-place with full formatting preservation. Ctrl+Z undo reverts any change. This is architecturally distinct from all redaction tools that work at the rendered-document level. Example: A partner at a 50-person law firm needs to redact a 200-page merger agreement before sharing with regulatory authorities. The document contains 15 defined terms that include party names, 47 cross-references to those defined terms, and tables with financial figures linked to party identities. anonym.legal's Office Add-in detects all name instances (including in defined term contexts), applies consistent pseudonymization, and preserves all formatting, reducing a 6-hour manual redaction task to 15 minutes.

How do I anonymize PII in Excel spreadsheets that have thousands of rows of customer data without losing the structure?

The Office Add-in processes Excel at the cell level, supporting up to 100,000 rows and 20MB files. Per-entity operator configuration allows different handling for different entity types within the same spreadsheet. The full undo capability allows recovery if a formula column is accidentally flagged. Example: A data analyst at a retail company preparing customer purchase history for an external marketing analytics vendor. The 50,000-row Excel file contains customer names, emails, and loyalty IDs alongside purchase amounts and product categories. anonym.legal's Excel add-in replaces names and emails with pseudonyms while hashing loyalty IDs for referential integrity, allowing the analytics vendor to track behavior patterns without accessing real identities.

Desktop Application (Offline Processing)

We have air-gapped workstations for classified work. Is there a PII anonymization tool that works completely offline?

Desktop App built on Tauri 2.0 + Rust processes everything locally. After initial installation, no internet connection is required. All NLP models are embedded. The encrypted local vault stores configuration and presets. No data leaves the device at any point. Available on Windows, macOS, and Linux. Example: A defense contractor processing ITAR-controlled technical documents needs to anonymize them before sharing with a foreign partner under a license exception. All processing must occur on cleared workstations with no internet access. anonym.legal's Desktop App is installed on the air-gapped workstations, processes the documents locally, and produces ITAR-compliant anonymized outputs without any network connectivity.

GDPR data sovereignty rules say our data can't leave Germany. How do we use cloud tools without violating this?

Desktop App processes all data locally. Nothing leaves the device. For organizations that also need cloud features, anonym.legal's web platform uses EU-based Hetzner data centers with zero-knowledge architecture. The Desktop App serves organizations with the strictest local-only requirements. Example: A German federal government agency must anonymize citizen complaint data before sharing with an external research institute. BfDI guidance prohibits processing on non-government infrastructure. anonym.legal's Desktop App runs on agency workstations: all processing is local, no data traverses external networks, and the audit log is maintained in the local encrypted vault.

Our hospital's cybersecurity team won't approve any cloud-based PHI processing tools. What desktop alternatives exist?

Desktop App provides cloud-quality anonymization (Presidio-based NLP with 48 languages and 260+ entity types) in a locally-installed application. No cloud connectivity required. Healthcare-specific entity types (MRN, NPI, DEA, health plan IDs) included. All 18 HIPAA Safe Harbor identifiers supported. Example: A mid-size regional hospital's clinical informatics team wants to create a research-ready dataset from their EHR. The CISO refuses to approve cloud processing of PHI. anonym.legal Desktop App is deployed on clinical informatics workstations. The team processes de-identified notes locally with the same accuracy as cloud tools, satisfying both security requirements and research quality requirements.

We need to batch-process 5,000 documents locally without uploading them to any cloud. Is that possible?

Desktop App batch processing supports 1-5,000 files per batch depending on plan. Parallel execution (1-5 concurrent files) for throughput. Mixed format support in a single batch. ZIP packaging for processed files. CSV/JSON export with processing metadata. Progress tracking and error handling. Example: A clinical research organization is building a de-identified dataset from 50,000 patient consultation notes. The hospital's IRB requires that processing occur on-site. anonym.legal's Desktop App processes the notes in 10 batches of 5,000, running overnight. The next morning, 50,000 de-identified files and a processing metadata log are ready for transfer to the research team.

How do I anonymize documents on a trading floor where data cannot leave the internal network?

Desktop App works completely offline after installation. Finance-specific entity types (IBAN, SWIFT, BIC, account numbers, routing numbers, cryptocurrency addresses) are pre-built. Batch processing handles volume. Encrypted local vault stores configurations and presets securely on-device. Example: A proprietary trading firm's compliance team must submit anonymized trade reports to a financial regulator. Reports contain client account numbers, trader names, and position sizes. All workstations have external internet blocked. anonym.legal's Desktop App processes reports locally, replaces client IDs with tokens, and produces regulator-ready outputs without external connectivity.

We have a fully air-gapped network and cannot use any cloud-based tools. What PII anonymization options exist for air-gapped deployments?

The Tauri 2.0-based Desktop Application runs entirely offline after download. No network calls are made during processing. The local encrypted vault (AES-256-GCM + Argon2id) stores configurations and encryption keys without cloud sync. Batch processing supports 1-5,000 files depending on plan tier. All processing occurs on local hardware; no data ever leaves the device. Example: A data scientist at a defense contractor needs to de-identify personnel records before sharing with a FOIA-requesting journalist. The contractor's network is air-gapped under ITAR requirements. anonym.legal's Desktop App runs on the air-gapped machine, processes the DOCX files in batch, and produces redacted documents, all without any external network communication.

Our legal team says patient data cannot leave our premises under any circumstances. What tools work completely locally?

The Desktop Application architecture (Tauri 2.0 + Rust) has been independently verified to make no network calls during document processing. The local vault stores all configuration and keys. Processing via the Presidio sidecar runs entirely on the local machine. This architecture can be verified by network monitoring tools during security assessment. Example: A compliance officer at a Swiss private bank needs to anonymize client correspondence before sharing with an external auditor. Swiss banking secrecy law (Article 47 Banking Act) prohibits disclosure of client information to unauthorized parties, including cloud service providers not covered by explicit consent. anonym.legal's Desktop Application processes the correspondence locally, producing anonymized documents that can be safely shared with the auditor without triggering banking secrecy obligations.

Reversible Encryption (UNIQUE Tokens)

We anonymized documents for sharing, but now legal needs the originals for discovery. How do we get them back?

AES-256-GCM reversible encryption preserves the mathematical relationship between the anonymized token and the original value. With the client-held encryption key, any anonymized document can be fully restored to its original content. Without the key, the anonymized version is computationally indistinguishable from a permanently redacted document. Legal teams share encrypted versions; produce originals when required using the retained key. Example: A pharmaceutical company shares clinical trial data with external statisticians using anonym.legal's encrypted anonymization. Two years later, the FDA requests original patient records as part of a drug safety review. The company restores the original data using their retained encryption key: no spoliation, no missing records, full regulatory compliance. The statisticians' encrypted copies remain protected throughout.

We de-identified patient data for research, but now need to contact specific patients based on research findings. How?

Reversible encryption creates a protected pseudonymization layer. The research dataset uses encrypted tokens. The decryption key is held by the designated data custodian. When re-contact is clinically justified and IRB-approved, the custodian decrypts the specific participant records to enable follow-up. The broader dataset remains protected; only the specific authorized decryption is performed. Example: A European oncology research center conducts a 5,000-patient study using anonym.legal's encrypted anonymization. Mid-study analysis reveals a subgroup of 47 participants showing markers for an aggressive cancer variant. The ethics committee approves re-contact. The data custodian uses the retained encryption key to identify the 47 real patients. Those patients are contacted; 23 are found to have actionable findings. The remaining 4,953 participants' data remains fully protected.

We anonymized documents to share with outside counsel, but now we need to produce the originals in discovery. How do we recover the original data?

Reversible encryption using AES-256-GCM generates deterministic encrypted tokens from original PII. The key is held only by the user. "John Smith" becomes "[ENC:x9f3a...]" consistently throughout the document, maintaining referential integrity. When authorized de-anonymization is needed (discovery production, audit verification, research follow-up), the user applies their key and all tokens restore to originals. The Chrome Extension auto-decrypts AI responses, so working with encrypted data is transparent in the AI workflow. Example: A compliance officer at a pharmaceutical company shares clinical trial data with a contract research organization (CRO). All patient identifiers are encrypted with a company-held key. The CRO analyzes anonymized data. When the FDA requests original patient records for audit, the compliance officer applies the key and produces the originals in minutes, with a cryptographic audit trail proving chain of custody.
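Deterministic, reversible tokenization can be illustrated with a toy SIV-style keyed codec. This is not anonym.legal's implementation and omits the authentication tag a real AES-256-GCM construction provides; it only demonstrates the two properties the answer relies on: determinism (the same value always yields the same token) and key-holder-only reversal.

```python
import base64
import hashlib
import hmac

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Expand an HMAC-SHA256 counter into a keystream of the needed length.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]

def tokenize(key: bytes, value: str) -> str:
    pt = value.encode()
    # SIV-style: derive the nonce from the plaintext so tokens are deterministic.
    nonce = hmac.new(key, pt, hashlib.sha256).digest()[:12]
    ct = bytes(a ^ b for a, b in zip(pt, _keystream(key, nonce, len(pt))))
    return "[ENC:" + base64.urlsafe_b64encode(nonce + ct).decode() + "]"

def detokenize(key: bytes, token: str) -> str:
    raw = base64.urlsafe_b64decode(token[5:-1])  # strip "[ENC:" and "]"
    nonce, ct = raw[:12], raw[12:]
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct)))).decode()

key = b"\x01" * 32
t1 = tokenize(key, "John Smith")
assert t1 == tokenize(key, "John Smith")      # deterministic: same value, same token
assert detokenize(key, t1) == "John Smith"    # reversible only with the key
```

Determinism is what preserves referential integrity across a document, and the key-only reversal is what makes later discovery production possible without retaining plaintext copies.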

Our external auditors need to verify the original data behind our redacted financial reports. How do we handle this?

Reversible encryption allows selective de-anonymization. The finance team shares encrypted anonymized reports. Auditors working under formal engagement can be given decryption capability for their audit period. After audit completion, the key can be rotated: previous encrypted copies remain protected, and auditors cannot retroactively access records outside their engagement. Example: A private equity firm shares portfolio company financial data with an external audit firm for annual review. Client company names and deal terms are encrypted before sharing. During audit, the engagement partner receives temporary decryption access for the audit period. After the audit opinion is issued, key rotation removes that access. Former employees of the audit firm cannot access the data after their tenure.

Anonymous employee surveys revealed a serious harassment allegation. We need to follow up but can't identify who filed it. What should we do?

Reversible encryption allows HR to run "conditionally anonymous" surveys. Responses are encrypted before storage. The decryption key is held by a designated HR executive (or third-party ombudsman). When a response contains a serious allegation meeting predefined criteria (e.g., physical harassment, legal violations), the authorized party can decrypt that specific response to identify the reporter and initiate formal investigation. Example: A 2,000-employee manufacturing company's annual culture survey captures an allegation of serious misconduct by a senior executive. The response is encrypted. The company's third-party ombudsman reviews the allegation and determines it meets the threshold for de-anonymization under the company's published survey policy. The ombudsman decrypts the specific response, contacts the reporter through a formal protected channel, and initiates an independent investigation. All other responses remain permanently anonymized.

We use AI to process customer queries but need to restore original names for the final response. How does token mapping work across AI interactions?

Session-based token mapping maintains consistent anonymization within a conversation. The same customer name always maps to the same token within a session. Auto-decrypt in Chrome Extension responses restores real names in AI outputs before display. Persistent token mapping is also available for longer-lived workflows. Example: A German insurance company's AI-powered claims processing system processes customer complaint emails. Customer names, policy numbers, and claim amounts are anonymized before Claude processes the emails. Claude drafts a response using the anonymized tokens. anonym.legal's auto-decrypt restores original customer information in Claude's draft before it is displayed to the claims handler. The handler sends the final response with real customer names. GDPR compliance is maintained throughout.
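Session-based token mapping itself is simple to picture: a per-session dictionary guarantees the same value always yields the same token, and the inverse map drives auto-decrypt on the AI's draft. A minimal sketch, with illustrative names and token format:

```python
class SessionTokenMap:
    """Per-session mapping: the same real value always gets the same token."""
    def __init__(self):
        self._fwd = {}   # real value -> token
        self._rev = {}   # token -> real value (drives auto-decrypt)
        self._n = 0

    def anonymize(self, value: str) -> str:
        if value not in self._fwd:
            self._n += 1
            token = f"[PERSON_{self._n}]"
            self._fwd[value] = token
            self._rev[token] = value
        return self._fwd[value]

    def restore(self, text: str) -> str:
        # Auto-decrypt: substitute tokens back before showing the AI draft.
        for token, value in self._rev.items():
            text = text.replace(token, value)
        return text

session = SessionTokenMap()
assert session.anonymize("Anna Weber") == "[PERSON_1]"
assert session.anonymize("Anna Weber") == "[PERSON_1]"  # consistent within session
draft = "Dear [PERSON_1], your claim has been approved."
assert session.restore(draft) == "Dear Anna Weber, your claim has been approved."
```

The AI only ever sees tokens; the inverse map never leaves the user's side of the session.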

We de-identified patient data for a research study. Now we need to re-contact participants for a follow-up. How do we identify them?

Reversible encryption generates consistent tokens (deterministic AES-256-GCM): "Patient_001" maps to the same encrypted token throughout all study records. The research team holds the key. Re-identification for follow-up requires the key holder to decrypt. All decrypt events are logged. This satisfies both the IRB requirement for controlled re-identification capability and the HIPAA Safe Harbor requirement for de-identified data sharing.

Multi-Format Document Support

PDF redaction is a specific problem: tools that just put a black box over text aren't truly redacting it; the text is still there in the PDF layer. How do we ensure true redaction?

PDF redaction removes detected PII from the document's text layer rather than just applying a visual overlay. The redacted output PDF contains no underlying text for the anonymized entities, only the visual redaction marks. This provides genuine, court-admissible redaction rather than cosmetic redaction. The difference is verifiable: a text extraction tool applied to an anonym.legal-redacted PDF will return empty strings for redacted regions. Example: A government agency's legal department was filing court documents with "redacted" PII that opposing counsel could extract via copy-paste, the same technique that exposed the DOJ Epstein documents. After discovering this vulnerability, they switched to anonym.legal for all court filing preparation. Verification protocol: every redacted document is text-extracted before filing to confirm no underlying PII remains. Zero copy-paste PII exposures since adoption.
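The verification protocol in the example reduces to a simple check: extract the PDF's text layer (with a tool such as pdftotext or a PDF library's text extraction, an assumption here) and confirm that no known PII string survives. A sketch of the check itself:

```python
def verify_redaction(extracted_text: str, pii_values: list[str]) -> list[str]:
    # Return any PII strings still recoverable from the extracted text layer.
    return [v for v in pii_values if v in extracted_text]

# Cosmetic redaction: a black box is drawn, but the text layer is intact.
cosmetic = "Agreement between John Smith and Acme GmbH"
# True redaction: the text layer no longer contains the entity at all.
true_redaction = "Agreement between [REDACTED] and Acme GmbH"

assert verify_redaction(cosmetic, ["John Smith"]) == ["John Smith"]  # fails review
assert verify_redaction(true_redaction, ["John Smith"]) == []        # passes
```

Running this check on every document before filing is what catches the copy-paste failure mode described above.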

We have PII spread across Word documents, PDFs, Excel spreadsheets, and CSV exports. We've been using different tools for each format; it's a mess. Is there one tool that handles all of them?

Seven formats natively supported in a single interface with a consistent engine. The same 260+ entity types and same preset configurations apply whether the document is a PDF contract, XLSX customer list, or JSON API log export. Batch processing handles mixed-format sets. Single audit trail across all formats. One tool replaces four or five format-specific workarounds. Example: An HR consultancy processes employee data in four formats: job application PDFs, interview notes in DOCX, compensation data in XLSX, and onboarding system exports in CSV. They previously used 3 separate tools for these formats, with different entity coverage and no cross-format consistency. After migrating to anonym.legal, all four formats process through one interface with the same "HR Data GDPR" preset. Anonymization consistency improved; tool licensing cost reduced by 60%.

We have XLSX spreadsheets with PII scattered across hundreds of columns and rows: phone numbers in one column, names in another, SSNs mixed with account numbers. How do we anonymize these efficiently?

Native XLSX support with cell-level PII detection that uses column headers as context signals. A column labeled "SSN" with values matching partial patterns is detected as SSN context even for edge-case values. Multi-sheet processing applies the same configuration across all sheets. Output preserves Excel formatting while anonymizing PII cell values. Column structures, formulas, and non-PII data are preserved. Example: An HR department receives employee records from an acquired company: a 15,000-row XLSX with 40 columns including employee IDs, names, SSNs, salaries, performance scores, and manager names. Anonymizing for sharing with an external HR consultant requires removing personal identifiers while preserving the statistical structure. anonym.legal processes the full XLSX with the "HR GDPR" preset: names, SSNs, email addresses, and phone numbers anonymized cell-by-cell while salary data, performance scores, and department codes are preserved. Processing time: 8 minutes.

Our application logs contain user data in JSON format: API logs with user IDs, email addresses, and IP addresses mixed with technical fields. How do we anonymize logs for debugging without removing too much context?

Native JSON support with nested structure traversal detects PII at any depth within JSON documents. Email addresses, IPs, names, and other entities are detected by content, not path, so the same configuration works across variable log schemas. Technical metadata (timestamps, error codes, stack traces, technical IDs) is preserved. The Replace method substitutes PII with consistent fake values, preserving referential integrity within log files (the same user email replaced with the same fake email across all log entries). Example: A SaaS company shares application logs with an external penetration testing firm. Raw logs contain 4,200 unique user email addresses and IP addresses. anonym.legal processes 180MB of JSON logs in batch, replacing all email addresses with consistent fake addresses (user1@example.com, user2@example.com) and IP addresses with anonymized IPs. The pen test firm receives logs with full technical context but zero real user data. GDPR compliance for third-party data sharing is maintained.
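The consistent-fake replacement described above can be sketched as a recursive walk over the parsed JSON. The regex detector and fake-address format here are illustrative stand-ins for the real detection engine:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_json(node, mapping):
    # Recursively walk dicts/lists; replace emails in string values with
    # consistent fakes so referential integrity is preserved.
    if isinstance(node, dict):
        return {k: mask_json(v, mapping) for k, v in node.items()}
    if isinstance(node, list):
        return [mask_json(v, mapping) for v in node]
    if isinstance(node, str):
        def repl(m):
            if m.group(0) not in mapping:
                mapping[m.group(0)] = f"user{len(mapping) + 1}@example.com"
            return mapping[m.group(0)]
        return EMAIL.sub(repl, node)
    return node  # timestamps, error codes, numbers pass through untouched

log = {"ts": "2025-01-01T12:00:00Z", "msg": "login failed for anna@firma.de",
       "user": {"email": "anna@firma.de"}, "code": 401}
mapping = {}
masked = mask_json(log, mapping)

assert masked["user"]["email"] == "user1@example.com"
assert masked["msg"] == "login failed for user1@example.com"  # same fake both places
assert masked["code"] == 401                                  # technical fields intact
```

Because detection runs on string values rather than fixed paths, the same walk works whether the schema is an API log, an audit event, or an error report.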

We need to share research data in CSV format with a university partner. The CSV contains survey responses with PII mixed into free-text fields. Are there tools that can detect PII in CSV free-text columns?

CSV processing applies entity detection to every cell, including free-text columns, using the same NLP + transformer stack as document processing. PII entities discovered in free-text survey responses ("My name is John and I work at IBM") are detected and replaced while the surrounding context ("I feel that the new policy...") is preserved. Structured columns with PII headers are also cleaned. The result is a genuinely anonymized CSV that maintains research utility. Example: A research consortium at three European universities shares a 5,000-row survey CSV about patient experiences. Free-text columns contain incidental names, hospital references, and location details that would identify individual respondents. anonym.legal processes the CSV: 47 free-text PII entities detected and anonymized across the free-text columns, structured PII columns (name, email, birth date) cleaned. The anonymized CSV is shared between institutions in compliance with GDPR Article 89 (research exemption requirements).

Our e-discovery production includes PDFs, Word documents, Excel spreadsheets, and email exports. We need different tools for each. How do we unify this?

Batch processing supports PDF, DOCX, XLSX, TXT, CSV, JSON, and XML in a single batch run. The same Presidio-based detection engine operates across all formats. Output is format-consistent regardless of input type. This eliminates the need for format-specific tools and ensures consistent detection across a mixed-format document production.

Our application logs contain customer PII in JSON format. How do we mask sensitive fields before sending logs to our analytics platform?

JSON and XML processing handles nested structure natively: PII detection operates on string values within the document model, not on the raw file bytes. Processing preserves document structure, only modifying PII-containing string values. Batch processing integrates into log rotation pipelines.

Text-Based Image PII Detection

We have thousands of scanned contract PDFs: they're image-based PDFs with no text layer. Standard PDF PII tools can't detect anything. How do we process scanned documents?

The text-in-image detection feature integrates OCR with NLP in a single processing pipeline. Image-based PDFs and image files (PNG, JPG) containing scanned text are processed through OCR to extract text, then through the full 260+ entity NLP pipeline for PII detection. The anonymized output is the extracted text with PII replaced, redacted, or encrypted. Batch processing handles large legacy document archives. Example: A law firm undertaking a GDPR data audit discovers 80,000 image-based PDF client contracts scanned between 1998-2010. Standard PII tools return zero detections. Using anonym.legal's text-in-image processing, the firm processes the archive in batches of 5,000. OCR extracts text from each image-PDF, NLP detects client names, addresses, ID numbers, and financial references, and the anonymized text output enables the firm to fulfill right-to-erasure requests for the historical archive. Previously impossible compliance obligation fulfilled.
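The OCR-then-NLP pipeline can be sketched end to end. Both stages below are stubs: the `ocr` function returns a fixed fixture instead of calling a real engine such as Tesseract, and the detector covers a single toy pattern rather than 260+ entity types.

```python
import re

def ocr(image_path: str) -> str:
    # Stand-in for a real OCR engine; returns the text extracted from a
    # scanned page. Hypothetical fixture for illustration only.
    return "Client: Maria Lopez, ID 483-92-1177, Madrid"

def detect_pii(text: str):
    # Tiny rule-based detector standing in for the full NLP pipeline.
    patterns = {"SSN": r"\b\d{3}-\d{2}-\d{4}\b"}
    return [(label, m.group(0)) for label, rx in patterns.items()
            for m in re.finditer(rx, text)]

def anonymize_scan(image_path: str) -> str:
    # Stage 1: OCR extracts text; stage 2: detected entities are replaced.
    text = ocr(image_path)
    for label, value in detect_pii(text):
        text = text.replace(value, f"[{label}]")
    return text

assert anonymize_scan("contract_p1.png") == "Client: Maria Lopez, ID [SSN], Madrid"
```

The key architectural point is that the anonymization stage operates on the extracted text, which is why the same entity configuration used for DOCX and PDF text applies unchanged to scanned archives.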

Our support team takes screenshots and shares them internally โ€” these screenshots often contain customer data. How do we detect and remove PII from screenshots before sharing?

Image PII detection processes PNG and JPG screenshots, applying OCR to extract visible text and NLP to detect PII entities in the extracted text. The anonymized output reports which entities were found in the screenshot content. Users can clean screenshots before sharing them internally or with external parties. Particularly useful for Jira/ServiceNow ticket documentation, internal wiki screenshots, and contractor-facing technical documentation. Example: A SaaS company's IT help desk creates Jira tickets with screenshots of user account problems. Screenshots contain user email addresses, subscription details, and billing information. After a GDPR review found that screenshots in Jira were accessible to all 200 engineering staff (including contractors without DPAs), the company implemented anonym.legal image scanning as a pre-sharing step. Support agents scan screenshots before attaching to tickets; PII-detected screenshots go through a quick anonymization review. Internal PII exposure through ticket attachments is eliminated at the point of sharing.

We receive forms filled out by hand and scanned โ€” job applications, patient intake forms, insurance claims. The scanned images contain handwritten PII. Is there a way to automatically detect and redact it?

Text-in-image processing includes OCR for both printed and handwritten text extraction. For handwritten forms, OCR extracts the text content, NLP detects PII entities, and the anonymization is applied to the extracted text output. Quality depends on OCR accuracy for handwriting (an inherent technical limitation), but for reasonably legible handwriting, the integrated pipeline provides practical automation for high-volume form processing at fixed subscription cost. Example: A regional health insurance provider processes 3,000 handwritten claim forms per month. Manual PII redaction for audit purposes requires 0.5 FTE (20 hours/week). anonym.legal's image PII processing reduces manual review to exception handling for low-OCR-confidence forms โ€” approximately 15% of volume. Manual review drops to 3 hours/week. Annual labor saving: approximately โ‚ฌ24,000. Annual anonym.legal Professional plan: โ‚ฌ180. ROI: 133x.
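A sketch of the exception-handling step described above; the threshold value and helper function are illustrative assumptions, not product defaults.

```python
def route(forms, threshold=0.80):
    """Split forms into fully automated vs. manual-review queues by OCR
    confidence: low-confidence handwriting goes to a human, the rest does not."""
    auto, manual = [], []
    for form_id, confidence in forms:
        (auto if confidence >= threshold else manual).append(form_id)
    return auto, manual

# ROI arithmetic from the example above: 20 h/week -> 3 h/week saves
# 17 h/week, roughly 880 h/year; at ~EUR 27/h that is ~EUR 24,000/year
# against a EUR 180/year plan, i.e. 24000 / 180 ~= 133x.
```

In the insurer example, roughly 15% of forms would land in the `manual` queue, which is what shrinks review time from 20 hours to 3 hours per week.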

Employees share photos of whiteboards and printed materials in our collaboration tools. These often contain customer names and project details written on the whiteboard. How do we handle this type of PII?

Image text detection processes photographs of whiteboards and physical documents, applying OCR to extract visible text and NLP to detect entities. Users can upload whiteboard photos before sharing them in collaboration tools to get a PII assessment. The output identifies any detected PII entities in the image's text content, enabling users to either anonymize the sharing (describe what's on the whiteboard without the specific PII) or limit sharing scope appropriately. Example: A management consulting firm's engagement team photographs client strategy session whiteboards to share with remote team members. After a client raised concerns about their company data appearing in the consulting firm's Slack channels, the firm implemented an anonym.legal image review step for all whiteboard shares. Images are processed before posting; images containing client names or financial figures trigger a review step. One month post-implementation, the client concern was formally resolved, with the documented review process serving as evidence of remediation.

We publish research papers and reports that contain screenshots of data analysis tools โ€” these screenshots sometimes show individual-level data. How do we check images before publication?

Image text detection processes screenshots embedded in research documents, extracting text from images in the manuscript and applying PII detection. Researchers can process their draft documents before submission; journal editors can screen final manuscripts before publication. The pipeline identifies which images contain detectable PII entities, enabling targeted replacement of problematic screenshots with properly anonymized sample data before the privacy violation becomes permanent. Example: A data science research group at a European university implements anonym.legal image PII screening as part of their manuscript submission workflow. All draft papers are processed for image PII before submission to journals. In the first 6 months, 7 of 23 submitted manuscripts had at least one image containing PII entities (typically names or IDs in data sample screenshots). All 7 were corrected before submission. The institution's research ethics committee uses this workflow as evidence of appropriate privacy safeguards in the publication process.


Published by George Curta, Founder of anonym.legal