Three Gates Developer Docs

Detected PHI tokenized before AI invocation
Designed to support HIPAA Security Rule obligations

PDF Workflow Processing

Automated document processing with PHI detection, smart redaction, and secure API transmission

Overview

PDF workflows automate document processing with PHI detection, tokenization before AI invocation, output validation, and routing to external integration endpoints. The three-gate enforcement boundary tokenizes detected PHI before any AI model is invoked in either tier. The Tier 1 / Tier 2 distinction governs whether the workflow may route identifiable PHI to an external integration endpoint after processing.

Tier 1: De-identified Routing

Workflows that route only de-identified content to external integration endpoints. Detected PHI is tokenized; only validated, de-identified output is transmitted.

Tier 2: BAA-Covered Routing

Workflows that route identifiable PHI to configured external integration endpoints (FHIR, patient matching, claims) under an executed BAA. AI invocations inside the workflow continue to receive tokenized content.

Tier 1 Workflow: De-identified Processing

Tier 1 workflows tokenize detected PHI in documents before AI processing, so only de-identified content reaches an external model. Whether a Tier 1 workflow is appropriate for a given use case without a BAA is a determination your organization makes based on its own HIPAA risk assessment.

PHI Detection

AI scans document text for PHI entities (names, SSNs, dates of birth, phone numbers, etc.)

Smart Redaction

PHI replaced with tokens (TOKEN_001, TOKEN_002). Clinical terms preserved. Original values stored securely in Azure Key Vault.

AI Processing

Redacted document processed by AI (extraction, summarization, etc.). Detected PHI is replaced with tokens before the model is invoked.

Output Validation

AI output scanned for residual PHI. Workflow blocked if unexpected PHI detected.

API Transmission

Validated, de-identified data transmitted to external API endpoints securely.

Tier 1 routing: Tier 1 workflows route only validated, de-identified content to external endpoints. Whether your organization can use a specific workflow without a BAA depends on your own HIPAA risk assessment and the destination endpoint's terms.
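The order of the five Tier 1 stages can be sketched as a single function. This is a minimal illustration under stated assumptions, not the product's implementation: run_tier1 and its four callable parameters are hypothetical names, and each stage is passed in so the sketch stays independent of any particular detector or model.

```python
from typing import Callable

def run_tier1(
    document: str,
    detect: Callable[[str], list[str]],
    redact: Callable[[str, list[str]], str],
    ai_step: Callable[[str], str],
    transmit: Callable[[str], None],
) -> bool:
    """Run the five stages in order; return True only if output was sent."""
    entities = detect(document)              # 1. PHI Detection
    redacted = redact(document, entities)    # 2. Smart Redaction
    output = ai_step(redacted)               # 3. AI Processing (tokens only)
    if detect(output):                       # 4. Output Validation
        return False                         #    residual PHI found: block
    transmit(output)                         # 5. API Transmission
    return True
```

The point of the sketch is the ordering: the AI step only ever receives the redacted text, and nothing is transmitted unless the output passes a second detection pass.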

Example Use Cases:

Prior authorization document summarization
Lab result extraction (de-identified)
Clinical note categorization
Medical record quality checks

Tier 2 Workflow: BAA-Covered Routing

Tier 2 workflows route identifiable PHI to configured external integration endpoints (FHIR, patient matching, claims systems, EHR write-back) under an executed Business Associate Agreement. The destination is the integration endpoint, not the AI model. When an AI step is included in a Tier 2 workflow, the three-gate enforcement boundary tokenizes detected PHI before model invocation, the same as in Tier 1.

Compliance Gate

The workflow verifies that the BAA is signed and not expired, that Tier 2 routing is enabled, and that external routing has been acknowledged. It blocks if any check fails.

PHI Detection

Detected PHI entities are identified and recorded for downstream routing decisions and audit. The token map is held server-side.

AI Processing (if invoked)

When the workflow includes an AI step, detected PHI is replaced with tokens before the model is invoked. AI providers receive tokenized content in Tier 2, the same as in Tier 1. Workflows that do not require an AI step skip this stage.

Output Validation

Output validated for data quality and schema. Allowed PHI fields enumerate where identifiable values may appear in the payload to be routed; PHI in any other field triggers blocking.

External Routing

Identifiable PHI is reinserted from the token map and transmitted to the configured external integration endpoint over TLS 1.2+, with authentication and full audit logging.
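Reinsertion from the token map might look like the following sketch. It assumes a flat payload of string fields and assumes that tokens are restored only in fields named as allowed PHI fields; reinsert_phi and its parameters are hypothetical, and the plain dict stands in for the server-side token map.

```python
import re

TOKEN_RE = re.compile(r"TOKEN_\d+")

def reinsert_phi(payload: dict[str, str], token_map: dict[str, str],
                 allowed_fields: set[str]) -> dict[str, str]:
    """Swap tokens back to original values, but only in allowed fields."""
    restored = {}
    for field, value in payload.items():
        if field in allowed_fields:
            # Tokens missing from the map are left untouched rather than guessed.
            value = TOKEN_RE.sub(lambda m: token_map.get(m.group(), m.group()), value)
        restored[field] = value
    return restored
```

With this approach a free-text field that is not on the allowed list keeps its tokens even after routing.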

BAA required: Tier 2 routing requires an executed Business Associate Agreement and the external routing acknowledgment. See BAA Management Guide for setup.
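The Compliance Gate checks can be expressed as a short function that returns every failed check. This is a sketch only: the BaaStatus record and its field names are hypothetical, and treating a BAA as valid through its expiry date is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BaaStatus:
    # Hypothetical record shape; field names are illustrative only.
    signed: bool
    expires_on: date
    tier2_enabled: bool
    routing_acknowledged: bool

def compliance_gate(status: BaaStatus, today: date) -> list[str]:
    """Return the failed checks; an empty list means the gate passes."""
    failures = []
    if not status.signed:
        failures.append("BAA not signed")
    if status.expires_on < today:  # assumption: valid through the expiry date
        failures.append("BAA expired")
    if not status.tier2_enabled:
        failures.append("Tier 2 routing not enabled")
    if not status.routing_acknowledged:
        failures.append("external routing not acknowledged")
    return failures
```

Returning all failures, rather than stopping at the first, gives an audit log the full reason a workflow was blocked.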

Example Use Cases:

FHIR data exchange with covered partners
Patient matching and deduplication against external services
Insurance claim submission
Referral coordination with identifiable data

Smart Redaction Features

Smart redaction goes beyond simple text removal to preserve clinical context while protecting PHI.

Tokenization

PHI values replaced with reversible tokens (TOKEN_001, TOKEN_002) instead of [REDACTED].

Original: "John Doe, DOB 01/15/1980"
Redacted: "TOKEN_001, DOB TOKEN_002"
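A minimal sketch of reversible tokenization, reproducing the example above. The tokenize function is hypothetical, it takes already-detected PHI values as input, and it returns the token map as a plain dict where the product stores mappings in Azure Key Vault.

```python
def tokenize(text: str, phi_values: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each distinct PHI value with TOKEN_NNN and return the token map."""
    token_map: dict[str, str] = {}
    for value in dict.fromkeys(phi_values):      # drop duplicates, keep order
        token = f"TOKEN_{len(token_map) + 1:03d}"
        token_map[token] = value
        text = text.replace(value, token)
    return text, token_map

redacted, token_map = tokenize("John Doe, DOB 01/15/1980", ["John Doe", "01/15/1980"])
# redacted  -> "TOKEN_001, DOB TOKEN_002"
# token_map -> {"TOKEN_001": "John Doe", "TOKEN_002": "01/15/1980"}
```

Plain string replacement is enough for illustration; a real implementation would more likely replace by detected character offsets so that one PHI value contained inside another is handled correctly.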

Clinical Term Preservation

Medical terminology retained for accurate AI processing.

Original: "Patient has diabetes"
Redacted: "TOKEN_001 has diabetes"
❌ NOT: "TOKEN_001 has TOKEN_002"
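One way to picture clinical term preservation is as a filter applied to detected spans before redaction. The sketch below is an assumption about how such a filter could work, not a description of the product's detector; entities_to_redact is a hypothetical name and the set holds only a few of the preserved terms listed in this document.

```python
# A few of the preserved terms listed in this document; the real list is longer.
PRESERVED_TERMS = {"diabetes", "hypertension", "asthma", "metformin", "insulin"}

def entities_to_redact(candidates: list[str]) -> list[str]:
    """Drop any detected span that is a preserved clinical term."""
    return [c for c in candidates if c.lower() not in PRESERVED_TERMS]
```

Given the candidates "John Doe" and "diabetes", only "John Doe" would go on to be tokenized.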

Date Granularity Options

Preserve the year, or the month and year, for clinical context while redacting the day.

Original: "01/15/1980"
Year: "**/**/1980"
Month: "01/**/1980"
Full: "TOKEN_003"

Key Vault Storage

Token-to-PHI mappings are stored in Azure Key Vault, not in the database.

Enables PHI reinsertion after AI processing if needed for downstream workflows.

Preserved Clinical Terms:

diabetes, hypertension, asthma, COPD, cancer, stroke, depression, anxiety, arthritis, obesity, pneumonia, sepsis, metformin, lisinopril, atorvastatin, insulin, warfarin, prednisone, albuterol, levothyroxine

And many more medical terms...

Output Validation & PHI Leakage Prevention

Output validation is the final check before transmission. It is designed to catch accidental PHI disclosure before any data leaves the workflow.

Residual PHI Scanning

AI output re-scanned for PHI entities. If unexpected PHI detected, workflow blocked immediately.

Allowed PHI Fields (Tier 2)

Specify which fields can contain PHI (e.g., patientName, patientId). PHI in other fields triggers blocking.

Schema Validation

Optionally validate output against JSON schema to ensure data structure compliance.

Risk Assessment

Generates risk level (safe/review/block) based on PHI detection confidence and location.
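As an illustration of combining detection confidence with location, the sketch below maps one PHI finding to a risk level. The 0.8 threshold, the function name, and the rule that any finding outside an allowed field is at least "review" are all assumptions; the product's actual scoring is not documented here.

```python
def risk_level(confidence: float, field: str, allowed_fields: set[str]) -> str:
    """Map one PHI finding to safe / review / block (threshold is assumed)."""
    if field in allowed_fields:
        return "safe"        # PHI is expected in this field
    if confidence >= 0.8:
        return "block"       # confident detection in a disallowed field
    return "review"          # uncertain detection: flag for a human
```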

// Validation Configuration Example
{
  "scanForResidualPHI": true,
  "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
  "blockOnUnexpectedPHI": true,
  "schemaValidation": {
    "type": "object",
    "required": ["patientId", "diagnosis"],
    "properties": {
      "patientId": { "type": "string" },
      "diagnosis": { "type": "string" }
    }
  }
}

🛑 Blocking Behavior: When validation fails, the workflow stops immediately and no data is transmitted. Review audit logs to investigate the cause.
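To show how the configuration keys above could drive a blocking decision, here is a minimal sketch. validate_output is hypothetical, the PHI detector is passed in as a callable, and only the schema's "required" list is checked; full JSON Schema validation is out of scope for the sketch.

```python
from typing import Callable

def validate_output(payload: dict[str, str], config: dict,
                    contains_phi: Callable[[str], bool]) -> list[str]:
    """Return blocking reasons; an empty list means the payload may be sent."""
    reasons = []
    schema = config.get("schemaValidation", {})
    for field in schema.get("required", []):
        if field not in payload:
            reasons.append(f"missing required field: {field}")
    if config.get("scanForResidualPHI") and config.get("blockOnUnexpectedPHI"):
        allowed = set(config.get("allowedPHIFields", []))
        for field, value in payload.items():
            if field not in allowed and contains_phi(value):
                reasons.append(f"unexpected PHI in field: {field}")
    return reasons
```

A caller would transmit only when the returned list is empty and write the reasons to the audit log otherwise.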

API Endpoint Configuration

Configure external API endpoints for secure data transmission with authentication, TLS enforcement, and retry logic.

Security Features:

TLS 1.2+ Enforcement

All transmissions use modern TLS versions only

Authentication Methods

Bearer token, API key, OAuth 2.0, or mTLS certificate

Credential Storage

Secrets are stored in Azure Key Vault, never in the database

Exponential Backoff

Automatic retries with configurable delays (up to 10 attempts)
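Two of the features above, TLS 1.2+ enforcement and exponential backoff, can be sketched with the Python standard library. Both helper names and the default base delay, factor, and cap are assumptions, as is the reading that "up to 10 attempts" counts the first try.

```python
import ssl

def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses TLS below 1.2."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

def backoff_delays(base_seconds: float = 1.0, factor: float = 2.0,
                   max_attempts: int = 10, cap_seconds: float = 60.0) -> list[float]:
    """Delay before each retry; max_attempts includes the first try."""
    if not 1 <= max_attempts <= 10:
        raise ValueError("max_attempts must be between 1 and 10")
    return [min(cap_seconds, base_seconds * factor ** i)
            for i in range(max_attempts - 1)]

# backoff_delays(max_attempts=5) -> [1.0, 2.0, 4.0, 8.0]
```

Capping the delay keeps later retries from waiting many minutes; production retry logic often adds random jitter as well, which is omitted here.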

Audit & Compliance:

Every transmission logged with timestamp, status code, and payload hash
Response times and retry counts tracked for monitoring
Transmission IDs linkable to workflow execution for full audit trail
Failed transmissions logged with error details for troubleshooting
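An audit entry of the kind described above might be built as follows. The record's key names, the use of SHA-256, and the canonical JSON serialization are assumptions for illustration; the source states only that a payload hash is logged.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(transmission_id: str, workflow_execution_id: str,
                 payload: dict, status_code: int, retries: int) -> dict:
    """Build a log entry that stores a hash of the payload, not the payload."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "transmissionId": transmission_id,
        "workflowExecutionId": workflow_execution_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "statusCode": status_code,
        "retryCount": retries,
        "payloadHash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

Hashing a canonical form lets an auditor confirm what was sent by recomputing the hash, without the log itself holding any payload content.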

💡 Admin Configuration: API endpoints are managed by site admins at /system/api-endpoints. See the API Endpoint Configuration Guide.

Creating Workflows

Build custom PDF workflows using the visual workflow builder with drag-and-drop nodes.

Available PDF Workflow Nodes:

🛡️ PHI Detection

Scan text for PHI entities with confidence scoring

🔒 Smart Redaction

Redact PHI with tokenization and clinical term preservation

✅ Output Validation

Scan for residual PHI and validate data quality

📤 API Post

Securely transmit data to external API endpoints

🔐 Compliance Gate

Check BAA status and Tier 2 eligibility

Next Steps

BAA Management

Set up Business Associate Agreement for Tier 2 routing

API Endpoint Config

Configure external API endpoints for data transmission