PDF Workflow Processing
Automated document processing with PHI detection, smart redaction, and secure API transmission
Overview
PDF workflows automate document processing with PHI detection, tokenization before AI invocation, output validation, and routing to external integration endpoints. The three-gate enforcement boundary tokenizes detected PHI before any AI model is invoked in either tier. The Tier 1 / Tier 2 distinction governs whether the workflow may route identifiable PHI to an external integration endpoint after processing.
Tier 1: De-identified Routing
Workflows that route only de-identified content to external integration endpoints. Detected PHI is tokenized; only validated, de-identified output is transmitted.
Tier 2: BAA-Covered Routing
Workflows that route identifiable PHI to configured external integration endpoints (FHIR, patient matching, claims) under an executed BAA. AI invocations inside the workflow continue to receive tokenized content.
Tier 1 Workflow: De-identified Processing
Tier 1 workflows tokenize detected PHI in documents before AI processing, so only de-identified content reaches an external model. Whether a Tier 1 workflow is appropriate for a given use case without a BAA is a determination your organization makes based on its own HIPAA risk assessment.
PHI Detection
AI scans document text for PHI entities (names, SSNs, dates of birth, phone numbers, etc.)
Smart Redaction
PHI replaced with tokens (TOKEN_001, TOKEN_002). Clinical terms preserved. Original values stored securely in Azure Key Vault.
AI Processing
Redacted document processed by AI (extraction, summarization, etc.). Detected PHI is replaced with tokens before the model is invoked.
Output Validation
AI output scanned for residual PHI. Workflow blocked if unexpected PHI detected.
API Transmission
Validated, de-identified data transmitted to external API endpoints securely.
Tier 1 routing: Tier 1 workflows route only validated, de-identified content to external endpoints. Whether your organization can use a specific workflow without a BAA depends on your own HIPAA risk assessment and the destination endpoint's terms.
Example Use Cases:
- Prior authorization document summarization
- Lab result extraction (de-identified)
- Clinical note categorization
- Medical record quality checks
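The five Tier 1 stages described above can be sketched end to end. This is a minimal illustration, not the product's implementation: the regex detector, the `run_tier1` function, and the `ai_step` and `transmit` callables are all hypothetical stand-ins, and a real detector covers far more entity types than the two patterns shown.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical detector: two regexes stand in for the real PHI detection step.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


@dataclass
class Tier1Result:
    transmitted: bool
    payload: Optional[str]
    token_map: Dict[str, str] = field(default_factory=dict)


def detect_phi(text: str) -> List[str]:
    """Return matched PHI strings in order of appearance."""
    spans = []
    for pattern in PHI_PATTERNS.values():
        spans.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    return [value for _, value in sorted(spans)]


def redact(text: str, phi_values: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct PHI value with TOKEN_001, TOKEN_002, ..."""
    token_map: Dict[str, str] = {}
    for value in phi_values:
        if value not in token_map.values():
            token = f"TOKEN_{len(token_map) + 1:03d}"
            token_map[token] = value
            text = text.replace(value, token)
    return text, token_map


def run_tier1(document: str,
              ai_step: Callable[[str], str],
              transmit: Callable[[str], None]) -> Tier1Result:
    redacted, token_map = redact(document, detect_phi(document))
    output = ai_step(redacted)      # the model only ever sees tokenized text
    if detect_phi(output):          # output validation: residual PHI blocks the run
        return Tier1Result(False, None, token_map)
    transmit(output)                # only validated, de-identified output leaves
    return Tier1Result(True, output, token_map)
```

In this sketch the token map never leaves the function's caller; in the product the mapping is described as being held in Azure Key Vault.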
Tier 2 Workflow: BAA-Covered Routing
Tier 2 workflows route identifiable PHI to configured external integration endpoints (FHIR, patient matching, claims systems, EHR write-back) under an executed Business Associate Agreement. The destination is the integration endpoint, not the AI model. When an AI step is included in a Tier 2 workflow, the three-gate enforcement boundary tokenizes detected PHI before model invocation, the same as in Tier 1.
Compliance Gate
The workflow verifies that the BAA is signed and has not expired, that Tier 2 routing is enabled, and that external routing has been acknowledged. The workflow blocks if any check fails.
PHI Detection
Detected PHI entities are identified and recorded for downstream routing decisions and audit. The token map is held server-side.
AI Processing (if invoked)
When the workflow includes an AI step, detected PHI is replaced with tokens before the model is invoked. AI providers receive tokenized content in Tier 2, the same as in Tier 1. Workflows that do not require an AI step skip this stage.
Output Validation
Output validated for data quality and schema. Allowed PHI fields enumerate where identifiable values may appear in the payload to be routed; PHI in any other field triggers blocking.
External Routing
Identifiable PHI is reinserted from the token map and transmitted to the configured external integration endpoint over TLS 1.2+, with authentication and full audit logging.
BAA required: Tier 2 routing requires an executed Business Associate Agreement and the external routing acknowledgment. See BAA Management Guide for setup.
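The compliance gate checks can be pictured as a function that returns every failed precondition, so an empty result means routing may proceed. The `BaaStatus` record and its field names are hypothetical, and treating a BAA as valid through its expiry date is an assumption made for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class BaaStatus:
    # Hypothetical record; field names are illustrative only.
    signed: bool
    expires_on: date
    tier2_routing_enabled: bool
    external_routing_acknowledged: bool


def compliance_gate(status: BaaStatus, today: date) -> List[str]:
    """Return the list of failed checks; an empty list means the gate passes."""
    failures = []
    if not status.signed:
        failures.append("BAA not signed")
    if status.expires_on < today:  # assumption: valid through the expiry date
        failures.append("BAA expired")
    if not status.tier2_routing_enabled:
        failures.append("Tier 2 routing not enabled")
    if not status.external_routing_acknowledged:
        failures.append("external routing not acknowledged")
    return failures
```

Returning all failures rather than stopping at the first one makes the audit entry for a blocked run more useful.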
Example Use Cases:
- FHIR data exchange with covered partners
- Patient matching and deduplication against external services
- Insurance claim submission
- Referral coordination with identifiable data
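The reinsertion step in Tier 2 external routing can be sketched as a lookup against the token map. The `reinsert_phi` function is hypothetical, and restricting reinsertion to an explicit set of allowed fields is an assumption of this sketch, chosen to mirror the allowed PHI fields described for output validation.

```python
import re
from typing import Any, Dict, Set

TOKEN_RE = re.compile(r"TOKEN_\d+")


def reinsert_phi(payload: Dict[str, Any],
                 token_map: Dict[str, str],
                 allowed_fields: Set[str]) -> Dict[str, Any]:
    """Restore original values only in fields permitted to carry PHI."""
    restored = {}
    for key, value in payload.items():
        if key in allowed_fields and isinstance(value, str):
            # Unknown tokens are left as-is rather than guessed at.
            restored[key] = TOKEN_RE.sub(
                lambda m: token_map.get(m.group(0), m.group(0)), value
            )
        else:
            restored[key] = value
    return restored
```

Fields outside the allowed set keep their tokens, so a misconfigured payload carries tokens rather than identifiable values.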
Smart Redaction Features
Smart redaction goes beyond simple text removal to preserve clinical context while protecting PHI.
Tokenization
PHI values replaced with reversible tokens (TOKEN_001, TOKEN_002) instead of [REDACTED].
Original: "John Doe, DOB 01/15/1980"
Redacted: "TOKEN_001, DOB TOKEN_002"
Clinical Term Preservation
Medical terminology retained for accurate AI processing.
Original: "Patient has diabetes"
Redacted: "TOKEN_001 has diabetes"
NOT: "TOKEN_001 has TOKEN_002"
Date Granularity Options
Preserve year or month for clinical context while redacting day.
Original: "01/15/1980"
Year: "**/**/1980"
Month: "01/**/1980"
Full: "TOKEN_003"
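A minimal sketch of the three granularity options, assuming MM/DD/YYYY input and a mask that keeps each preserved component in its original position. The `redact_date` function and the exact mask layout are assumptions for illustration; the product's output format may differ.

```python
from datetime import datetime


def redact_date(value: str, granularity: str, token: str = "TOKEN_003") -> str:
    """Mask an MM/DD/YYYY date at the requested granularity."""
    parsed = datetime.strptime(value, "%m/%d/%Y")  # raises ValueError if malformed
    if granularity == "year":
        return f"**/**/{parsed.year:04d}"
    if granularity == "month":
        return f"{parsed.month:02d}/**/{parsed.year:04d}"
    if granularity == "full":
        return token
    raise ValueError(f"unknown granularity: {granularity}")
```

Parsing the date first, rather than masking characters by position, means malformed input is rejected instead of being partially masked.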
Key Vault Storage
Token-to-PHI mappings are stored in Azure Key Vault, not in the database.
Enables PHI reinsertion after AI processing if needed for downstream workflows.
Preserved Clinical Terms:
diabetes, hypertension, asthma, COPD, cancer, stroke, depression, anxiety, arthritis, obesity, pneumonia, sepsis, metformin, lisinopril, atorvastatin, insulin, warfarin, prednisone, albuterol, levothyroxine
And many more medical terms...
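Tokenization with clinical term preservation can be sketched as a filter applied before token assignment. The `tokenize` function is hypothetical, the term set is a small excerpt of the list above, and the candidate strings are assumed to come from a separate detection step.

```python
from typing import Dict, List, Tuple

# Excerpt of the preserved terms listed above; the real list is longer.
PRESERVED_TERMS = {"diabetes", "hypertension", "asthma", "metformin", "insulin"}


def tokenize(text: str, candidates: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace detector candidates with reversible tokens, skipping clinical terms."""
    token_map: Dict[str, str] = {}
    for value in candidates:
        if value.lower() in PRESERVED_TERMS:
            continue  # clinical vocabulary stays readable for the AI step
        token = next((t for t, v in token_map.items() if v == value), None)
        if token is None:
            token = f"TOKEN_{len(token_map) + 1:03d}"
            token_map[token] = value
        text = text.replace(value, token)
    return text, token_map
```

Because the same value always maps to the same token, repeated mentions of one patient stay linked in the redacted text.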
Output Validation & PHI Leakage Prevention
Output validation is the critical safety net that prevents accidental PHI disclosure in API transmissions.
Residual PHI Scanning
AI output re-scanned for PHI entities. If unexpected PHI detected, workflow blocked immediately.
Allowed PHI Fields (Tier 2)
Specify which fields can contain PHI (e.g., patientName, patientId). PHI in other fields triggers blocking.
Schema Validation
Optionally validate output against JSON schema to ensure data structure compliance.
Risk Assessment
Generates risk level (safe/review/block) based on PHI detection confidence and location.
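One way to derive the three risk levels from detection confidence and location is sketched below. The `Finding` type, the `risk_level` function, and the 0.8 threshold are assumptions for illustration; the product's actual scoring rules are not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Finding:
    location: str      # output field where the PHI candidate was found
    confidence: float  # detector confidence between 0.0 and 1.0


def risk_level(findings: Iterable[Finding],
               allowed_fields: Set[str],
               block_threshold: float = 0.8) -> str:
    """Return 'safe', 'review', or 'block' for a set of PHI findings."""
    unexpected = [f for f in findings if f.location not in allowed_fields]
    if not unexpected:
        return "safe"    # PHI appears only where it is permitted
    if any(f.confidence >= block_threshold for f in unexpected):
        return "block"   # confident detection in a disallowed field
    return "review"      # low-confidence hit: route to a human
```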
Validation Configuration Example:
    {
      "scanForResidualPHI": true,
      "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
      "blockOnUnexpectedPHI": true,
      "schemaValidation": {
        "type": "object",
        "required": ["patientId", "diagnosis"],
        "properties": {
          "patientId": { "type": "string" },
          "diagnosis": { "type": "string" }
        }
      }
    }
Blocking Behavior: When validation fails, the workflow stops immediately and no data is transmitted. Review audit logs to investigate the cause.
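To show how a configuration of this shape might drive the blocking decision, here is a minimal sketch. The `validate_output` function is hypothetical, the SSN regex stands in for the real residual PHI scanner, and the schema check covers only `required` and string-typed properties rather than full JSON Schema.

```python
import json
import re
from typing import Any, Dict, List, Tuple

CONFIG = json.loads("""
{
  "scanForResidualPHI": true,
  "allowedPHIFields": ["patientId", "patientName", "dateOfBirth"],
  "blockOnUnexpectedPHI": true,
  "schemaValidation": {
    "type": "object",
    "required": ["patientId", "diagnosis"],
    "properties": {
      "patientId": { "type": "string" },
      "diagnosis": { "type": "string" }
    }
  }
}
""")

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # stand-in for the real PHI scanner


def validate_output(output: Dict[str, Any],
                    config: Dict[str, Any]) -> Tuple[bool, List[str]]:
    """Return (blocked, problems) for an AI output payload."""
    schema_errors, phi_errors = [], []
    schema = config.get("schemaValidation") or {}
    for name in schema.get("required", []):
        if name not in output:
            schema_errors.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if spec.get("type") == "string" and name in output \
                and not isinstance(output[name], str):
            schema_errors.append(f"field is not a string: {name}")
    if config.get("scanForResidualPHI"):
        allowed = set(config.get("allowedPHIFields", []))
        for name, value in output.items():
            if name not in allowed and isinstance(value, str) and SSN_RE.search(value):
                phi_errors.append(f"unexpected PHI in field: {name}")
    blocked = bool(schema_errors) or (
        bool(phi_errors) and config.get("blockOnUnexpectedPHI", True)
    )
    return blocked, schema_errors + phi_errors
```

A caller would transmit only when `blocked` is false and write the `problems` list to the audit log otherwise.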
API Endpoint Configuration
Configure external API endpoints for secure data transmission with authentication, TLS enforcement, and retry logic.
Security Features:
TLS 1.2+ Enforcement
All transmissions use modern TLS versions only
Authentication Methods
Bearer token, API key, OAuth 2.0, or mTLS certificate
Credential Storage
Secrets are stored in Azure Key Vault, never in the database
Exponential Backoff
Automatic retries with configurable delays (up to 10 attempts)
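The TLS floor and retry schedule can be sketched with the Python standard library. The `make_tls_context` and `backoff_delays` helpers, the 1-second base delay, and the 60-second cap are assumptions for illustration; only the TLS 1.2 minimum and the 10-attempt ceiling come from the description above.

```python
import ssl
from typing import List

MAX_ATTEMPTS = 10  # ceiling stated in the feature description


def make_tls_context() -> ssl.SSLContext:
    """Client context that refuses protocol versions below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def backoff_delays(attempts: int,
                   base_seconds: float = 1.0,
                   cap_seconds: float = 60.0) -> List[float]:
    """Delays to wait between attempts: base, 2*base, 4*base, ... up to the cap."""
    attempts = min(attempts, MAX_ATTEMPTS)
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts - 1)]
```

Production retry loops commonly add random jitter to these delays so that many clients do not retry in lockstep; that is omitted here for clarity.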
Audit & Compliance:
- Every transmission logged with timestamp, status code, and payload hash
- Response times and retry counts tracked for monitoring
- Transmission IDs linkable to workflow execution for full audit trail
- Failed transmissions logged with error details for troubleshooting
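An audit entry with the fields listed above might look like the following sketch. The `audit_record` function and its key names are hypothetical, and SHA-256 over canonicalized JSON is an assumed choice of payload hash.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def audit_record(transmission_id: str,
                 workflow_execution_id: str,
                 payload: Dict[str, Any],
                 status_code: int,
                 retry_count: int,
                 response_ms: float,
                 error: Optional[str] = None) -> Dict[str, Any]:
    """Build a log entry that stores a hash of the payload, not the payload."""
    # Sorted keys and fixed separators make the hash independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return {
        "transmissionId": transmission_id,
        "workflowExecutionId": workflow_execution_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "statusCode": status_code,
        "payloadHash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "retryCount": retry_count,
        "responseMs": response_ms,
        "error": error,
    }
```

Logging a hash lets an auditor confirm what was sent against a retained copy without placing PHI in the log itself.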
Admin Configuration: API endpoints are managed by site admins at /system/api-endpoints. See API Endpoint Configuration Guide.
Creating Workflows
Build custom PDF workflows using the visual workflow builder with drag-and-drop nodes.
Available PDF Workflow Nodes:
PHI Detection
Scan text for PHI entities with confidence scoring
Smart Redaction
Redact PHI with tokenization and clinical term preservation
Output Validation
Scan for residual PHI and validate data quality
API Post
Securely transmit data to external API endpoints
Compliance Gate
Check BAA status and Tier 2 eligibility
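The ordering these nodes imply can be checked with a small function. This is a sketch under assumptions: the node identifiers and the `check_node_order` function are hypothetical, and the ordering rules are inferred from the Tier 1 and Tier 2 stage descriptions rather than taken from the workflow builder itself.

```python
from typing import List


def check_node_order(nodes: List[str], tier: int) -> List[str]:
    """Return ordering problems for a linear workflow; empty means acceptable."""

    def before(first: str, second: str) -> bool:
        return (first in nodes and second in nodes
                and nodes.index(first) < nodes.index(second))

    problems = []
    if "smart_redaction" in nodes and not before("phi_detection", "smart_redaction"):
        problems.append("smart_redaction needs phi_detection before it")
    if "api_post" in nodes:
        if not before("output_validation", "api_post"):
            problems.append("api_post needs output_validation before it")
        if tier == 2 and not before("compliance_gate", "api_post"):
            problems.append("Tier 2 api_post needs compliance_gate before it")
    return problems
```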