Data Architecture for Regulated Industries: HIPAA, SOX, FedRAMP, PCI in 2026
by Green Dolphin Software, Data architecture practice

The data platform decision in a regulated industry is not the same decision as in a tech startup. Snowflake-vs-Databricks blog posts written for the SaaS world assume your worst-case audit is your investor board asking why CAC is up. In HIPAA, SOX, FedRAMP, or PCI environments, your worst-case audit is a federal agency, a state attorney general, or a CISO escalation that ends careers.
This post is the data architecture playbook we use on $25K+ Data Architecture engagements for clients in regulated industries — healthcare, financial services, public sector, defense, and payments. Vendor-neutral, no kickback agreements with any platform.
What "regulated" actually means for data architecture
The compliance regimes overlap less than buyers think. The architecture decisions are different per regime.
HIPAA (healthcare)
- Protected Health Information (PHI) cannot leave a Business Associate Agreement (BAA) boundary
- Every BAA-covered platform must support audit logging, encryption at rest and in transit, and access controls
- De-identification (Safe Harbor or Expert Determination) is the only way to use PHI outside the BAA boundary
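To make the Safe Harbor idea concrete, here is a minimal Python sketch of the kind of generalization it calls for. It is an illustration only: the field names (`name`, `ssn`, `mrn`, `birth_date`, `zip`) and the `low_population_zip3` lookup are hypothetical, and the sketch covers only a handful of the identifier categories Safe Harbor lists. It is not a compliance implementation.

```python
from datetime import date

# Hypothetical field names. A real record carries many more identifier
# fields, and Safe Harbor covers more categories than are handled here.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "mrn"}


def safe_harbor_sketch(record: dict, low_population_zip3: set) -> dict:
    """Return a copy of `record` with a few Safe Harbor style generalizations."""
    # Drop direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    birth = out.pop("birth_date", None)
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        # Ages over 89 collapse into one bucket; the birth year is dropped
        # because it would reveal the same information.
        out["age"] = "90+"
    elif isinstance(birth, date):
        # Dates tied to the individual are reduced to the year.
        # Simplification: this assumes `age` is present whenever it matters.
        out["birth_year"] = birth.year

    # ZIP codes keep the first three digits, or 000 for sparsely populated
    # prefixes (the caller supplies that set; it is an assumption here).
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in low_population_zip3 else zip3

    return out


record = {"name": "Jane Roe", "ssn": "000-00-0000", "birth_date": date(1980, 5, 1),
          "age": 45, "zip": "03601", "diagnosis": "J45"}
print(safe_harbor_sketch(record, low_population_zip3={"036"}))
```

Expert Determination is a different route (a qualified statistician signs off on re-identification risk) and is not something a helper function can stand in for.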
SOX (financial reporting)
- Any data flowing into financial statements is in scope — usually GL, AR, AP, revenue recognition data
- Change management on the data pipeline is the audit lever — who changed the dbt model, when, with what approval
- Segregation of duties: the engineer who writes the transformation can't also approve the deploy
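The segregation-of-duties rule is simple enough to check mechanically before a deploy. The sketch below is one hedged way to express it in Python; `ChangeRecord` and its fields are hypothetical stand-ins for whatever your Git host or CI system actually reports about a change.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChangeRecord:
    """One pipeline change, e.g. a merged pull request against a dbt model."""
    change_id: str
    author: str
    approvers: frozenset = field(default_factory=frozenset)


def sod_violations(change: ChangeRecord) -> list:
    """Return human-readable reasons this change fails segregation of duties."""
    problems = []
    if change.author in change.approvers:
        problems.append("author approved their own change")
    # At least one approver must be someone other than the author.
    if not (change.approvers - {change.author}):
        problems.append("no independent approver")
    return problems


change = ChangeRecord("PR-2", author="alice", approvers=frozenset({"alice"}))
for problem in sod_violations(change):
    print(f"{change.change_id}: {problem}")
```

Running a check like this as a required CI step produces a record per deploy, which is closer to what an auditor asks for than a written policy alone.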
FedRAMP (US federal)
- Authorization at Moderate or High level required for the platform itself
- Workloads must run in the cloud provider's FedRAMP-authorized region, not the default commercial region
- Continuous monitoring + monthly POA&M reporting
PCI-DSS (payments)
- Cardholder data environment (CDE) must be network-segmented
- Tokenization or encryption of the PAN before it enters the warehouse
- Quarterly external scans + annual on-site assessment
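One guardrail that pairs well with "tokenize before the warehouse" is a pre-load scan that flags anything that still looks like a raw PAN. The Python sketch below uses a digit-run pattern plus the Luhn checksum. The regex and function names are our own illustration, and a scan like this will produce false positives and misses; it supplements tokenization and does not replace it.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_probable_pans(text: str) -> list:
    """Return digit strings in `text` that look like card numbers."""
    hits = []
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test card number, not a real account.
print(find_probable_pans("note: card 4111 1111 1111 1111 on file"))
print(find_probable_pans("note: token tok_9f8a2c on file"))
```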
A single platform can be the right answer for one regime and wrong for another. The architecture decision starts with: which regimes apply, where do they overlap, where do they fight.
Platform compliance posture (2026)
Snowflake
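HIPAA: Business Associate Agreement available. Snowflake documents PHI support on Business Critical edition and above, so confirm edition and region coverage in the BAA rather than assuming any account qualifies. Object-level tagging supports PHI classification + masking policies. Strong.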
SOX: Time-travel + zero-copy clones are powerful change-management tools. Access History view satisfies most auditor queries about who-saw-what. Strong if dbt model changes are gated through proper PR review + approval (which is process, not platform).
FedRAMP: Snowflake Government region is FedRAMP High authorized. Separate account, separate region, slightly fewer features than commercial. Strong if you're already on AWS GovCloud or Azure Government.
PCI-DSS: Network policies + private endpoints (AWS PrivateLink / Azure Private Link). Strong if the CDE design is done correctly.
Databricks
HIPAA: BAA available. Unity Catalog provides fine-grained access controls + row-level + column-level masking. Lineage built-in (auditor favorite). Strong.
SOX: Unity Catalog lineage + audit logs. Workflow approvals through CI/CD. Strong if Spark job code is in version control + reviewed (again, process not platform).
FedRAMP: Databricks is FedRAMP Moderate authorized on both AWS GovCloud + Azure Government. High authorization in progress as of early 2026 (authorization status changes, so verify the current listing on the FedRAMP Marketplace). Moderate is sufficient for most workloads.
PCI-DSS: Workspace isolation + private networking. Newer than Snowflake's PCI story; more work in the deployment.
BigQuery
HIPAA: BAA available across all regions. Customer-managed encryption keys (CMEK) supported. Strong if you're already on GCP.
SOX: Audit logs go to Cloud Logging — solid foundation. Less native change-management tooling than Snowflake / Databricks; relies on the surrounding CI/CD pipeline.
FedRAMP: Google Cloud Assured Workloads (FedRAMP High) supports BigQuery. Limited to specific GCP regions.
PCI-DSS: Standard GCP PCI-DSS attestation applies. Network isolation via VPC Service Controls.
Microsoft Fabric / Synapse
HIPAA: BAA via Microsoft for Azure services including Fabric + Synapse. Strong, well-established.
SOX: Standard Azure audit logging + Purview lineage. Synapse has a mature change-management story; Fabric is still maturing.
FedRAMP: Azure Government regions are FedRAMP High. Fabric availability in Government regions lags commercial (check the current state; this is a moving target).
PCI-DSS: Standard Azure PCI-DSS attestation. Mature.
AWS Redshift
HIPAA: BAA via AWS. Encryption via AWS KMS. Mature.
SOX: CloudTrail logging + change management depend on surrounding pipeline. No native time-travel like Snowflake.
FedRAMP: AWS GovCloud is FedRAMP High. Redshift available in GovCloud.
PCI-DSS: Mature attestation. Network isolation via VPC.
Governance tooling that actually carries weight with auditors
Auditors don't care about your data catalog screenshots — they care about whether you can answer the question "who saw this PII / PHI / cardholder data, when, why, with what approval" in 15 seconds, with evidence.
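As a rough illustration of what answering that question looks like, the sketch below loads a few made-up access events into an in-memory SQLite table and runs the who / when / approval query. The `access_log` schema, the principals, and the `REQ-` approval references are all invented for the example; on a real platform the source would be the warehouse's own audit or access-history tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE access_log (
        accessed_at TEXT, principal TEXT, object_name TEXT,
        classification TEXT, approval_ref TEXT
    )
""")
# Sample rows are fabricated for illustration only.
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?, ?, ?)",
    [
        ("2026-01-05T09:12:00Z", "analyst_a", "gold.claims", "PHI", "REQ-101"),
        ("2026-01-06T14:30:00Z", "svc_bi", "gold.claims", "PHI", "REQ-087"),
        ("2026-01-06T15:02:00Z", "analyst_b", "gold.revenue", "SOX", "REQ-110"),
        ("2026-01-07T08:45:00Z", "analyst_c", "gold.claims", "PHI", None),
    ],
)


def who_saw(conn, object_name, start, end):
    """Return (when, who, approval) rows for one object in [start, end)."""
    return conn.execute(
        """
        SELECT accessed_at, principal, approval_ref
        FROM access_log
        WHERE object_name = ? AND accessed_at >= ? AND accessed_at < ?
        ORDER BY accessed_at
        """,
        (object_name, start, end),
    ).fetchall()


rows = who_saw(conn, "gold.claims", "2026-01-06", "2026-01-08")
# Accesses with no approval reference are the ones an auditor will ask about.
unapproved = [row for row in rows if row[2] is None]
print(rows)
print(unapproved)
```

The point of the exercise is the shape of the query: if the approval reference cannot be joined to the access event, the "with what approval" half of the question has no evidence behind it.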
Unity Catalog (Databricks)
Built-in. Lineage from raw → silver → gold → dashboard, captured automatically. Tag-based access controls. Strong if you're on Databricks; pairs natively. Less compelling if your warehouse is Snowflake — Unity Catalog can read Snowflake but auditors prefer the native story.
Atlan
Modern catalog with strong UX. Lineage via push-down from dbt, Fivetran, Airflow, etc. Better for engineering-led teams; auditors will accept it but it's newer to the regulated-industry conversation.
Collibra
The incumbent enterprise catalog. Mature, well-known to auditors in financial services. Heavy implementation; high TCO. Strong fit if you're already in regulated FS or healthcare with an existing Collibra footprint.
Alation
Mid-market enterprise catalog. Strong in financial services + healthcare. Lighter to implement than Collibra, but less feature-rich.
Microsoft Purview
The Azure-native answer. Catalog + DLP + insider risk in one. Tight Microsoft Fabric / Synapse integration. Mature in Azure shops.
dbt + dbt Cloud
Not a catalog per se, but dbt's manifest.json plus dbt Cloud's lineage UI are sufficient for SOX change-management in many shops. The model-level tests + ownership + descriptions become the documentation auditors review.
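If descriptions and ownership are going to serve as audit documentation, it helps to check that they exist. The sketch below reads the `nodes` section of a dbt manifest and lists models that lack a description or an owner. Storing the owner under `meta.owner` is a common convention and an assumption here, not something dbt enforces, and the sample manifest is a tiny hand-written stand-in for a real one.

```python
import json


def load_manifest(path: str) -> dict:
    """Load a dbt manifest, e.g. target/manifest.json after a dbt run."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def undocumented_models(manifest: dict) -> list:
    """List dbt model IDs that are missing a description or an owner."""
    gaps = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        description = (node.get("description") or "").strip()
        owner = (node.get("meta") or {}).get("owner")  # assumed convention
        if not description or not owner:
            gaps.append(unique_id)
    return sorted(gaps)


# Hand-written stand-in for a real manifest; model names are hypothetical.
sample_manifest = {
    "nodes": {
        "model.finance.revenue_recognized": {
            "resource_type": "model",
            "description": "Recognized revenue by month.",
            "meta": {"owner": "finance-data"},
        },
        "model.finance.gl_staging": {
            "resource_type": "model",
            "description": "",
            "meta": {},
        },
        "test.finance.not_null_revenue_id": {
            "resource_type": "test",
            "description": "",
            "meta": {},
        },
    }
}

print(undocumented_models(sample_manifest))
```

Failing the CI build when this list is non-empty for in-scope models is one way to keep the documentation from drifting between audits.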
The architecture decision in a regulated environment
The lakehouse / medallion pattern works in regulated industries with three additions:
- Bronze tier inside the compliance boundary: raw data lands inside the BAA / FedRAMP / PCI boundary and never leaves it. No "let's just extract a sample to my laptop for analysis."
- Tokenization / de-identification at Silver: PHI / PAN / SSN replaced with tokens or hashed identifiers before Silver. Re-identification only via the tokenization service, which has its own audit log.
- Gold tier with documented access controls: Gold tier is what BI tools and AI models consume. Every Gold table has explicit row-level + column-level policies. The policy definitions are version-controlled.
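The Silver-tier tokenization step can be sketched in a few lines of Python. This is a toy under stated assumptions: `TokenizationService` and `to_silver` are hypothetical names, the token is a keyed HMAC-SHA256 digest (one option among several), and the in-memory dictionary stands in for a hardened token vault with its own key management.

```python
import hashlib
import hmac
from datetime import datetime, timezone


class TokenizationService:
    """Toy tokenizer: deterministic keyed tokens plus an audited reverse lookup."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._vault = {}      # token -> original value (stand-in for a real vault)
        self.audit_log = []   # every re-identification attempt is recorded here

    def tokenize(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = f"tok_{digest[:32]}"
        self._vault[token] = value
        return token

    def detokenize(self, token: str, requester: str, reason: str) -> str:
        # Log before the lookup so failed attempts are recorded as well.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "requester": requester,
            "token": token,
            "reason": reason,
        })
        return self._vault[token]  # raises KeyError for unknown tokens


def to_silver(row: dict, sensitive_fields: set, service: TokenizationService) -> dict:
    """Copy a Bronze row, replacing sensitive fields with tokens."""
    return {
        key: service.tokenize(str(value)) if key in sensitive_fields else value
        for key, value in row.items()
    }


service = TokenizationService(secret_key=b"replace-with-managed-key")
bronze_row = {"patient_id": 1, "ssn": "000-00-0000", "visit_type": "outpatient"}
silver_row = to_silver(bronze_row, {"ssn"}, service)
print(silver_row)
```

Deterministic tokens keep joins working across Silver tables because the same input always maps to the same token. The trade-off is that equal values remain linkable, which is worth raising with your privacy officer before settling on a scheme.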
The AI layer (Snowflake Cortex / Databricks Mosaic AI / Azure OpenAI / Bedrock) sits on top of Gold. It does not see Bronze or Silver. Models never train on PHI / PAN unless de-identified at Silver. Embeddings of de-identified data are fine; embeddings of raw PHI are not.
How we pick
The first conversation on a $25K+ Data Architecture engagement covers:
- Which compliance regimes apply
- What's currently in place (BAAs, FedRAMP authorizations, existing CDE design)
- What the AI ambition is (which determines whether in-warehouse AI matters)
- What the team's existing skill set is (Spark vs SQL, Python vs T-SQL)
- What the procurement / security review timeline looks like
The output is a fixed-bid recommendation with rationale, a target-state diagram, and a 90-day modernization roadmap. Vendor-neutral. The recommendation is the one we'd implement on our own time.
See the broader data architecture decision framework for non-regulated context, and architecture & design services for the full $25K+ engagement structure.
If you're in a regulated industry and the platform pick is upcoming, the SOW workflow starts at /intake. Fixed-bid response in 3 business days.

