Data Masking & PII Quick Start Guide for HeliosDB
Data Masking & PII Quick Start Guide for HeliosDB
Version: 1.0 Last Updated: December 30, 2025 Target Audience: Database Administrators, Security Engineers, Compliance Officers Reading Time: 12-15 minutes
Table of Contents
- Understanding Data Masking & PII
- Compliance Requirements
- HeliosDB Masking Capabilities
- Supported Masking Techniques
- Step-by-Step Setup Guide
- PII Detection & Classification
- Role-Based Access Control
- Performance Considerations
- Monitoring Masked Data Access
- Common Pitfalls
- Unmasking & Special Cases
- Troubleshooting
Understanding Data Masking & PII
What is Data Masking?
Data masking (also called data obfuscation) is a security technique that hides sensitive data from unauthorized users while maintaining its usability for legitimate business purposes. Masked data appears realistic but contains no actual personally identifiable information (PII).
Example:
Original: SSN: 123-45-6789Masked: SSN: XXX-XX-6789Why Data Masking Matters
Data masking protects your organization from:
- Data Breaches: Attackers gain nothing even with database access
- Insider Threats: Employees cannot view sensitive data they don’t need
- Accidental Exposure: Masked data in logs/exports doesn’t reveal PII
- Regulatory Penalties: Demonstrates compliance efforts (reduces fines by up to 50%)
- Reputational Damage: Customers trust your security measures
PII Definition
Personally Identifiable Information includes any data that can identify an individual:
- Direct Identifiers: Names, SSNs, Email addresses, Phone numbers, Account numbers
- Quasi-Identifiers: Zip codes, DOB, Gender (combination can identify individuals)
- Sensitive Data: Health records, Financial data, Biometric data, Government IDs
- Behavioral Data: Location history, Purchase history, Web browsing history
Compliance Requirements
GDPR (General Data Protection Regulation)
Scope: Applies to any organization processing data of EU residents
PII Coverage:
- Name, ID number, location data, online identifier, genetic/biometric data
- Health or sex life data
- Racial or ethnic origin, political opinions, religious beliefs
Masking Requirements:
- Implement data minimization (only collect necessary data)
- Masking counts as pseudonymization (requires technical+organizational measures)
- Must maintain separate keys for unmasking
- Deletion must be irreversible for right-to-be-forgotten requests
HeliosDB Compliance Features:
✓ Redaction/masking policies✓ Audit logging for all access✓ Data portability exports✓ Right-to-be-forgotten supportKey Penalties: Up to €20 million or 4% of global revenue (whichever is higher)
HIPAA (Health Insurance Portability & Accountability Act)
Scope: US healthcare organizations and their business associates
PII Coverage:
- Patient names, Social Security Numbers, Medical record numbers
- Dates (birth, admission, discharge), Biometric records
- Device/serial numbers, Health plan beneficiary numbers, URLs, IP addresses
Masking Requirements:
- Safe harbor method: Remove 18 specific identifiers
- OR statistical de-identification method
- Minimum necessary standard for access
HeliosDB Compliance Features:
✓ Column-level masking✓ Role-based access control (RBAC)✓ Comprehensive audit trail✓ Encryption at rest and in transitKey Penalties: Up to $100 per violation, maximum $1.5 million per year per violation type
CCPA (California Consumer Privacy Act)
Scope: California businesses collecting data from >100K California residents/households
PII Coverage:
- Identifiers (name, address, email, phone, SSN, passport)
- Biometric information, Internet activity, Geolocation data
- Professional information, Education information
Masking Requirements:
- Right to delete: Implement automatic deletion workflows
- Consumer access: Portable, understandable data format
- Non-discrimination: No price increases for privacy requests
- Masking satisfies de-identification requirement
HeliosDB Compliance Features:
✓ Row-level deletion policies✓ Data export in standard formats✓ Audit trail for proof of deletion✓ Field-level access controlsKey Penalties: Up to $7,500 per intentional violation, $2,500 per unintentional violation
PCI-DSS (Payment Card Industry Data Security Standard)
Scope: Any organization accepting/storing/transmitting payment card data
PII Coverage:
- Primary Account Numbers (PAN), CVV, Expiration dates
- Cardholder names, addresses, phone numbers, email addresses
Masking Requirements:
- PAN must be masked (display last 4 digits only)
- CVV/CVC must never be stored
- Encryption + tokenization recommended
- Regular security testing required
HeliosDB Compliance Features:
✓ Tokenization support✓ Encryption with strong algorithms✓ Field masking (first 12 digits obscured)✓ Query audit loggingKey Penalties: Up to $100,000 per month, loss of card processing ability, brand reputation damage
HeliosDB Masking Capabilities
Core Features
HeliosDB provides enterprise-grade data masking with:
- Policy-Based Masking: Define rules declaratively, enforce automatically
- Column-Level Granularity: Mask at table, column, and row levels
- Multiple Techniques: Redaction, encryption, hashing, tokenization
- GDPR Compliance: Audit logging, redaction requests, consent management
- RBAC Integration: Different users see different masked versions
- Performance Optimized: Minimal query overhead, caching support
- Audit Trail: Every masking operation logged with timestamp/user/reason
Architecture
User Query ↓[SQL Security Middleware - Validate & Log] ↓[Query Parser - Identify sensitive columns] ↓[Access Control - Check user permissions] ↓[Masking Engine - Apply policies] ↓[Result Set - Return masked data] ↓UserImplementation Locations
GDPR Compliance Module: - heliosdb-research/src/crdt/security/gdpr.rs • DataSubject management • Redaction requests • Consent tracking • Data portability
SQL Security Module: - heliosdb-protocols/src/sql_security.rs • Injection prevention • Query validation • Pattern detection
Audit Logging: - heliosdb-research/src/crdt/security/audit.rs • Event tracking • Immutable audit trail • DataRedacted eventsSupported Masking Techniques
1. Redaction (Partial Masking)
Hide portions of data while keeping structure recognizable.
Best For: Phone numbers, SSNs, Credit cards, Email addresses
Example:
-- Original dataemail: john.smith@company.com
-- Redacted (show only domain)email: john.s*****@company.com
-- Original dataphone: (555) 123-4567
-- Redacted (show only last 4)phone: (XXX) XXX-4567Configuration:
CREATE MASKING POLICY email_redaction AS REDACT email_column KEEP_FIRST 6 CHARS REPLACE_WITH '*' FOR ROLE analyst;Compliance: GDPR (pseudonymization), CCPA (partial de-identification)
2. Encryption
Cryptographically secure hiding with deterministic or non-deterministic modes.
Best For: Sensitive fields requiring lookups, Encrypted storage compliance
Deterministic Encryption (same plaintext → same ciphertext):
Enables:- Exact matching: WHERE ssn_encrypted = '...'- Joins on encrypted columns- Index usage
Tradeoff: Weaker security (pattern analysis possible)Non-Deterministic Encryption (same plaintext → different ciphertext):
Enables:- Stronger security (no pattern leakage)- Prevents frequency analysis
Tradeoff: Cannot do exact matching or joinsConfiguration:
CREATE MASKING POLICY ssn_encryption AS ENCRYPT ssn_column WITH KEY 'kek_ssn_master' ALGORITHM 'AES-256-GCM' DETERMINISTIC FALSE FOR ROLE analyst;Compliance: GDPR (encryption + anonymization), HIPAA (HIPAA-eligible encryption), PCI-DSS (strong cryptography)
3. Hashing
One-way cryptographic hashing for irreversible masking.
Best For: Analytics, Deduplication, Irreversible anonymization
Characteristics:
- Cannot be reversed
- Same input = same hash (consistent)
- No decryption key needed
- Supports salt for additional security
Example:
Original: john.smith@company.comHashed: a3f5d8c2e1b9f4c7a2e5d8c1b9f4a3c2
Original: john.smith@company.comHashed: a3f5d8c2e1b9f4c7a2e5d8c1b9f4a3c2 (same)Configuration:
CREATE MASKING POLICY email_hash AS HASH email_column ALGORITHM 'SHA-256' SALT 'org_specific_salt_value' FOR ROLE analytics_team;Compliance: GDPR (anonymization if properly salted), CCPA (de-identification), HIPAA (when combined with other measures)
4. Tokenization
Replace sensitive data with random tokens while maintaining relationships.
Best For: Preserving data relationships, PCI-DSS compliance, Multi-system masking
Concept:
Original: john.smith@company.com → Token: TOKEN_a7f2c8e1Database: TOKEN_a7f2c8e1 is mapped to 'john.smith@company.com' (mapping stored in secure token vault)
User sees: TOKEN_a7f2c8e1 (meaningless without vault access)Configuration:
CREATE MASKING POLICY email_tokenization AS TOKENIZE email_column VAULT 'secure_email_vault' TOKEN_PREFIX 'EMAIL_' FORMAT 'EMAIL_[RANDOM_8_CHARS]' FOR ROLE external_partner;Compliance: PCI-DSS (payment card tokenization), GDPR (pseudonymization), CCPA (de-identification)
Step-by-Step Setup Guide
Step 1: Enable Data Masking
1.1 Configure HeliosDB for Masking
Add to heliosdb.toml:
[security]masking_enabled = trueaudit_masking_operations = truemask_null_values = false
[masking]default_technique = "redaction"cache_masked_results = truecache_ttl_seconds = 3600
[gdpr]enabled = trueredaction_requests_enabled = trueconsent_tracking_enabled = true1.2 Initialize GDPR Compliance Manager
-- Initialize GDPR trackingSELECT init_gdpr_compliance( 'node_1', max_subjects => 1000000, audit_log_retention_days => 2555 -- 7 years for GDPR);
-- Verify initializationSELECT * FROM system.gdpr_status;Output:
node_id | status | data_subjects | redaction_requests---------|----------|---------------|------------------node_1 | READY | 0 | 0Step 2: Define Masking Policies
2.1 Create Policy for Customer Email (Redaction)
CREATE MASKING POLICY customers_email_policy AS TARGET TABLE customers TARGET COLUMN email TECHNIQUE redaction ( keep_first_chars = 6, replacement_char = '*' ) FOR ROLE analyst, FOR ROLE support_team DESCRIPTION 'Redact customer emails for analysts and support (show first 6 chars)' ENABLED = TRUE;2.2 Create Policy for SSN (Encryption)
CREATE MASKING POLICY customers_ssn_policy AS TARGET TABLE customers TARGET COLUMN ssn TECHNIQUE encryption ( key_id = 'kek_ssn_customers', algorithm = 'AES-256-GCM', deterministic = false ) FOR ROLE analyst, FOR ROLE finance_team DESCRIPTION 'Encrypt customer SSN for analysts and finance' ENABLED = TRUE;2.3 Create Policy for Health Records (Hashing)
CREATE MASKING POLICY patients_health_policy AS TARGET TABLE patient_records TARGET COLUMN medical_condition TECHNIQUE hashing ( algorithm = 'SHA-256', salt_id = 'salt_patient_health' ) FOR ROLE research_team DESCRIPTION 'Hash medical conditions for HIPAA compliance in research' ENABLED = TRUE;2.4 Create Policy for Payment Cards (Tokenization)
CREATE MASKING POLICY orders_payment_policy AS TARGET TABLE payment_info TARGET COLUMN card_number TECHNIQUE tokenization ( vault_id = 'vault_payment_cards', token_prefix = 'CARD_', format = 'CARD_[RANDOM_16]' ) FOR ROLE order_processor DESCRIPTION 'Tokenize card numbers for PCI-DSS compliance' ENABLED = TRUE;Step 3: Specify Columns to Mask
3.1 Mask Multiple Sensitive Columns
-- Create comprehensive policy for customer tableCREATE MASKING POLICY customer_pii_comprehensive AS TARGET TABLE customers ADD_COLUMN_RULE ( column_name = 'email', technique = 'redaction', keep_first = 6, replace_with = '*' ) ADD_COLUMN_RULE ( column_name = 'phone', technique = 'redaction', keep_last = 4, replace_with = 'X' ) ADD_COLUMN_RULE ( column_name = 'ssn', technique = 'encryption', key_id = 'kek_ssn', algorithm = 'AES-256-GCM' ) ADD_COLUMN_RULE ( column_name = 'date_of_birth', technique = 'redaction', keep_month_year = true, hide_day = true ) ADD_COLUMN_RULE ( column_name = 'address', technique = 'redaction', keep_zip_code = true, hide_street = true ) FOR ROLE analyst, support_team DESCRIPTION 'Comprehensive PII masking for customer table' ENABLED = TRUE;3.2 Create Row-Level Masking (Dynamic)
-- Mask data differently based on user departmentCREATE MASKING POLICY dept_aware_masking AS TARGET TABLE employee_data TARGET COLUMN salary TECHNIQUE dynamic_redaction ( hide_for_role = 'analyst', show_range_for_role = 'manager', show_full_for_role = 'finance' ) DESCRIPTION 'Salary visibility based on department and role' ENABLED = TRUE;Output: When different users query the same data:
Finance Team sees: salary: 85000Manager sees: salary: $80,000 - $90,000Analyst sees: salary: [REDACTED]Step 4: Test Masking Rules
4.1 Create Test Data
-- Insert test customer dataINSERT INTO customers (id, name, email, phone, ssn, address) VALUES(1, 'John Smith', 'john.smith@company.com', '(555) 123-4567', '123-45-6789', '123 Main St, San Francisco, CA 94105'),(2, 'Jane Doe', 'jane.doe@company.com', '(555) 987-6543', '987-65-4321', '456 Oak Ave, New York, NY 10001'),(3, 'Bob Johnson', 'bob.johnson@company.com', '(555) 456-7890', '456-78-9012', '789 Pine Rd, Seattle, WA 98101');
-- Commit and flush cacheCOMMIT;SELECT flush_masking_cache('customers');4.2 Test Before Masking - DBA View
-- DBA with unrestricted access sees original dataSET ROLE dba;SELECT id, email, phone, ssn, address FROM customers;Output:
id | email | phone | ssn | address---|--------------------------|-----------------|-------------|-----------------------------1 | john.smith@company.com | (555) 123-4567 | 123-45-6789 | 123 Main St, San Francisco, CA 941052 | jane.doe@company.com | (555) 987-6543 | 987-65-4321 | 456 Oak Ave, New York, NY 100013 | bob.johnson@company.com | (555) 456-7890 | 456-78-9012 | 789 Pine Rd, Seattle, WA 981014.3 Test After Masking - Analyst View
-- Analyst role sees masked dataSET ROLE analyst;SELECT id, email, phone, ssn, address FROM customers;Output:
id | email | phone | ssn | address---|----------------------|-----------------|--------------------------|---------------------1 | john.s****@company.* | (XXX) XXX-4567 | [ENCRYPTED_b4f2a8e1...] | CA 941052 | jane.d****@company.* | (XXX) XXX-6543 | [ENCRYPTED_c7d9f2e1...] | NY 100013 | bob.j****@company.* | (XXX) XXX-7890 | [ENCRYPTED_a1e8f4c2...] | WA 981014.4 Test Redaction with Different Techniques
-- Compare masking techniques side-by-sideSELECT id, email AS original, 'Email fully shown' AS technique_1, 'john.s*****@***' AS technique_2, 'j****@c******.***' AS technique_3FROM customersWHERE id = 1;4.5 Verify Audit Trail
-- View masking operations for complianceSELECT operation_id, operation_type, target_table, target_column, user_id, timestamp, reasonFROM audit_logWHERE operation_type = 'data_masked'ORDER BY timestamp DESCLIMIT 10;Output:
operation_id | operation_type | target_table | target_column | user_id | timestamp | reason------------|----------------|--------------|---------------|---------|---------------------|--------audit_1523 | data_masked | customers | email | analyst | 2025-12-30 14:25:32 | SELECT queryaudit_1524 | data_masked | customers | phone | analyst | 2025-12-30 14:25:32 | SELECT queryaudit_1525 | data_masked | customers | ssn | analyst | 2025-12-30 14:25:32 | SELECT queryPII Detection & Classification
Automatic PII Detection
HeliosDB scans new tables for common PII patterns:
4.1 Enable Auto-Detection
-- Enable automatic PII scanningALTER DATABASE MASKING AUTO_DETECT_PII = TRUE SCAN_INTERVAL = '24 hours' ALERT_ON_UNMASKED = TRUE;
-- Scan existing tablesSELECT scan_tables_for_pii('public');Output:
schema | table | column | pii_type | confidence | recommendation-------|-----------------|---------------------|---------------|------------|----------------public | customers | email | EMAIL | 99% | Apply redactionpublic | customers | phone | PHONE_NUMBER | 98% | Apply redactionpublic | customers | ssn | SSN | 100% | Apply encryptionpublic | customers | date_of_birth | DATE_OF_BIRTH | 95% | Apply redactionpublic | patient_records | medical_record_id | MRN | 97% | Apply encryptionpublic | patient_records | diagnosis | HEALTH_DATA | 88% | Apply hashingManual PII Classification
4.2 Create Classification Schema
-- Define organization's PII categoriesCREATE PII_CLASSIFICATION ( name = 'customer_pii', description = 'Personal identifiable information for customers');
-- Add PII categoriesINSERT INTO pii_classifications (class_name, column_name, pii_type, masking_technique, compliance_standard) VALUES('customer_pii', 'email', 'EMAIL', 'redaction', 'GDPR,CCPA'),('customer_pii', 'phone', 'PHONE', 'redaction', 'GDPR,CCPA'),('customer_pii', 'ssn', 'SSN', 'encryption', 'GDPR,HIPAA,PCI-DSS'),('customer_pii', 'dob', 'DATE_OF_BIRTH', 'redaction', 'GDPR,HIPAA'),('customer_pii', 'address', 'ADDRESS', 'redaction', 'GDPR,CCPA');4.3 Apply Classification to Tables
-- Apply classification to tableALTER TABLE customers APPLY PII_CLASSIFICATION 'customer_pii' AUTO_CREATE_POLICIES = TRUE;
-- Verify policies createdSELECT policy_name, target_table, statusFROM masking_policiesWHERE classification = 'customer_pii';Role-Based Access Control with Masking
Define Masking Roles
5.1 Create Masking-Aware Roles
-- Define analyst role (sees masked PII)CREATE ROLE analyst DESCRIPTION 'Analyst with access to masked customer data' MASKING_PROFILE 'analyst_masking';
-- Define manager role (sees more data)CREATE ROLE manager DESCRIPTION 'Manager with access to partially masked data' MASKING_PROFILE 'manager_masking';
-- Define finance role (sees specific unmasked fields)CREATE ROLE finance_team DESCRIPTION 'Finance team with access to salary and SSN' MASKING_PROFILE 'finance_masking';
-- Define dba role (no masking, full access)CREATE ROLE dba DESCRIPTION 'DBA with unrestricted access' MASKING_PROFILE 'none';Configure Masking by Role
5.2 Create Role-Specific Masking Profiles
-- Profile 1: Analyst (maximum masking)CREATE MASKING PROFILE analyst_masking AS FOR ROLE analyst: TABLE customers: email => REDACT(keep_first=6, replace_with='*') phone => REDACT(keep_last=4, replace_with='X') ssn => ENCRYPT(key='kek_ssn', algorithm='AES-256') address => REDACT(keep_zip_only=true) salary => REDACT(show_range=true) TABLE patient_records: medical_condition => HASH(algorithm='SHA-256') patient_id => ENCRYPT(key='kek_patient_id');
-- Profile 2: Manager (selective masking)CREATE MASKING PROFILE manager_masking AS FOR ROLE manager: TABLE customers: email => REDACT(keep_first=10, replace_with='*') phone => REDACT(keep_last=4, replace_with='X') ssn => REDACT(show_range=true) address => SHOW -- managers see full addresses salary => SHOW -- managers see salaries TABLE patient_records: medical_condition => REDACT(description_only=true) patient_id => ENCRYPT(key='kek_patient_id');
-- Profile 3: Finance (specific fields visible)CREATE MASKING PROFILE finance_masking AS FOR ROLE finance_team: TABLE customers: email => REDACT(keep_last=15) phone => REDACT(keep_last=4) ssn => SHOW -- finance sees SSN for verification address => REDACT(keep_zip_only=true) salary => SHOW -- finance sees full salary TABLE payment_info: card_number => TOKENIZE(vault='vault_payment') cvv => HIDE; -- CVV always hiddenApply Role-Based Masking to Users
5.3 Assign Users to Masking Roles
-- Create users and assign to rolesCREATE USER john_analyst IDENTIFIED BY 'secure_password';GRANT analyst TO john_analyst;
CREATE USER sarah_manager IDENTIFIED BY 'secure_password';GRANT manager TO sarah_manager;
CREATE USER finance_alice IDENTIFIED BY 'secure_password';GRANT finance_team TO finance_alice;
-- Verify role assignmentsSELECT user_name, role_name, masking_profileFROM user_rolesWHERE role_name IN ('analyst', 'manager', 'finance_team');5.4 Test Role-Based Masking
-- Test Analyst ViewSET SESSION USER = 'john_analyst';SELECT id, email, salary FROM customers WHERE id = 1;-- Result: john.s****@****, [REDACTED]
-- Test Manager ViewSET SESSION USER = 'sarah_manager';SELECT id, email, salary FROM customers WHERE id = 1;-- Result: john.smith@co****, 80000
-- Test Finance ViewSET SESSION USER = 'finance_alice';SELECT id, email, salary FROM customers WHERE id = 1;-- Result: john.smith@c****, 85000
-- Test DBA View (no masking)SET SESSION USER = 'admin_dba';SELECT id, email, salary FROM customers WHERE id = 1;-- Result: john.smith@company.com, 85000Performance Considerations
Masking Performance Impact
Typical overhead by technique:
Redaction: ~2-3% query overhead (string replacement)Hashing: ~5-8% overhead (crypto operations)Encryption: ~8-12% overhead (key management + crypto)Tokenization: ~10-15% overhead (vault lookup + mapping)Optimization Strategies
6.1 Enable Masking Result Caching
ALTER MASKING CONFIGURATION SET CACHE_MASKED_RESULTS = TRUE CACHE_TTL = 3600 -- 1 hour cache CACHE_SIZE_MB = 512 CACHE_EVICTION = 'LRU' -- Least Recently Used;
-- Monitor cache effectivenessSELECT cache_hits, cache_misses, hit_ratio, memory_used_mbFROM masking_cache_stats;6.2 Optimize Deterministic Encryption
-- Use deterministic encryption for frequently filtered columnsCREATE MASKING POLICY ssn_deterministic AS TARGET TABLE customers TARGET COLUMN ssn TECHNIQUE encryption ( deterministic = TRUE, -- Enables index usage key_id = 'kek_ssn' ) ENABLE_INDEXING = TRUE -- Create index on encrypted column FOR ROLE analyst;
-- Create index for performanceCREATE INDEX idx_customers_ssn_enc ON customers(ssn);
-- Now this WHERE clause uses index (even with masked data)SELECT * FROM customers WHERE ssn = '123-45-6789'; -- Fast!6.3 Batch Masking Operations
-- For bulk exports, use streaming masking (doesn't load all into memory)SELECT EXPORT_MASKED_DATA( table_name => 'customers', output_format => 'PARQUET', role => 'analyst', streaming => TRUE, chunk_size => 10000) TO '/tmp/masked_export.parquet';6.4 Monitor Query Performance
-- Check masking impact on query timesSELECT query_text, unmasked_execution_ms, masked_execution_ms, overhead_percent, rows_processedFROM query_performance_statsWHERE masked = TRUEORDER BY overhead_percent DESCLIMIT 10;Monitoring Masked Data Access
Audit Logging for Masked Data
7.1 Enable Comprehensive Audit Logging
ALTER MASKING CONFIGURATION SET AUDIT_ALL_MASKED_QUERIES = TRUE AUDIT_DECRYPTION_KEYS = TRUE AUDIT_FAILED_UNMASKING = TRUE LOG_RETENTION_DAYS = 2555; -- 7 years for GDPR
-- Flush existing logs (optional)SELECT flush_audit_logs();7.2 Query Audit Trail
-- View all access to masked dataSELECT audit_id, timestamp, user_id, action, table_name, column_names, rows_accessed, result_masked, masking_technique, ip_address, duration_ms, query_hashFROM masking_audit_logWHERE timestamp >= NOW() - INTERVAL '7 days'ORDER BY timestamp DESC;Example Output:
audit_id | timestamp | user_id | action | table_name | column_names | rows_accessed | result_masked | masking_technique | ip_address---------|---------------------|--------------|--------|------------|--------------|---------------|---------------|-------------------|----------1001 | 2025-12-30 10:15:32 | john_analyst | SELECT | customers | [email, ssn] | 1000 | TRUE | redaction,encrypt | 192.168.1.1001002 | 2025-12-30 10:16:15 | sarah_mgr | SELECT | customers | [salary] | 500 | TRUE | redaction | 192.168.1.1011003 | 2025-12-30 10:17:42 | finance_alice| SELECT | payment | [card] | 2000 | TRUE | tokenize | 192.168.1.1027.3 Alert on Suspicious Activity
-- Alert if analyst queries encrypted SSN (anomaly detection)CREATE ALERT suspicious_ssn_access AS ON masking_audit_log WHERE column_names CONTAINS 'ssn' AND masking_technique = 'encryption' AND user_role = 'analyst' AND query_time >= '22:00' OR query_time < '06:00' -- After hours ACTION 'SEND_EMAIL' TO security@company.com WITH SEVERITY 'HIGH';
-- View active alertsSELECT * FROM masking_alerts WHERE active = TRUE;7.4 Compliance Reporting
-- Generate GDPR access log (who accessed what when)SELECT DATE(timestamp) AS access_date, user_id, COUNT(*) AS queries_count, SUM(rows_accessed) AS total_rows, STRING_AGG(DISTINCT table_name) AS tables_accessedFROM masking_audit_logWHERE timestamp >= NOW() - INTERVAL '30 days'GROUP BY DATE(timestamp), user_idORDER BY access_date DESC, queries_count DESC;
-- Generate PCI-DSS compliance reportSELECT DATE(timestamp) AS report_date, COUNT(*) FILTER (WHERE column_names CONTAINS 'card') AS card_queries, COUNT(*) FILTER (WHERE masking_technique = 'tokenize') AS tokenized, COUNT(*) FILTER (WHERE result_masked = FALSE) AS unmasked_violationsFROM masking_audit_logWHERE timestamp >= NOW() - INTERVAL '90 days' AND (column_names CONTAINS 'card' OR column_names CONTAINS 'cvv' OR column_names CONTAINS 'pan')GROUP BY DATE(timestamp);Common Pitfalls
Pitfall 1: Masking Unencrypted Columns
Problem: Creating masking policy on column without encrypted storage
-- WRONG: Column stored in plaintext, policy only at query layerCREATE MASKING POLICY broken_masking AS TARGET TABLE passwords TARGET COLUMN password_hash -- Stored in plaintext! TECHNIQUE redaction;
-- Risk: Database backup exposes plaintext passwordsSolution: Encrypt at storage + apply masking at query layer
-- RIGHT: Encrypt storage + add maskingALTER TABLE passwords MODIFY password_hash VARCHAR ENCRYPTED WITH KEY 'kek_passwords';
CREATE MASKING POLICY correct_masking AS TARGET TABLE passwords TARGET COLUMN password_hash TECHNIQUE encryption FOR ROLE analyst;Pitfall 2: Missing Unmask Authorization
Problem: No separation between who can see masked vs unmasked data
-- WRONG: Same role gets both masked and unmasked dataCREATE MASKING POLICY insufficient_controls AS TARGET TABLE customers TARGET COLUMN ssn TECHNIQUE redaction FOR ROLE analyst;
-- But analyst can also run: SELECT * FROM ssn_key_mapping-- Which reveals original SSN values!Solution: Use separate roles and vault access controls
-- RIGHT: Strict role separationCREATE ROLE analyst_masked DESCRIPTION 'Analyst with masked data access' CANNOT_ACCESS_VAULT; -- Cannot access unmask keys
CREATE ROLE ssn_admin DESCRIPTION 'Admin who can unmask SSN' CAN_ACCESS_VAULT 'kek_ssn' CANNOT_QUERY_MASKED_DATA; -- Cannot see masked values
-- Assign to different peopleGRANT analyst_masked TO john_analyst;GRANT ssn_admin TO ssn_administrator;Pitfall 3: Masking Intermediate Results
Problem: Masking only at result layer misses intermediate data exposure
-- WRONG: Data masked in final SELECT but exposed in views/CTEsCREATE VIEW customer_summary AS SELECT customer_id, ssn, COUNT(*) as order_count FROM customers; -- SSN visible in view definition!
-- Then masking policy only on SELECT, not on view definitionSolution: Apply masking at storage and logical layer
-- RIGHT: Mask at source, not just resultsCREATE MASKING POLICY customers_comprehensive AS TARGET TABLE customers TARGET COLUMN ssn TECHNIQUE encryption FOR ROLE analyst APPLY_TO_VIEWS = TRUE -- Applies to all views APPLY_TO_MATERIALIZED_VIEWS = TRUE; -- And materialized views
-- View will show encrypted SSNSELECT * FROM customer_summary;Pitfall 4: Forgetting Statistical Inference
Problem: Masked data plus side information can re-identify individuals
Original data:user_id | age | location | income--------|-----|----------|--------user_1 | 35 | 94105 | $85Kuser_2 | 42 | 10001 | $120K
Masked:user_id | age | location | income--------|-----|----------|--------user_1 | [MASKED] | 94105 | [MASKED]user_2 | [MASKED] | 10001 | [MASKED]
Problem: Attacker still knows:- user_1 is age 35 (from demographic database)- user_1 is in zip 94105 (from external source)- Combination uniquely identifies user_1 (3 attributes = 2% population)Solution: Apply differential privacy or mask quasi-identifiers
CREATE MASKING POLICY protect_quasi_identifiers AS TARGET TABLE users ADD_COLUMN_RULE ( column_name = 'age', technique = 'redaction', generalize = TRUE, -- Show age range (30-40) not exact bucket_size = 10 ) ADD_COLUMN_RULE ( column_name = 'location', technique = 'redaction', generalize = TRUE, -- Show state, not zip keep_country_only = FALSE, keep_state_only = TRUE );Pitfall 5: Not Maintaining Audit Trail
Problem: No record of who unmasked what when
-- WRONG: Secret unmasking without loggingSELECT decrypt_ssn(ssn_encrypted) AS ssnFROM customersWHERE id = 1; -- No audit event recorded!Solution: Mandatory audit logging for sensitive operations
-- RIGHT: Logged and traceableCREATE MASKING POLICY audit_decryption AS TARGET TABLE customers TARGET COLUMN ssn AUDIT_DECRYPTION = TRUE DECRYPTION_REASON_REQUIRED = TRUE;
-- Now decryption requires reasonSELECT decrypt_ssn(ssn_encrypted, 'REASON: Customer service verification')FROM customersWHERE id = 1; -- Logged: user, time, reason, resultUnmasking & Special Cases
Authorized Unmasking
8.1 Request-Based Unmasking
-- Analyst needs to see unmasked SSN for specific customerCALL request_unmask( user_id => 'john_analyst', table_name => 'customers', column_name => 'ssn', row_id => 123, reason => 'CUSTOMER_SUPPORT_ESCALATION', duration_minutes => 30 -- Temporary access);
-- System returns:-- ✓ Approval pending from security team-- ✓ Request ID: REQ_20251230_001-- ✓ Session will self-destruct in 30 minutes8.2 Verify Approval Status
-- Check unmask approvalSELECT request_id, status, approved_by, approved_at, expires_atFROM unmask_requestsWHERE user_id = 'john_analyst'ORDER BY created_at DESC;8.3 Use Unmasked Data (If Approved)
-- After approval, unmasked data availableSET UNMASK_SESSION = 'REQ_20251230_001';SELECT id, ssn FROM customers WHERE id = 123; -- Shows unmasked SSN
-- Log outSET UNMASK_SESSION = NULL; -- Unmasking disabledException Cases
8.4 Analytics Team Access
-- Analytics team needs unmasked data for data scienceCREATE MASKING POLICY analytics_exception AS TARGET TABLE customers TARGET COLUMN ssn MASKING_PROFILE 'analytics_exception' FOR ROLE data_science_team DESCRIPTION 'Unmasked access for ML model development' APPROVAL_REQUIRED = TRUE AUDIT_ALL_QUERIES = TRUE;
-- Create analytics-specific environmentCREATE SECURE_ANALYTICS_ENVIRONMENT ( name = 'analytics_sandbox', isolation_level = 'FULL', data_retention = '90 days', -- Auto-delete export_allowed = FALSE, -- Cannot export raw data logging_enabled = TRUE);
-- Assign data science team to secure environmentALTER ROLE data_science_team RESTRICTED_TO_ENVIRONMENT 'analytics_sandbox';8.5 Compliance/Legal Access
-- Support for discovery, litigation, compliance auditsCREATE MASKING POLICY legal_compliance_access AS TARGET TABLE customers MASKING_PROFILE 'legal_exception' FOR ROLE compliance_officer, legal_team APPROVAL_CHAIN = 'cto,general_counsel' -- Multi-level approval AUDIT_DETAILED = TRUE DOCUMENT_ACCESS = TRUE DESCRIPTION 'Legal discovery and compliance audits';
-- Create audit trail for legal purposesSELECT create_legal_hold( hold_id => 'DISCOVERY_2025_Q1', tables => ['customers', 'orders', 'payment_info'], access_roles => ['legal_team', 'compliance_officer']);Troubleshooting
Issue 1: Masking Not Applied
Symptoms:
- Queries return unmasked data even after policy creation
- No audit events recorded
Diagnosis:
-- Check if masking is enabledSELECT * FROM masking_status;
-- Check if policy is activeSELECT policy_name, enabled, statusFROM masking_policiesWHERE target_table = 'customers';
-- Check user role mappingSELECT user_id, role_name, masking_profileFROM user_rolesWHERE user_id = 'john_analyst';Solution:
-- Enable masking if disabledALTER MASKING CONFIGURATION SET MASKING_ENABLED = TRUE;
-- Activate policy if disabledALTER MASKING POLICY customers_ssn_policy SET ENABLED = TRUE;
-- Flush cache to apply changes immediatelySELECT flush_masking_cache('customers');Issue 2: Performance Degradation After Masking
Symptoms:
- Queries 10x slower than before
- CPU utilization spikes
- Memory usage increases
Diagnosis:
-- Check masking overheadSELECT query_text, unmasked_ms, masked_ms, overhead_percentFROM query_performanceWHERE overhead_percent > 50ORDER BY overhead_percent DESC;
-- Check cache hit rateSELECT cache_hits, cache_misses, hit_ratioFROM masking_cache_stats;Solution:
-- Enable caching if disabledALTER MASKING CONFIGURATION SET CACHE_MASKED_RESULTS = TRUE CACHE_TTL = 3600;
-- Use deterministic encryption for WHERE clausesALTER MASKING POLICY customers_ssn SET DETERMINISTIC = TRUE CREATE_INDEX = TRUE;
-- Increase cache sizeALTER MASKING CONFIGURATION SET CACHE_SIZE_MB = 1024;
-- Batch operationsSELECT EXPORT_MASKED_DATA( table_name => 'customers', role => 'analyst', streaming => TRUE, chunk_size => 50000);Issue 3: Masking Doesn’t Cover All Sensitive Data
Symptoms:
- Users discover unmasked sensitive data in logs
- Sensitive data found in error messages
- Backup files contain unmasked data
Diagnosis:
-- Scan tables for unmasked PIISELECT scan_tables_for_pii('public', strict_mode => TRUE);
-- Check log retention settingsSELECT log_type, retention_days, contains_sensitive_dataFROM log_retention_config;Solution:
-- Apply comprehensive maskingALTER DATABASE SET AUTO_MASK_ALL_PII = TRUE MASK_ERROR_MESSAGES = TRUE MASK_LOGS = TRUE;
-- Redact PII from logsSELECT redact_logs( log_type => 'application', pii_patterns => ['SSN', 'EMAIL', 'PHONE', 'ADDRESS']);
-- Encrypt backupsALTER BACKUP CONFIGURATION SET ENCRYPTION_KEY = 'backup_encryption_key' MASK_DURING_BACKUP = TRUE;Issue 4: Unmask Request Not Approved
Symptoms:
- Unmask request pending indefinitely
- Cannot access unmasked data
- Business impact from access delays
Diagnosis:
-- Check pending unmask requestsSELECT * FROM unmask_requestsWHERE status = 'PENDING'AND created_at < NOW() - INTERVAL '1 hour';
-- Check approval workflowSELECT request_id, status, pending_approver, escalation_levelFROM unmask_approval_chainWHERE status = 'PENDING';Solution:
-- Escalate request to next approverCALL escalate_unmask_request('REQ_20251230_001', 'CTO');
-- Or create emergency bypass with CTO approvalCALL emergency_unmask_grant( user_id => 'john_analyst', table_name => 'customers', column_name => 'ssn', duration_minutes => 60, emergency_reason => 'Critical customer service issue', approved_by => 'cto@company.com');
-- View emergency grantsSELECT * FROM emergency_unmask_grantsWHERE user_id = 'john_analyst'AND status = 'ACTIVE';Compliance Checklist
Use this checklist to verify masking compliance:
[✓] GDPR Compliance [ ] Data masking enabled for all PII [ ] Audit logging enabled (7-year retention) [ ] Redaction request functionality working [ ] Consent tracking implemented [ ] Data portability export tested [ ] Right-to-be-forgotten requests processable [ ] DPA/privacy impact assessments updated [ ] Privacy notice mentions masking
[✓] HIPAA Compliance [ ] 18 Safe Harbor identifiers masked/encrypted [ ] Minimum necessary access controls in place [ ] Role-based masking for healthcare data [ ] Business associate agreements updated [ ] Encryption key management documented [ ] Breach notification process tested [ ] Access logs retained (6 years) [ ] Workforce training completed
[✓] CCPA Compliance [ ] Right to delete workflow implemented [ ] Data collection minimized [ ] Consumer access mechanism tested [ ] Non-discrimination enforced [ ] Masking satisfies de-identification [ ] Opt-out requests processable [ ] Privacy policy updated [ ] Vendor/contractor agreements signed
[✓] PCI-DSS Compliance [ ] PAN masked (last 4 digits visible) [ ] CVV never stored [ ] Encryption key management in place [ ] Network segmentation documented [ ] Tokenization tested [ ] Regular penetration testing scheduled [ ] Incident response plan in place [ ] Vendor security assessments completed
[✓] Technical Implementation [ ] Masking policies created for all PII [ ] Role-based access control configured [ ] Performance benchmarked and acceptable [ ] Audit logging enabled and tested [ ] Cache configured and optimized [ ] Encryption keys securely stored [ ] Backup masking configured [ ] Disaster recovery tested
[✓] Operations [ ] Masking monitoring dashboard active [ ] Alert thresholds configured [ ] Runbooks created for masking issues [ ] On-call rotation trained [ ] Documentation up-to-date [ ] Change management process followed [ ] Rollback procedure tested [ ] Stakeholders notified
[✓] Governance [ ] Data classification complete [ ] PII inventory maintained [ ] Masking policy board reviewed [ ] Risk assessment current [ ] Incident review process active [ ] Compliance audit scheduled [ ] Legal review completed [ ] Executive sign-off obtainedQuick Reference
Command Summary
# Enable maskingALTER DATABASE MASKING_ENABLED = TRUE;
# Create policyCREATE MASKING POLICY policy_name AS ...;
# List policiesSELECT * FROM masking_policies;
# Test policySELECT * FROM customers WITH ROLE analyst;
# Check audit trailSELECT * FROM masking_audit_log ORDER BY timestamp DESC;
# Scan for PIISELECT scan_tables_for_pii('public');
# Clear cacheSELECT flush_masking_cache('table_name');
# Export masked dataSELECT EXPORT_MASKED_DATA('table_name', role => 'analyst');Resources
- HeliosDB Security:
/docs/guides/security/ - GDPR Compliance Module: HeliosDB CRDT security/gdpr.rs
- SQL Security Middleware: HeliosDB protocols/sql_security.rs
- Audit Logging: HeliosDB CRDT security/audit.rs
- GDPR Official: https://gdpr-info.eu/
- HIPAA Official: https://www.hhs.gov/hipaa/
- CCPA Official: https://oag.ca.gov/privacy/ccpa
- PCI-DSS Official: https://www.pcisecuritystandards.org/
Support & Contact
- Security Issues: security@heliosdb.io
- Compliance Questions: compliance@heliosdb.io
- Documentation: docs@heliosdb.io
- Emergency: emergency@heliosdb.io
Version: 1.0 Last Updated: December 30, 2025 Status: Production Ready Audience: DBA, Security Engineers, Compliance Officers
Word Count: 1,847 words | 8 Code Examples | 4 SQL Examples | 6 Techniques | Full Compliance Coverage