Business Continuity Plan

Overview

This Business Continuity Plan (BCP) ensures HeliosDB Nano operations can continue during and after disruptive events, protecting business functions, stakeholders, and reputation.

Scope

This plan covers:

  • Development and engineering operations
  • Customer support services
  • Infrastructure and operations
  • Corporate functions

Business Impact Analysis

Critical Business Functions

| Function | RTO | RPO | Impact of Disruption |
|---|---|---|---|
| Production database service | 5 min | 1 min | Customer data unavailable |
| Customer support | 4 hours | N/A | Support tickets delayed |
| Development | 24 hours | N/A | Release schedule impacted |
| Sales/Marketing | 48 hours | N/A | Revenue pipeline impacted |

Dependency Matrix

```
Critical Dependencies

Database Service
├── Cloud Infrastructure (AWS/GCP)
├── DNS Services
├── Certificate Authority
└── Monitoring Systems

Development
├── GitHub
├── CI/CD Pipeline
└── Development Environments

Support
├── Ticketing System
├── Communication Tools
└── Documentation
```

Continuity Strategies

Strategy 1: Geographic Redundancy

  • Primary: US-East region
  • Secondary: US-West region
  • Tertiary: EU region (for EU customers)
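The priority-ordered failover implied by this list can be sketched as follows. This is a minimal illustration, not the plan's actual tooling; the `select_region` helper and the health-check dictionary interface are assumptions.

```python
# Priority order follows the plan: primary, then secondary, then tertiary.
FAILOVER_ORDER = ["us-east", "us-west", "eu"]


def select_region(health: dict[str, bool]) -> str:
    """Return the highest-priority healthy region.

    `health` maps region name to the latest health-check result
    (interface assumed for illustration). Raises if nothing is healthy,
    which would correspond to a Level 4 regional-disaster scenario.
    """
    for region in FAILOVER_ORDER:
        if health.get(region, False):
            return region
    raise RuntimeError("No healthy region available; escalate to Level 4")
```

In practice this decision would be driven by automated health checks rather than a manual call, but the ordering logic is the same.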

Strategy 2: Remote Work Capability

All team members are equipped for full remote work:

  • Laptop with development environment
  • VPN access to all systems
  • Communication tools (Slack, Zoom)
  • Documentation access

Strategy 3: Supplier Diversification

| Service | Primary | Backup |
|---|---|---|
| Cloud hosting | AWS | GCP |
| DNS | Route53 | Cloudflare |
| Email | Google Workspace | Backup SMTP |
| Communication | Slack | Discord |
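The primary/backup mapping above can be expressed as a simple lookup. This is a hypothetical sketch for illustration; the `active_supplier` helper and the service keys are assumptions, not part of the official plan.

```python
# Primary/backup pairs taken from the supplier diversification table.
SUPPLIERS = {
    "cloud_hosting": ("AWS", "GCP"),
    "dns": ("Route53", "Cloudflare"),
    "email": ("Google Workspace", "Backup SMTP"),
    "communication": ("Slack", "Discord"),
}


def active_supplier(service: str, primary_up: bool) -> str:
    """Return the supplier to use: primary when available, else backup."""
    primary, backup = SUPPLIERS[service]
    return primary if primary_up else backup
```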

Activation Procedures

Activation Criteria

| Event | Activation Level | Authority |
|---|---|---|
| Single component failure | None | Automated |
| Service degradation | Level 1 | Operations |
| Partial outage | Level 2 | VP Engineering |
| Full outage | Level 3 | Executive team |
| Regional disaster | Level 4 | CEO |
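The activation criteria can be encoded as a lookup from event type to level and activating authority. This is an illustrative sketch; the event keys and the `activation_for` helper are assumptions made for the example.

```python
# Event -> (activation level, activating authority), per the criteria table.
ACTIVATION_LEVELS = {
    "component_failure": (0, "Automated"),
    "service_degradation": (1, "Operations"),
    "partial_outage": (2, "VP Engineering"),
    "full_outage": (3, "Executive team"),
    "regional_disaster": (4, "CEO"),
}


def activation_for(event: str) -> tuple[int, str]:
    """Return (level, authority) for a detected event type."""
    try:
        return ACTIVATION_LEVELS[event]
    except KeyError:
        raise ValueError(f"Unknown event type: {event}")
```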

Activation Process

```
Event Detected
      │
Assess Impact ──▶ Minor ──▶ Normal Incident Response
      │
      ▼ Major
Activate BCP Team
      │
      ▼
Determine Level
      │
      ▼
Execute Procedures
      │
      ▼
Monitor & Adjust
      │
      ▼
Recovery & Lessons Learned
```

Response Procedures

Level 1: Service Degradation

Duration: Up to 4 hours

  1. Activate on-call team
  2. Implement workarounds
  3. Communicate with affected customers
  4. Restore normal operations
  5. Document incident

Level 2: Partial Outage

Duration: 4-24 hours

  1. Activate BCP team
  2. Failover to redundant systems
  3. Customer communication (status page)
  4. Coordinate with affected teams
  5. Regular status updates
  6. Recovery planning

Level 3: Full Outage

Duration: 24+ hours

  1. Executive notification
  2. Full DR activation
  3. Customer communication (direct)
  4. Media/PR coordination
  5. Extended team mobilization
  6. Daily status calls

Level 4: Regional Disaster

Duration: Extended

  1. All-hands notification
  2. Employee safety verification
  3. Alternate site activation
  4. Business function prioritization
  5. Extended operation mode
  6. Recovery planning

Communication Plan

Internal Communication

| Audience | Channel | Frequency | Owner |
|---|---|---|---|
| BCP Team | Slack #incident | Real-time | IC |
| Engineering | Email + Slack | Hourly | VP Eng |
| All Staff | Email | Daily | HR |
| Executives | Phone/Slack | As needed | CEO |

External Communication

| Audience | Channel | Frequency | Owner |
|---|---|---|---|
| Affected customers | Email | Immediate | Support |
| All customers | Status page | Real-time | Ops |
| Partners | Email | Daily | BD |
| Media | Press release | As needed | PR |

Communication Templates

Customer Notification:

```
Subject: [Status Update] HeliosDB Service

Current Status: [Investigating/Identified/Resolved]

We are currently experiencing [brief description].

Impact: [What customers may experience]
Actions: [What we are doing]
ETA: [Expected resolution time]

Updates: status.heliosdb.io

We apologize for any inconvenience.
```
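Filling the template programmatically helps keep notifications consistent across incidents. The sketch below is an assumption for illustration; the `render_status_update` helper is not part of the plan's tooling.

```python
def render_status_update(status: str, description: str, impact: str,
                         actions: str, eta: str) -> str:
    """Fill the customer-notification template with incident details."""
    return "\n".join([
        "Subject: [Status Update] HeliosDB Service",
        f"Current Status: {status}",
        f"We are currently experiencing {description}.",
        f"Impact: {impact}",
        f"Actions: {actions}",
        f"ETA: {eta}",
        "Updates: status.heliosdb.io",
        "We apologize for any inconvenience.",
    ])
```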

Team Responsibilities

BCP Team Structure

| Role | Responsibilities | Primary | Backup |
|---|---|---|---|
| Incident Commander | Overall coordination | VP Ops | Director Eng |
| Technical Lead | Technical decisions | CTO | Sr. Engineer |
| Communications | Internal/external comms | VP Marketing | PR Manager |
| Customer Success | Customer communication | VP CS | CS Manager |
| HR/Safety | Employee welfare | HR Director | HR Manager |

Contact Information

Contact information is maintained in a secure, offline document available to all BCP team members.

Recovery Procedures

Service Recovery

  1. Assessment: Evaluate damage and requirements
  2. Prioritization: Critical functions first
  3. Restoration: Systematic service restoration
  4. Verification: Testing and validation
  5. Return to Normal: Full operations resume

Data Recovery

See: DISASTER_RECOVERY.md

Facility Recovery

  1. Assess facility status
  2. Activate alternate site if needed
  3. Coordinate equipment/supplies
  4. Resume operations
  5. Plan permanent recovery

Testing & Maintenance

Testing Schedule

| Test Type | Frequency | Participants | Duration |
|---|---|---|---|
| Tabletop exercise | Quarterly | BCP team | 2 hours |
| Communication test | Monthly | All staff | 30 min |
| Technical DR drill | Monthly | Engineering | 4 hours |
| Full simulation | Annually | All teams | 1 day |

Plan Maintenance

| Activity | Frequency | Owner |
|---|---|---|
| Contact list update | Monthly | HR |
| Procedure review | Quarterly | Operations |
| Full plan review | Annually | Executive team |
| Post-incident update | After each incident | IC |

Training

  • Annual BCP awareness training for all staff
  • Quarterly deep-dive for BCP team
  • New hire orientation includes BCP overview

Appendices

Appendix A: Emergency Contacts

[Maintained separately in secure document]

Appendix B: Vendor Contacts

| Vendor | Service | Support Contact | Account ID |
|---|---|---|---|
| AWS | Infrastructure | aws.amazon.com/support | [ID] |
| Cloudflare | CDN/DNS | cloudflare.com/support | [ID] |
| GitHub | Source control | github.com/support | [ID] |
| PagerDuty | Alerting | pagerduty.com/support | [ID] |

Appendix C: Checklist

Initial Response:

  • Incident confirmed
  • BCP team notified
  • Impact assessed
  • Level determined
  • Procedures initiated

During Incident:

  • Regular status updates
  • Customer communication
  • Resource coordination
  • Documentation maintained

Recovery:

  • Services restored
  • Verification complete
  • Stakeholders notified
  • Normal operations resumed

Post-Incident:

  • Lessons learned meeting
  • Plan updates identified
  • Documentation updated
  • Training needs assessed