Disaster Recovery
Overview
Purpose
This report provides an overview of the disaster recovery exercise conducted on 5th September 2024.
The exercise aimed to test the effectiveness and readiness of our disaster recovery plan, ensuring that the Skpr hosting platform can recover from a simulated disaster scenario.
Scope
This exercise is focused on the infrastructure that supports customer applications.
While this exercise is typically executed as part of a larger customer engagement, this document only outlines the infrastructure steps taken.
Objectives
- Validate the effectiveness of the disaster recovery plan.
- Validate the effectiveness of the infrastructure provisioning tooling.
- Identify any gaps or bugs in the recovery procedures and tooling.
Steps Taken
Mock Scenario
The Skpr platform team determines that the security of the platform has been compromised and that a new cluster must be provisioned in response.
Steps Taken by Team
- A new AWS account is set up
- A Skpr platform instance is provisioned on the new AWS account using the Platform Team's Terraform manifests
- All Skpr platform components are tested to ensure they are ready for developer interaction
- Customer sites are then onboarded (deployed and all assets synced) to the new platform for testing
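The steps above run strictly in order, and each depends on the one before it. As a minimal sketch only (not the Platform Team's actual tooling), the sequence could be modelled as an ordered list that stops at the first failure. The `Step` class, the `run_steps` function and the placeholder actions are all hypothetical names introduced for illustration; in practice each action would wrap the real provisioning or onboarding task.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One recovery step. Hypothetical structure, for illustration only."""
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded


def run_steps(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first failure.

    Returns the names of the completed steps and the name of the
    failed step (None if every step succeeded).
    """
    completed: List[str] = []
    for step in steps:
        if not step.action():
            return completed, step.name
        completed.append(step.name)
    return completed, None


# Placeholder actions: each lambda stands in for real work and does
# not reflect how the exercise was actually automated.
RECOVERY_STEPS = [
    Step("Set up new AWS account", lambda: True),
    Step("Provision Skpr platform with Terraform manifests", lambda: True),
    Step("Test platform components", lambda: True),
    Step("Onboard customer sites", lambda: True),
]

if __name__ == "__main__":
    done, failed = run_steps(RECOVERY_STEPS)
    print("Completed:", done)
    print("Failed at:", failed)
```

Stopping at the first failure keeps later steps from running against a partially provisioned platform, and the returned step name records where the recovery procedure needs attention.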
Platform Components Validated
The following APIs and associated workflows were validated:
- Release
- Environment
- Config
- Backup
- Restore
- Exec/Shell
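One way to tie this list back to the objective of identifying gaps is to record a pass or fail result per component and report anything that failed or was never checked. The sketch below is an assumption about how such a record could be kept, not a description of how results were captured during the exercise. `find_gaps` is a hypothetical helper, and the sample results are illustrative values rather than outcomes of the exercise.

```python
from typing import Dict, List

# Components taken from the list above.
COMPONENTS = ["Release", "Environment", "Config", "Backup", "Restore", "Exec/Shell"]


def find_gaps(results: Dict[str, bool]) -> List[str]:
    """Return the components that failed validation or have no recorded result."""
    return [name for name in COMPONENTS if not results.get(name, False)]


if __name__ == "__main__":
    # Illustrative input only: these values are NOT results from the exercise.
    sample_results = {
        "Release": True,
        "Environment": True,
        "Config": True,
        "Backup": True,
        "Exec/Shell": True,
        # "Restore" is deliberately left out to show an unchecked component.
    }
    print("Gaps:", find_gaps(sample_results))
```

Treating a missing entry the same as a failure means a component that was skipped cannot be mistaken for one that passed.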