Disaster Recovery Exercise Engagement
Overview
This document outlines the collaborative activities carried out by the Skpr platform team and our customers as part of a simulated disaster recovery exercise.
During this exercise, the Skpr hosting platform is fully reprovisioned within a dedicated test account, and the customer’s application(s) are restored and redeployed using the latest available backups.
The exercise also focuses on identifying any required steps, configuration changes, or gaps that the application development team must address to ensure the environment is production-ready.
Out of Scope
Executing a production cutover is out of scope for this exercise.
All activities are performed within a separate AWS account/environment to ensure normal operations remain uninterrupted and to avoid any risk of downtime.
Objectives
- Validate the effectiveness of both platform and customer disaster recovery plans.
- Validate the effectiveness of the infrastructure provisioning tooling.
- Identify any gaps or bugs in the application recovery documentation, procedures and tooling.
Reasons for a Recovery Exercise Engagement
- To meet external audit, compliance, or regulatory obligations related to customer business continuity and disaster recovery.
- To give both parties practical, real-world experience that enhances preparedness for an actual disaster event.
- To validate communication and coordination processes between Skpr and the customer during recovery events.
Stages of the Engagement
Step 1. Provision a new Skpr cluster
- Cluster Deployment: A new AWS account is provisioned by the Skpr Platform Team, followed by the deployment of a Skpr cluster using Terraform Infrastructure-as-Code (IaC) manifests.
Step 2. Redeploy Project(s)
- Project Deployment: Customer application(s) are packaged and deployed onto the newly created Skpr cluster.
- Content Import: The latest available backups for those application(s) are imported to restore content and data.
- Team Collaboration: The Skpr Platform Team manages the overall redeployment process. Collaboration with the Customer Application Development Team is essential to ensure that any issues or bugs identified during deployment are promptly investigated and resolved.
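The "latest available backup" selection in the Content Import step can be sketched as follows. This is a minimal illustration only: the `Backup` record, its field names, and the sample project names are hypothetical and do not reflect how Skpr stores or lists backups. It assumes that only completed backups are candidates for restore.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record; field names are illustrative only."""
    project: str
    created_at: datetime
    completed: bool


def latest_backups(backups):
    """Return the most recent completed backup for each project."""
    latest = {}
    for backup in backups:
        if not backup.completed:
            continue  # assumption: unfinished backups are never restored
        current = latest.get(backup.project)
        if current is None or backup.created_at > current.created_at:
            latest[backup.project] = backup
    return latest


# Sample data, invented for this sketch.
backups = [
    Backup("website", datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc), True),
    Backup("website", datetime(2024, 5, 2, 2, 0, tzinfo=timezone.utc), True),
    Backup("website", datetime(2024, 5, 3, 2, 0, tzinfo=timezone.utc), False),
    Backup("intranet", datetime(2024, 4, 30, 2, 0, tzinfo=timezone.utc), True),
]

selected = latest_backups(backups)
for project, backup in sorted(selected.items()):
    print(f"{project}: restore from backup taken {backup.created_at.isoformat()}")
```

Recording which backup was chosen for each project, and its age, gives the engagement document a concrete data point about how much content would be lost in a real recovery.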
Step 3. Review and Document
- Engagement Document Created: A shared document is created using the Document Template below. All teams are required to update this document.
- Project Review & Configuration Documentation: The Customer Application Development Team, along with Customer Product Teams and Stakeholders, reviews the redeployed application(s) to validate functionality and identify any configuration updates required for production readiness, e.g. Google Analytics or other tracking configurations, API key updates or rotations, and new IP addresses to be added to external backend services or allow lists. Any configuration changes are documented.
- Platform Documentation & Support: The Skpr Platform Team updates internal platform documentation as needed and collaborates with the Customer Application Development and Product Teams to provide guidance or support throughout the review process.
- Production Ready: This step is considered complete once all identified defects have been resolved or formally documented as part of the disaster recovery procedure, for example configuration updates to external services that would otherwise result in downtime.
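The Production Ready criterion above is a simple rule over the list of findings, and can be sketched as a small check. This is an illustrative sketch, not part of the Skpr tooling: the `Finding` record and the `Status` values are hypothetical names, and the sample findings are taken from the examples listed in this step.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"
    RESOLVED = "resolved"
    DOCUMENTED = "documented"  # recorded in the disaster recovery procedure


@dataclass
class Finding:
    """Hypothetical record of a defect or required configuration change."""
    summary: str
    status: Status


def open_findings(findings):
    """Return findings that are neither resolved nor formally documented."""
    return [f for f in findings if f.status is Status.OPEN]


def is_production_ready(findings):
    """Step 3 is complete when no finding is still open."""
    return not open_findings(findings)


findings = [
    Finding("Rotate API keys for external services", Status.RESOLVED),
    Finding("Add new IP addresses to backend allow lists", Status.DOCUMENTED),
    Finding("Update Google Analytics tracking configuration", Status.OPEN),
]

print("production ready:", is_production_ready(findings))
for finding in open_findings(findings):
    print("still open:", finding.summary)
```

Treating "documented" as an acceptable end state mirrors the rule above: some changes, such as allow list updates on external services, can only be made during a real cutover, so they are recorded rather than fixed during the exercise.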
Step 4. Team Retro
A meeting is held to discuss the document, any further updates that are required and changes that should be implemented for the next engagement.
Document Template
Below is a template which can be used for documenting the Disaster Recovery Exercise Engagement.
# Disaster Recovery Exercise – Engagement Document Template
**Purpose:**
This template is used by the Skpr Platform Team, Customer Application Development Team, and Customer Product/Stakeholder Teams to document findings, required changes, defects, and follow-up actions during a Disaster Recovery Exercise.
Update this document to reflect the actual purpose of the exercise.
## Engagement Details
* **Customer / Organisation** -
* **Project(s) Included** -
* **Date of Disaster Recovery Exercise** -
* **Skpr Team Participants** -
* **Customer Participants** -
* **AWS Test Account ID** -
## Timeline
An overview of how long each step took to complete.
| Step | Time Taken |
|--------|------------|
| Step 1 | |
| Step 2 | |
| Step 3 | |
## Step 1. Cluster Provisioning
### Activities Performed
To be completed by Skpr Platform Team
### Issues Identified
To be completed by Skpr Platform Team
## Step 2. Redeploy Project(s)
### Activities Performed
To be completed by Skpr Platform Team and Customer Application Development Team
### Issues Identified
To be completed by Skpr Platform Team and Customer Application Development Team
## Step 3. Review and Document
### Activities Performed
To be completed by Customer Application Development Team and Customer Product Team.
### Issues Identified
A list of defects, resolutions and links to external updated documentation.
To be completed by Customer Application Development Team and Customer Product Team.