DRP with Proxmox: Multi-Site Architecture, RPO/RTO & PBS
The Disaster Recovery Plan (DRP) is no longer a luxury reserved for large enterprises. With DORA and NIS2 regulatory obligations, and the constant threat of ransomware, every IT department must design a credible recovery strategy. Proxmox VE, combined with PBS and Ceph, provides a complete platform to build a high-performance and cost-effective multi-site DRP.
DRP: Definition and Challenges
The Disaster Recovery Plan (DRP) is the set of procedures and technical resources enabling IT operations to resume after a major disaster: fire, flood, cyberattack, catastrophic hardware failure, or critical human error.
Two fundamental metrics structure every DRP:
- RPO (Recovery Point Objective): the maximum amount of data you can afford to lose, expressed as a duration. An RPO of 24h means you can lose up to one day of data.
- RTO (Recovery Time Objective): the maximum time allowed to bring services back online. An RTO of 4h means production must be restored in under 4 hours.
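To make the two metrics concrete, here is a minimal Python sketch that compares one incident against its RPO and RTO targets. The function name `meets_objectives` and the sample timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(last_good_copy, disaster, service_restored, rpo, rto):
    """Return (rpo_ok, rto_ok) for a single incident.

    Data loss is the time between the last usable copy and the disaster.
    Downtime is the time between the disaster and service restoration.
    """
    data_loss = disaster - last_good_copy
    downtime = service_restored - disaster
    return data_loss <= rpo, downtime <= rto


# Hypothetical incident: nightly backup at 01:00, disaster at 15:30,
# production back online at 18:45 the same day.
rpo_ok, rto_ok = meets_objectives(
    last_good_copy=datetime(2025, 3, 10, 1, 0),
    disaster=datetime(2025, 3, 10, 15, 30),
    service_restored=datetime(2025, 3, 10, 18, 45),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),
)
print(rpo_ok, rto_ok)
```

In this example the data loss is 14.5 hours and the downtime is 3 hours 15 minutes, so both the 24h RPO and the 4h RTO are met.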
DRP vs BCP: Two Complementary Approaches
It is essential to distinguish the DRP from the BCP (Business Continuity Plan). The BCP aims to maintain operations without interruption through high availability (HA clustering, synchronous Ceph replication, automatic failover). The DRP steps in when the BCP has failed: it organizes reconstruction from backups or a standby site.
| Criteria | BCP | DRP |
|---|---|---|
| Objective | Zero interruption | Recovery after disaster |
| Proxmox Technologies | HA, synchronous Ceph, live migration | PBS, async replication, DR site |
| Typical RPO | ~0 (zero loss) | Minutes to hours |
| Typical RTO | Seconds | Minutes to hours |
| Cost | High (doubled infrastructure) | Moderate |
| Scope | Individual failures | Major disasters |
Regulatory Obligations: DORA, NIS2, ISO 22301
The European regulatory framework now imposes strict requirements for IT resilience:
- DORA (Digital Operational Resilience Act): applicable to the financial sector since January 2025, requires regular resilience testing, formalized IT risk management, and documented and tested recovery plans.
- NIS2 (Network and Information Security Directive): extends cybersecurity obligations to many sectors (energy, transport, healthcare, digital). Requires risk management measures including business continuity and crisis management.
- ISO 22301: international standard for business continuity management systems. A reference framework for structuring a BCP/DRP compliant with best practices.
Regulatory note
Since 2025, not having a documented and tested DRP exposes companies covered by DORA or NIS2 to significant financial penalties. Beyond compliance, a well-designed DRP protects the long-term viability of your business.
Multi-Site DRP Architecture with Proxmox
Proxmox VE provides all the building blocks needed to construct a robust multi-site DRP. The architecture relies on three key components: the production cluster, the disaster recovery (DR) site, and Proxmox Backup Server (PBS) as the cornerstone of the backup strategy.
DRP Architecture Diagram
PRIMARY SITE (Production)                        DR SITE (Standby)
==================================               ==================================
| Proxmox VE Cluster (3 nodes)   |  Application  | Proxmox VE Cluster (2 nodes)   |
| - Production VMs               |  replication  | - Standby VMs                  |
| - Ceph storage (replicated x3) |  (MariaDB...) | - Ceph storage (replicated x2) |
| - HA manager active            | ------------> | - HA manager ready             |
==================================               ==================================
               |                                                ^
               v                                                |
==================================               ==================================
| Local PBS                      |   PBS Sync    | Remote PBS                     |
| - Daily backup                 |  (pull mode)  | - Off-site copy                |
| - Deduplication                | ------------> | - AES-256 encryption           |
| - Verify jobs                  |               | - 90-day retention             |
| - 30-day retention             |               ==================================
==================================
               |
               v
==================================
| Air-Gapped PBS (Nimbus)        |
| - Disconnected disks           |
| - Weekly rotation              |
| - Ransomware protection        |
==================================
Complete DRP architecture: application replication between sites + multi-tier PBS backups
Application Replication vs Ceph Stretch Cluster
For inter-site replication, two approaches stand out depending on the criticality level:
Application Replication (recommended)
- MariaDB/MySQL replication, PostgreSQL streaming, rsync...
- Simple to implement and maintain
- Works over standard WAN links
- Configurable RPO (seconds to minutes)
- Moderate cost, no latency constraints
- Manual or semi-automatic failover
Ceph Stretch Cluster
- Synchronous replication between 2 sites
- RPO = 0 (zero data loss)
- Automatic failover
- High complexity (inter-site Ceph)
- Requires a 3rd site (monitor tiebreaker)
- High cost (bandwidth, latency < 10ms)
For the majority of SMBs and mid-market companies, application replication (MariaDB, PostgreSQL...) combined with PBS offers the best trade-off between simplicity, cost, and DRP effectiveness. The Ceph stretch cluster is reserved for the rare critical environments (finance, healthcare) where RPO = 0 is an absolute requirement — its operational complexity is not justified for most businesses.
PBS: Central Component of the DRP
Proxmox Backup Server plays a central role in any DRP architecture. Unlike Ceph replication, which protects against hardware failures, PBS protects against logical corruption: human errors, ransomware, application bugs. It is the defense layer that allows you to roll back to a previous healthy state.
To learn more about PBS backup strategy, see our dedicated article: Proxmox 3-2-1 Backup Strategy
RPO/RTO: What Can You Achieve with Proxmox?
Achievable recovery objectives depend directly on the deployed architecture. Here is a realistic comparison of the three main approaches:
| Architecture | RPO | RTO | Relative Cost | Use Case |
|---|---|---|---|---|
| PBS only (daily backup) | 24h | 2 - 4h | $ | SMBs, non-critical applications |
| PBS + 2x/day backup | 12h | 2 - 4h | $ | SMBs with evolving data |
| Application replication + PBS | ~few min | 30 min - 1h | $ | Best trade-off |
| Ceph stretch cluster (synchronous) | ~0 | < 5 min | $$$ | Finance, healthcare, critical |
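The table can be read as a lookup: start from the simplest architecture and stop at the first one whose worst-case RPO and RTO fit your targets. The Python sketch below encodes that reading. The function name `simplest_architecture` is hypothetical, and the numeric values used for "~few min" (5 minutes), "~0" (zero) and "< 5 min" (5 minutes) are assumptions made for the example.

```python
from datetime import timedelta

# Worst-case figures from the table, listed from simplest to most complex.
# "~few min" is modelled as 5 minutes, "~0" as zero and "< 5 min" as
# 5 minutes: these numeric values are assumptions made for this sketch.
ARCHITECTURES = [
    ("PBS only (daily backup)", timedelta(hours=24), timedelta(hours=4)),
    ("PBS + 2x/day backup", timedelta(hours=12), timedelta(hours=4)),
    ("Application replication + PBS", timedelta(minutes=5), timedelta(hours=1)),
    ("Ceph stretch cluster (synchronous)", timedelta(0), timedelta(minutes=5)),
]


def simplest_architecture(target_rpo, target_rto):
    """Return the first listed architecture that fits both targets, or None."""
    for name, rpo, rto in ARCHITECTURES:
        if rpo <= target_rpo and rto <= target_rto:
            return name
    return None  # no listed architecture meets the targets


# A service that tolerates 15 minutes of data loss and 2 hours of downtime.
print(simplest_architecture(timedelta(minutes=15), timedelta(hours=2)))
```

With a 15-minute RPO and a 2-hour RTO, the two PBS-only rows are rejected on RPO and the lookup returns the application replication + PBS row.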
Our recommendation: Application replication + PBS
For the majority of businesses, we recommend the application replication + PBS combination. MariaDB, PostgreSQL, or rsync replication between two sites is more than enough to ensure critical data continuity (RPO of a few minutes). PBS provides the additional protection layer against logical corruption with long retention and verified backups. This approach is simple to implement, cost-effective, and reliable — it covers both hardware failures and logical disasters (ransomware, human error) without the complexity of an inter-site Ceph infrastructure.
To outsource your DRP backups, discover NimbusBackup for your Proxmox DRP: we offer Hosted PBS solutions for your Proxmox DRP with multi-site replication and end-to-end encryption.
For a detailed cost analysis of Proxmox infrastructure compared to VMware, see our TCO VMware vs Proxmox 2026 comparison.
PBS: Your Best DRP Ally
Proxmox Backup Server is much more than a simple backup tool. It is an enterprise-grade solution that natively integrates essential features for a reliable DRP:
Deduplication and Efficiency
- Chunk-level deduplication: 60 to 90% storage space reduction
- Incremental backups: only modified blocks are transferred
- Native compression: optimized storage and bandwidth usage
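As a rough sizing aid, the 60 to 90% range above translates into physical capacity as follows. This is a back-of-the-envelope Python sketch: the function name `stored_size_tb` and the 2 TB / 30 backups figures are hypothetical, and real deduplication ratios depend on how much the data changes between backups.

```python
def stored_size_tb(logical_tb: float, reduction: float) -> float:
    """Physical space needed once a reduction ratio (0 <= r < 1) is applied."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return logical_tb * (1.0 - reduction)


# Hypothetical estate: 30 daily backups of 2 TB each = 60 TB of logical data.
logical_tb = 30 * 2.0
best_case = stored_size_tb(logical_tb, 0.90)   # 90% reduction
worst_case = stored_size_tb(logical_tb, 0.60)  # 60% reduction
print(f"{best_case:.1f} TB to {worst_case:.1f} TB stored "
      f"for {logical_tb:.1f} TB of logical backups")
```

Under these assumptions, the datastore would need roughly 6 to 24 TB rather than 60 TB.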
Security and Verification
- Verify jobs: automatic integrity verification of every backup
- Client-side AES-256-GCM encryption: data is encrypted before transfer
- Sync jobs: PBS-to-PBS replication for off-site copies
Off-Site PBS with Nimbus
For the off-site layer of your DRP, RDEM Systems offers Nimbus, our range of external backup solutions:
- Nimbus Double Drive PBS: two mirrored disks in a remote datacenter for complete redundancy of your PBS backups
- Nimbus Air Gapped PBS: physically disconnected disks in rotation, maximum protection against ransomware and account compromises
Discover all our backup solutions at nimbus.rdem-systems.com.
Testing Your DRP: The Key to Reliability
An untested DRP should be assumed to fail on the day it is needed. Regular testing validates that procedures work, that RTOs are achievable, and that teams know how to respond in a crisis situation.
DRP Testing Methodology
1. Documentation review (monthly): review of procedures, verification of emergency contacts, update of VM inventories and restoration priorities.
2. Partial technical test (quarterly): restoring individual VMs from PBS on an isolated network. Verifying boot and application functionality. Measuring actual restoration time.
3. Full failover test (semi-annual): activating the DR site, restoring all critical services, business validation by functional teams. Measuring actual RTO and RPO.
4. Post-mortem (after each test): documenting gaps between objectives and actual results, corrective action plan, procedure updates.
DRP Validation Checklist
- PBS backups are intact (verify jobs OK)
- Restored VMs boot correctly on the DR site
- Business applications are functional after restoration
- Measured RTO is less than or equal to the target
- Measured RPO meets expectations
- Network access (DNS, VPN, firewall) is operational on the DR site
- Teams know the procedures and emergency contacts
- DRP documentation is up to date and accessible outside the production site
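The checklist lends itself to a small pass/fail report at the end of each exercise. The Python sketch below is one possible way to record it: the function name `validate_drp_test`, the shortened item labels and the sample measurements are all hypothetical.

```python
from datetime import timedelta


def validate_drp_test(checks, measured_rto, target_rto, measured_rpo, target_rpo):
    """Return the list of failed items; an empty list means the test passed."""
    failures = [item for item, ok in checks.items() if not ok]
    if measured_rto > target_rto:
        failures.append("Measured RTO exceeds target")
    if measured_rpo > target_rpo:
        failures.append("Measured RPO exceeds target")
    return failures


# Hypothetical results of a quarterly test: DNS was not ready on the DR site.
results = {
    "PBS verify jobs OK": True,
    "Restored VMs boot on the DR site": True,
    "Business applications functional": True,
    "Network access operational on the DR site": False,
    "Teams know procedures and contacts": True,
    "Documentation reachable outside production": True,
}
failed = validate_drp_test(
    results,
    measured_rto=timedelta(hours=3), target_rto=timedelta(hours=4),
    measured_rpo=timedelta(hours=26), target_rpo=timedelta(hours=24),
)
print(failed)
```

Here the report flags the network item and the RPO overrun, which are exactly the gaps a post-mortem would need to address.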
DRP and Ransomware: Advanced Protection
Ransomware is today the number one threat to IT infrastructure. An effective DRP must include specific anti-ransomware measures, as a sophisticated attacker will seek to compromise backups before triggering encryption.
The 4 Pillars of Protection
1. Air Gap
Backup copies are kept on media that is physically disconnected from the network. Even an attacker with administrative access cannot reach a disk that is not plugged in. This is the ultimate protection.
2. Immutability
PBS backups can be protected against deletion and modification through strict retention policies and separate credentials. In practice, the hypervisors should reach the PBS datastore with an account that can only add new backups, not prune or delete existing ones.
3. Encryption
PBS supports client-side AES-256-GCM encryption. Data is encrypted before leaving the hypervisor. Even if the PBS server is compromised, the data remains unreadable without the encryption key.
4. Access Separation
Backup access credentials must be strictly separated from production credentials. A Proxmox admin account should not be able to delete PBS backups. This is the principle of least privilege, applied rigorously.
Ransomware alert
Modern attackers spend an average of 21 days in the system before triggering encryption. During this period, they identify and compromise backups. This is why long retention (90 days minimum) and air-gapped copies are essential: they allow restoring a healthy state prior to the compromise.
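The reasoning behind long retention is simple arithmetic: the oldest backup you still hold must predate the intrusion. The Python sketch below checks the two retention tiers from the architecture (30 and 90 days) against the 21-day figure quoted above. The function name `covers_dwell_time` and the 14-day detection margin are hypothetical values chosen for the example.

```python
def covers_dwell_time(retention_days: int, dwell_days: int, margin_days: int = 0) -> bool:
    """True if the oldest retained backup is older than the assumed intrusion.

    margin_days models the extra time needed to detect and investigate the
    attack after encryption is triggered (an assumption, tune to your context).
    """
    return retention_days >= dwell_days + margin_days


DWELL_DAYS = 21   # average dwell time quoted in the article
MARGIN_DAYS = 14  # hypothetical detection and investigation delay

for tier, retention in (("Local PBS", 30), ("Remote PBS", 90)):
    ok = covers_dwell_time(retention, DWELL_DAYS, MARGIN_DAYS)
    print(f"{tier}: {retention}-day retention covers the intrusion window: {ok}")
```

With these assumptions the 30-day tier falls short of the 35-day window, while the 90-day tier still holds a backup taken before the compromise.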
PBS Tape: Long-Term Archival
For businesses with regulatory archival obligations (10 years in the financial sector), PBS supports export to magnetic tapes. Tapes offer very low-cost storage, a durability of 30+ years, and native air-gap protection (tapes are physically removable and can be stored in a safe).
Our DRP Approach at RDEM Systems
At RDEM Systems, we support businesses in the design, implementation, and ongoing maintenance of their Proxmox DRP. Our approach stands out through its pragmatism and adaptation to each client's actual budget and constraints.
- Resilience audit: analysis of your existing infrastructure, identification of critical VMs, definition of target RPO/RTO per business service
- Custom DRP architecture: designing a multi-site architecture tailored to your constraints (budget, regulatory, geographic). Choice between Ceph replication, off-site PBS, or a hybrid approach
- Implementation and documentation: deploying the DRP architecture, configuring PBS backups and replication, writing recovery procedures
- Quarterly DRP tests: executing restoration tests, measuring actual RPO/RTO, compliance reporting and improvement plan
- Monitoring and alerting: continuous backup monitoring, alerts on failures, proactive backup integrity verification
Our DRP integrates into our comprehensive Proxmox managed services offering and benefits from our sovereign infrastructure operated from France. For off-site backup needs, our Nimbus range covers all protection levels, from standard backup to air-gapped.
For comprehensive support, discover our 24/7 managed services and on-call support for your DRP: monitoring, failover testing, and support in case of disaster.
Check our pricing or contact us for a free resilience audit.
If you are considering a migration from VMware, our VMware to Proxmox migration guide integrates DRP considerations from the start.
Official Documentation
To dive deeper into the concepts covered in this article, consult the official documentation: