Why does corosync use two configuration files?

/etc/corosync/corosync.conf is read by the local corosync daemon. /etc/pve/corosync.conf is the master managed by pmxcfs and propagated to every node; it's the one PVE tools read (replication, migration). When the cluster is quorate, you edit only the master and pmxcfs syncs the local file. Without quorum, /etc/pve is read-only and you must edit the local file then copy it manually.

Reload or restart corosync after editing?

The live reload only applies the new configuration if config_version was incremented. When in doubt, systemctl restart corosync reloads the file regardless of version. Always increment config_version on each edit and restart corosync on all nodes.

Does ZFS replication follow the new ring automatically?

No. Replication and migration follow the node's primary IP, i.e. ring0_addr in the nodelist, not the corosync runtime state. To move replication onto the public IP, set that IP as ring0_addr, then restart pve-cluster so pmxcfs refreshes /etc/pve/.members; otherwise the old IP stays cached.

How do you edit the config once the cluster has lost quorum?

Without quorum, /etc/pve is mounted read-only. Run pvecm expected 1 to make the filesystem writable again, edit the local config, then copy /etc/corosync/corosync.conf to the other node with scp over the public IP and restart corosync on both nodes. Once the link is back, quorum returns on its own and the expected 1 override is no longer needed.

Change / Add a Corosync Ring on a Proxmox Cluster

The setup: 2 OVH nodes, ring over the vRack

The starting architecture is a classic OVH one: two Proxmox servers interconnected by a vRack (OVH's private L2 network between servers). Corosync uses that vRack as its single ring (link 0), and ZFS replication between the two nodes runs over those same private IPs. The public interface carries no ring: it serves the VMs' traffic, not the cluster.

Before / after the vRack failure

Nominal

Ring 0 = vRack (private network)
ZFS replication over the vRack
Public interface = VM traffic only
Quorum: 2/2 nodes

vRack card failed

Ring 0 down on one node
Quorum lost → /etc/pve read-only
ZFS replication blocked
Public interface still up

Goal: add the public interface as a ring and bring ZFS replication onto it, to get the service back without waiting for the vRack card repair.

Two files, two readers: the basics

Confusing the two files is what wastes the most time. Corosync on Proxmox works with two distinct files, read by different components:

File	Read by	Role
/etc/corosync/corosync.conf	corosync daemon	Quorum, knet transport, rings
/etc/pve/corosync.conf	pmxcfs master + PVE tools	Source of truth, propagated to the cluster; `remote_node_ip`, replication, migration

Golden rule: as long as the cluster is quorate, you edit only /etc/pve/corosync.conf (the master) and pmxcfs automatically propagates it to each node's local file. Without quorum, /etc/pve is read-only: you must edit the local file and copy it by hand (see "Degraded case: no quorum at all").

The gotchas that cost you 2 hours

Replication and migration follow the node's primary IP, i.e. ring0_addr, not the corosync runtime state and not /etc/hosts.
The live reload is only applied if config_version increases. When in doubt, restart.
After changing ring0_addr, pmxcfs keeps the old nodelist cached: you must restart pve-cluster to refresh /etc/pve/.members.
The migration network in datacenter.cfg only works if all target IPs fit in a single CIDR. Public IPs in disjoint /24s make it unusable, which is why the desired network goes into ring0_addr instead.
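
The single-CIDR constraint in the last gotcha comes down to a bit of integer arithmetic. The sketch below is only an illustration: ip_to_int and in_cidr are hypothetical helper names (not Proxmox commands), and the addresses come from the documentation ranges, not from the cluster described here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Proxmox): test whether an IPv4 address
# falls inside a CIDR, to see if one migration network could cover every node.

# ip_to_int <a.b.c.d>: print the address as a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1            # split the dotted quad on "."
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_cidr <ip> <network/prefix>: exit status 0 if the IP is inside the CIDR.
in_cidr() (
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
)

# Placeholder public IPs in two disjoint /24s: no single /24 contains both,
# so a datacenter.cfg migration network cannot describe them.
in_cidr 192.0.2.10    192.0.2.0/24 && echo "node A: inside 192.0.2.0/24"
in_cidr 198.51.100.20 192.0.2.0/24 || echo "node B: outside 192.0.2.0/24"
```

If every node IP passes in_cidr for the same network, the migration network setting is an option; otherwise ring0_addr is the lever, as described above.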

Step 1 — Edit the configuration

Edit /etc/pve/corosync.conf if you are quorate, otherwise /etc/corosync/corosync.conf locally (see degraded mode below). Principles:

ring0_addr = primary link (the one replication will follow).
ring1_addr = backup link.
Declare one interface block per linknumber used (0 and 1).
Increment config_version on every edit.

Before picking the new version, check the one actually loaded by the daemon:

corosync-cmapctl -g totem.config_version
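
To avoid picking a version that is not strictly higher than the loaded one, the comparison can be scripted. This is a minimal sketch: next_version and file_version are hypothetical helper names, and the parsing assumes the usual "key (type) = value" output of corosync-cmapctl and a "config_version: N" line in the file.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the output formats described in the comments.

# next_version: read `corosync-cmapctl -g totem.config_version` output on
# stdin (e.g. "totem.config_version (u64) = 7") and print that value plus one.
next_version() {
    awk -F' = ' '/config_version/ { print $2 + 1 }'
}

# file_version <path>: print the config_version declared in a corosync.conf.
file_version() {
    awk '$1 == "config_version:" { print $2 }' "$1"
}

# Demo on a canned line; on a real node, pipe the actual command instead:
#   corosync-cmapctl -g totem.config_version | next_version
#   file_version /etc/pve/corosync.conf
echo 'totem.config_version (u64) = 7' | next_version   # prints 8
```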

In our case, we go from a single ring (vRack) to two rings by adding the public IP. To move replication onto the public network, we set the public IP as ring0_addr and the vRack (still alive on the other node) as backup:
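
The original listing is not reproduced here. As a stand-in, the sketch below writes a plausible two-ring configuration to a scratch file for review. Everything specific in it is an assumption: the node names pve1 and pve2, the cluster name, the version number, the public IPs (documentation ranges) and the vRack IPs (10.0.0.x) are placeholders, and the remaining sections follow the layout Proxmox usually generates.

```shell
#!/bin/sh
# Sketch only: placeholder names and addresses, not the article's real config.
new_conf=$(mktemp)
cat > "$new_conf" <<'EOF'
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.0.2.10      # public IP = new primary link
    ring1_addr: 10.0.0.1        # vRack IP = backup link
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 198.51.100.20
    ring1_addr: 10.0.0.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: demo
  config_version: 8             # must be higher than the loaded version
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
EOF
echo "candidate written to $new_conf"
```

Writing to a scratch file first leaves room to review or diff the candidate before it replaces /etc/pve/corosync.conf (or the local file when there is no quorum).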

For reference, here is the starting configuration (single ring over the vRack), handy to compare:
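
The starting listing is also missing from this copy. A hedged reconstruction of its two relevant sections is sketched below, again with placeholder node names, vRack addresses and version number. The logging and quorum sections are left out for brevity, so this is a fragment for comparison, not a loadable file.

```shell
#!/bin/sh
# Sketch only: single-ring starting point with placeholder vRack addresses.
old_conf=$(mktemp)
cat > "$old_conf" <<'EOF'
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1        # vRack IP, the only link
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.2
  }
}

totem {
  cluster_name: demo
  config_version: 7
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
EOF
echo "starting fragment written to $old_conf"
```

Compared with a two-ring setup, this version has no ring1_addr lines and a single interface block.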

Step 2 — Restart Corosync

The live reload is finicky; we prefer a full restart, which reloads the file regardless of version. Do it on both nodes:

systemctl restart corosync          # on BOTH nodes
corosync-cfgtool -s                 # LINK 0 = primary, status: connected
pvecm status                        # Quorate: Yes, without 'expected 1'
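
The two pvecm checks can be turned into a small test, for example to poll until the cluster is back. This is a sketch under assumptions: is_quorate and expected_votes are hypothetical helper names, and the sample text is an abridged illustration of pvecm status output, not a capture from the cluster in this article.

```shell
#!/bin/sh
# Hypothetical helpers that parse `pvecm status` output read from stdin.

# is_quorate: succeed only if the "Quorate:" line says Yes.
is_quorate() {
    awk '$1 == "Quorate:" { found = ($2 == "Yes") } END { exit !found }'
}

# expected_votes: print the value of the "Expected votes:" line.
expected_votes() {
    awk -F': *' '/^Expected votes/ { print $2 }'
}

# Abridged sample for the demo; on a node, use `pvecm status | is_quorate`.
sample='Quorate:          Yes
Expected votes:   2'

printf '%s\n' "$sample" | is_quorate && echo "cluster is quorate"
echo "expected votes: $(printf '%s\n' "$sample" | expected_votes)"
```

On a two-node cluster, an expected votes value of 1 would suggest that a pvecm expected 1 override is still in effect.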

Step 3 — Refresh the pmxcfs nodelist

Essential after changing ring0_addr: otherwise PVE tools keep the old IP cached and replication keeps targeting the dead vRack.

systemctl restart pve-cluster       # on BOTH nodes
cat /etc/pve/.members               # the "ip" fields must show the new primary
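
Reading the "ip" fields by eye works for two nodes; the sketch below extracts them instead. It assumes that /etc/pve/.members lists one node per line in the form shown in the sample, which is how the file usually looks but is not guaranteed here. member_ips is a hypothetical helper name and the sample values are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print "node ip" pairs from .members content on stdin.
# Assumes one node entry per line, e.g.
#   "pve1": { "id": 1, "online": 1, "ip": "192.0.2.10"},
member_ips() {
    sed -n 's/.*"\([^"]*\)": *{.*"ip": *"\([^"]*\)".*/\1 \2/p'
}

# Demo on placeholder content; on a node: member_ips < /etc/pve/.members
sample='"nodelist": {
  "pve1": { "id": 1, "online": 1, "ip": "192.0.2.10"},
  "pve2": { "id": 2, "online": 1, "ip": "198.51.100.20"}
  }'
printf '%s\n' "$sample" | member_ips
```

If an address printed here is still a vRack IP after the restart, the nodelist has not been refreshed yet.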

Then restart the other PVE services so they reload the topology:

systemctl restart pvestatd pvedaemon pveproxy
# if HA is enabled:
systemctl restart pve-ha-lrm pve-ha-crm

Step 4 — ZFS replication switches to the public IP

Why that's enough

Proxmox ZFS replication (pvesr) opens an SSH session to the target node's primary IP, i.e. its ring0_addr. Once the public IP is set as ring0_addr and pve-cluster has been restarted, the zfs send/recv jobs resume automatically over the public network, without touching the job definitions.

If the node IPs changed, refresh the cluster SSH host keys, then force a job to validate:

pvecm updatecerts                       # refreshes the cluster ssh_known_hosts
pvesr run --id <vmid-job> --verbose     # ssh must target the new public IP

Degraded case: no quorum at all

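If the primary link is dead on both sides (or the failure already dropped the quorum before you intervened), /etc/pve is read-only and cluster propagation no longer works. You must push the config by hand: run pvecm expected 1 to make /etc/pve writable, edit the local /etc/corosync/corosync.conf, copy it to the other node with scp over the public IP, then restart corosync on both nodes.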
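
The manual push can be sketched as a dry-run script. The commands are the ones named in this article (pvecm expected 1, scp, systemctl restart corosync); the run and push_corosync_conf helpers, the root SSH login and the peer address are assumptions for illustration. By default nothing is executed: the script only prints what it would run.

```shell
#!/bin/sh
# Sketch of the degraded-mode procedure. DRY_RUN=1 (default) only prints.
DRY_RUN=${DRY_RUN:-1}

# run <cmd...>: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# push_corosync_conf <peer-public-ip>: copy the edited local file to the
# peer, then restart corosync on the peer and locally.
push_corosync_conf() {
    peer=$1
    run scp /etc/corosync/corosync.conf "root@${peer}:/etc/corosync/corosync.conf"
    run ssh "root@${peer}" systemctl restart corosync
    run systemctl restart corosync
}

# 1. Lower the expected votes so this node alone is quorate.
run pvecm expected 1
# 2. Edit /etc/corosync/corosync.conf by hand here (new ring, higher
#    config_version). This step is deliberately not automated.
# 3. Push the file and restart corosync on both sides.
push_corosync_conf 198.51.100.20   # placeholder peer IP
```

Running it with DRY_RUN=0 would execute the commands for real, so review the printed plan first.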

Once the new ring is established, quorum returns (no more expected 1). Then run steps 2 to 4 (restart corosync, pve-cluster and services) so that pmxcfs takes back control and replication resumes.

Final verification

Exit checklist

corosync-cfgtool -s: LINK 0 connected on the right network, LINK 1 listed.
pvecm status: Quorate: Yes, no leftover expected 1.
cat /etc/pve/.members: the IPs shown are the expected primary IPs.
pvesr run --id <job> --verbose: SSH targets the new IP and the job succeeds.

Network / firewall prerequisites

corosync / knet uses UDP ports 5405 through 5412. Open that range between the node IPs only, restricted to the peer (not the world) when going over the public network.
If node IPs changed, run pvecm updatecerts to regenerate the ssh_known_hosts.
Running quorum over the Internet is a temporary degraded mode: latency is variable and the cluster traffic is exposed. Keep it only while the vRack is being repaired, then move the private link back to ring0.
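
As an illustration of "restricted to the peer", the sketch below prints two iptables rules for one peer address. It assumes plain iptables rather than the Proxmox firewall, uses a placeholder peer IP, and only prints the rules so that they can be reviewed before anything is applied. corosync_rules is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print iptables commands that accept corosync/knet
# traffic (UDP 5405-5412) from a single peer and drop it from anyone else.
corosync_rules() {
    peer=$1
    echo "iptables -A INPUT -p udp -s ${peer} --dport 5405:5412 -j ACCEPT"
    echo "iptables -A INPUT -p udp --dport 5405:5412 -j DROP"
}

corosync_rules 198.51.100.20   # placeholder: the other node's public IP
```

Rule order matters: the ACCEPT for the peer has to come before the generic DROP.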

This scenario, an OVH node suddenly losing its vRack link, is exactly the kind of incident we handle on-call for our clients. We operate managed Proxmox clusters hosted on OVHcloud: ring failover, ZFS replication, follow-up on the hardware repair with the datacenter, and the return to nominal.
