
Network failures

Network disconnection

Failure scenarios and the corresponding active-active cluster handling and impacts:
IDC A storage network interruption (IDC A cannot reach IDC B or witness node C), while IDC B and witness node C remain connected. The handling logic for this scenario is the same as that described in the "Failure of all nodes in IDC A" scenario in the Node failures section.
IDC B storage network interruption (IDC B cannot reach IDC A or witness node C), while IDC A and witness node C remain connected. The handling logic for this scenario is the same as that described in the "Node failure in IDC B" scenario in the Node failures section.
Storage network interruption between witness node C and both IDC A and IDC B, while the network between IDC A and IDC B functions normally. The handling logic for this scenario is the same as that described in the "Failure of witness node C" scenario in the Node failures section.
Storage network disruption between IDC A and IDC B, while witness node C maintains normal connections to both IDC A and IDC B.
  • The cluster's primary availability zone (IDC A) continues to provide storage services normally.
  • Virtual machine I/O on all nodes in IDC A experiences a brief latency spike (usually milliseconds, with a maximum delay not exceeding 7 seconds).
  • Virtual machines in IDC B trigger HA and are rebuilt in IDC A when resources are sufficient and placement group rules allow.
  • For data that previously resided in IDC B and now retains only a single replica in IDC A, the system attempts to restore the replication factor to 2 within IDC A.
Loss of the storage network connection between witness node C and either IDC A or IDC B, while the connection between IDC A and IDC B remains normal.
  • Active-active cluster functions remain unaffected; storage services operate normally.
  • Virtual machine I/O on all nodes is normal.
  • No data recovery or replica adjustment is triggered.
All storage network connections between witness node C, IDC A, and IDC B are simultaneously disconnected.
  • ZooKeeper service cannot complete leader election.
  • Cluster cannot provide storage services normally.
  • Virtual machine I/O on all nodes is interrupted.
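The three disconnection outcomes above all reduce to one question: can a strict majority of the three ZooKeeper voters (IDC A, IDC B, witness node C) still reach one another? A minimal sketch of that quorum rule follows. This is illustrative only, not the product's implementation; the site names and the standard three-voter majority rule are assumptions.

```python
# Illustrative sketch (assumption: a standard 3-voter ZooKeeper ensemble).
# Leader election succeeds only if some connected group of sites holds a
# strict majority of the voters.

def has_quorum(links, members=("IDC_A", "IDC_B", "witness_C")):
    """Return True if any connected group of sites holds a strict majority.

    `links` is a set of frozensets, each naming a pair of sites that can
    still reach each other over the storage network.
    """
    # Build connected components over the surviving links (tiny union-find).
    parent = {m: m for m in members}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in (tuple(link) for link in links):
        parent[find(a)] = find(b)

    sizes = {}
    for m in members:
        root = find(m)
        sizes[root] = sizes.get(root, 0) + 1
    return any(n > len(members) // 2 for n in sizes.values())

# A-B link down, witness still sees both: all three sites stay in one
# partition, so a leader can be elected.
print(has_quorum({frozenset({"IDC_A", "witness_C"}),
                  frozenset({"IDC_B", "witness_C"})}))   # True
# All links down: three singleton partitions, no majority, no leader.
print(has_quorum(set()))                                 # False
```

The last scenario in the table corresponds to the `False` case: with every inter-site link down, no partition holds two of the three voters, so ZooKeeper cannot elect a leader and storage services stop.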

High network latency

Failure scenarios and the corresponding active-active cluster handling and impacts:
Network latency between IDC A and IDC B increases (10 ms < ping latency < 1 s), while latency between witness node C and both IDC A and IDC B remains normal.
  • The cluster can continue to operate, but I/O performance—especially write performance—degrades significantly, and operation response latency increases.
  • When ping latency reaches the hundreds of milliseconds range, latency-sensitive applications in virtual machines may behave abnormally or stop functioning.
Network congestion between IDC A and IDC B is severe (1 s < ping latency < 7 s), while latency between witness node C and both IDC A and IDC B remains normal.
  • A large number of nodes in IDC B enter an intermittent abnormal state. Some virtual machines may trigger HA and migrate to IDC A.
  • The cluster may encounter situations where data recovery is required but cannot be completed.
  • I/O on some virtual machines in IDC A may become extremely slow.
Network congestion between IDC A and IDC B is extreme (ping latency > 7 s) or the link is disrupted entirely, while latency between witness node C and both IDC A and IDC B remains normal.
  • The availability zone hosting the ZooKeeper Leader (in most cases, IDC A) remains available, while nodes in the other availability zone become disconnected.
  • If the ZooKeeper Leader is located on witness node C, IDC A remains available and IDC B is disconnected. Virtual machines write data only within IDC A, and I/O latency returns to normal levels.
  • If the ZooKeeper Leader is located in the secondary availability zone (IDC B), ZooKeeper does not migrate the leader; IDC B becomes disconnected, and virtual machines in IDC B trigger HA and migrate to the primary availability zone (IDC A).
High network latency between witness node C and both IDC A and IDC B (ping latency > 1 s), while the network between IDC A and IDC B remains normal.
  • This scenario is equivalent to a failure of the witness node.
  • Virtual machines in both the primary and secondary availability zones continue to operate normally, but the active-active feature is unavailable.
  • If one availability zone fails, the other availability zone cannot take over its workloads in a timely manner.
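The latency bands in the table above can be summarized as a simple threshold ladder. The sketch below maps an inter-IDC ping latency to the expected impact tier; the thresholds (10 ms, 1 s, 7 s) come from the table, while the tier names are illustrative labels, not product terminology.

```python
# Illustrative mapping of inter-IDC ping latency to expected cluster impact.
# Thresholds follow the table above; tier names are assumptions for clarity.

def latency_impact(ping_seconds: float) -> str:
    if ping_seconds <= 0.010:
        return "normal"        # within design latency, full active-active operation
    if ping_seconds < 1.0:
        return "degraded"      # cluster runs, but write I/O slows noticeably
    if ping_seconds < 7.0:
        return "intermittent"  # IDC B nodes flap; some VMs may trigger HA to IDC A
    return "partitioned"       # treated as a disconnection; one zone serves all I/O

print(latency_impact(0.2))   # degraded
print(latency_impact(8.0))   # partitioned
```

Note that the 7-second bound matches the maximum I/O stall quoted for the disconnection scenarios: beyond it, the cluster stops waiting for the remote zone and handles the event as a partition.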