
Replication

Unlike traditional hardware RAID, ABS enhances data reliability and high availability through a software-implemented replication mechanism. Each data block is stored as multiple complete copies (replicas) in the cluster, placed on different ABS Chunk services. Even if a node fails, the data remains available as long as at least one other replica is still accessible.

To ensure data consistency, each write operation is applied to all replicas simultaneously, and the write returns to the client only after every replica has been written successfully, thereby guaranteeing strong consistency. When a replica becomes abnormal, ABS records it and removes it from the set of valid replicas; from that point on, strong consistency means that all valid replicas are written successfully, and the removed replica no longer participates in subsequent reads or writes.

If the cluster has sufficient storage space, a temporary replica is created when the abnormal replica is removed, and it records all write requests issued after the removal. If the failure is transient (for example, caused by a service restart, node restart, or network anomaly), then after failback the recovered replica and the temporary replica together form a complete copy of the data, preventing a drop in durability due to temporary failures. Once the data is restored to the specified replication factor, temporary replicas are reclaimed to free up storage space.
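The write path above can be sketched as a toy model. All names here are illustrative, not part of the ABS API; in particular, whether the triggering write itself is logged to the temporary replica is an assumption of this sketch.

```python
class ReplicatedBlock:
    """Toy model of the replicated write path: a write succeeds only
    when every replica still in the valid set acknowledges it; a
    failing replica is removed from the valid set, and a temporary
    replica then records writes from that point on."""

    def __init__(self, replicas):
        self.valid = set(replicas)  # replicas currently in the valid set
        self.temp_log = None        # temporary replica's write log, if created

    def write(self, data, ack) -> bool:
        # Send the write to every valid replica (modelled serially here).
        failed = {r for r in self.valid if not ack(r, data)}
        if failed:
            self.valid -= failed    # abnormal replicas are removed
            if self.temp_log is None:
                self.temp_log = []  # temporary replica starts recording
        if self.temp_log is not None:
            self.temp_log.append(data)  # log writes after the removal
        # Strong consistency: success means all valid replicas acknowledged.
        return len(self.valid) > 0
```

On failback, merging the recovered replica's old data with `temp_log` would reconstruct a complete replica, matching the behavior described above.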

The replication factor of a data block is determined by the redundancy policy of its associated volume; configurations of 2 and 3 replicas are currently supported. By default, a replication factor of 3 is used when at least five nodes are available for storage services; otherwise, a replication factor of 2 is used. Different replication factors provide different levels of data protection: with a replication factor of 3, up to 2 servers can fail simultaneously without loss of availability, while with a replication factor of 2, only 1 server can fail. Users can set an appropriate replication factor for each volume based on their business needs.
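The default selection rule above reduces to a one-line check. This helper is illustrative only, not part of the ABS API:

```python
def default_replication_factor(storage_nodes: int) -> int:
    """Default rule described above: 3 replicas when at least five
    nodes provide storage services, otherwise 2. (Hypothetical
    helper name, for illustration.)"""
    return 3 if storage_nodes >= 5 else 2
```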

ABS supports increasing the replication factor of an existing LUN, but does not support reducing it. After you increase the replication factor, the system automatically triggers the data recovery mechanism to restore data to the specified replication factor. When multiple volumes reference the same data block as a result of cloning, the replication factor of that data block is the highest expected replication factor among all volumes that reference it.
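For a data block shared by cloned volumes, the rule above is simply a maximum over the referencing volumes' expected factors. A minimal sketch, with a hypothetical helper name:

```python
def effective_replication_factor(volume_factors: list[int]) -> int:
    """A shared data block takes the highest replication factor
    expected by any volume that references it (illustrative helper,
    not an ABS API function)."""
    if not volume_factors:
        raise ValueError("data block must be referenced by at least one volume")
    return max(volume_factors)
```

For example, if a block is referenced by a 2-replica volume and a 3-replica clone, the block is kept at 3 replicas.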