Space allocation policy

In ABS, a data block is split into several data shards according to the redundancy policy when it is stored: under a replication strategy each shard is a complete replica, while under an erasure coding strategy the shards are data shards and parity shards. The rules governing where these data shards are placed are referred to as the space allocation policy.
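
As a minimal illustration of the two redundancy policies, the sketch below maps one data block to its shards. The replication factor (3) and EC layout (4 data + 2 parity) are assumed example values, not ABS defaults.

```python
# Illustrative only: the replication factor and EC layout below are
# assumed example values, not ABS defaults.

def shards_for_block(policy: str) -> list[str]:
    if policy == "replication":
        # Each shard is a complete replica of the block.
        return ["replica"] * 3
    if policy == "erasure_coding":
        # A 4+2 scheme: four data shards plus two parity shards.
        return ["data"] * 4 + ["parity"] * 2
    raise ValueError(f"unknown redundancy policy: {policy}")

print(shards_for_block("replication"))     # ['replica', 'replica', 'replica']
print(shards_for_block("erasure_coding"))  # ['data', 'data', 'data', 'data', 'parity', 'parity']
```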

To ensure efficient I/O, Meta dynamically optimizes the space allocation policy using the cluster information it collects. When determining where to store data shards, the following factors are considered:

  • Local priority

    In ABS's design, the data link from the client to the volume is usually stable. Meta therefore attempts to place one of the shards of each of the volume's data blocks on the LSM running on the physical host that carries the volume's data link. This reduces network hops, which improves access performance and relieves network pressure on the storage service.

  • Topology security

    Distribute data shards across different branches of the physical topology so that a single physical failure damages as few shards as possible (see the sketch after this list).

  • Localization

    A volume's data should be stored on a concentrated set of nodes rather than scattered across the entire cluster, and different volumes should occupy different physical space as far as possible to reduce contention for physical resources. This improves the linear scalability of overall cluster performance and narrows the impact range of node failures.

  • Load balancing

    Ensure that all nodes store approximately the same amount of data, which balances access load and minimizes the loss if a node fails.

  • Access mode awareness

    Identify the volume's access pattern through various means, for example multi-point access (multiple clients accessing a shared volume), and adjust data shard placement to suit each pattern.

  • Dynamic data migration

    When the access point changes, the corresponding data shards are migrated with it to preserve data locality.

  • Dynamic adjustment of the shard allocation policy

    The objectives above are not mutually compatible: local priority, for example, concentrates data, while load balancing aims to spread data across different nodes. ABS therefore adjusts the priority of the different objectives and applies different strategies according to the current space usage ratio of the cluster's nodes.
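
The following is a minimal sketch of the topology-security objective under assumed inputs (a flat list of (node, rack) pairs; the real topology is a multi-level tree, and the structures here are illustrative, not ABS's actual algorithm): take at most one node per topology branch, and fall back to a temporarily non-secure placement, repaired by later migration, when branches run out.

```python
def place_shards(nodes: list[tuple[str, str]], num_shards: int) -> list[str]:
    """Illustrative topology-aware placement, not ABS's actual algorithm.

    nodes: (node_id, rack_id) pairs, e.g. [("n1", "rackA"), ("n2", "rackB")].
    Pass 1 takes at most one node per rack (topology security); pass 2
    relaxes the rack constraint when racks run out, yielding a temporarily
    non-secure placement that a later migration would repair. A node is
    never chosen twice, matching the one-shard-per-node rule described below.
    """
    chosen: list[str] = []
    used_racks: set[str] = set()
    used_nodes: set[str] = set()

    for node_id, rack_id in nodes:          # pass 1: one node per rack
        if len(chosen) == num_shards:
            break
        if rack_id not in used_racks:
            chosen.append(node_id)
            used_racks.add(rack_id)
            used_nodes.add(node_id)

    for node_id, _rack_id in nodes:         # pass 2: relax the rack constraint
        if len(chosen) == num_shards:
            break
        if node_id not in used_nodes:
            chosen.append(node_id)
            used_nodes.add(node_id)

    return chosen
```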

Local priority, topology security, localization, and load balancing are the fundamental objectives of data shard allocation. Among them, topology security has the highest priority, but it is not a hard requirement: the system first tries to distribute data shards across different topology branches, and if this cannot be satisfied, it temporarily adopts a non-secure allocation and automatically repairs it when conditions permit.
Local priority and localization do not directly conflict and can generally be pursued in parallel. In some cases, however (for example, when data is concentrated on a few nodes), they conflict with capacity balancing. ABS dynamically adjusts the priorities among local priority, localization, and capacity balancing according to the current load of the cluster, balancing system performance against resource allocation.

Regardless of whether replication or erasure coding is used as the redundancy policy, ABS ensures that each node in the cluster stores at most one shard of any PExtent. ABS then selects an appropriate shard allocation policy based on the cluster load (taking the most heavily loaded node in the cluster as the reference) and the load of the PExtent's preferred local node (the node that local priority would choose for placement).

Using the allocated capacity ratio p of the capacity tier on a node as the metric, node load is classified into the following levels: low load (p < 75%), medium load (75% ≤ p < 85%), and high load (p ≥ 85%).
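
These thresholds translate directly into a small classification helper; the following is a minimal sketch (the function name is an illustrative assumption):

```python
def load_level(p: float) -> str:
    """Classify a node by the allocated capacity ratio p of its capacity tier.

    p is a fraction in [0, 1]; the thresholds are the ones stated above.
    """
    if p < 0.75:
        return "low"
    if p < 0.85:
        return "medium"
    return "high"
```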

The priorities of the specific shard allocation policies are as follows (a sketch of the selection logic follows this list):

  • Low cluster load

    The allocation policy priority is: topology security > local priority > localization

  • Cluster not under low load and preferred local node under medium load

    The allocation policy priority is: topology security > local priority > capacity balancing

  • Cluster not under low load and preferred local node under high load

    The allocation policy priority is: topology security > capacity balancing
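
The following is a minimal sketch of that selection, reusing load_level from the sketch above. Note that the text spells out only the medium- and high-load cases for the preferred local node once the cluster is no longer under low load; folding a low-load preferred node into the medium-load branch is an assumption made here.

```python
def allocation_priorities(cluster_p: float, local_p: float) -> list[str]:
    """Ordered allocation objectives for one PExtent (illustrative only).

    cluster_p: allocated capacity ratio of the most heavily loaded node
               (the cluster-load reference).
    local_p:   allocated capacity ratio of the PExtent's preferred local node.
    """
    if load_level(cluster_p) == "low":
        return ["topology security", "local priority", "localization"]
    if load_level(local_p) == "high":
        return ["topology security", "capacity balancing"]
    # Cluster not under low load, preferred local node not under high load.
    return ["topology security", "local priority", "capacity balancing"]
```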

These priority principles are explained as follows:

  • Topology security

    Regardless of the load of the cluster and the preferred local node, the shard allocation policy always puts topology security first. If topology security cannot be satisfied for the moment, allocation is allowed to proceed, and a subsequent migration restores it.

  • Local priority

    If the preferred local node is in a healthy state and has sufficient remaining space, shard allocation is prioritized to the preferred local node.

  • Localization

    If the cluster topology and node health remain unchanged and sufficient space is available, the shards of PExtents that share the same preferred local node should be allocated consistently, landing on the same set of nodes.

  • Capacity balancing

    If multiple candidate nodes are equivalent at the topology level, shards are preferentially allocated to the less-loaded node, achieving overall load balance across the cluster (a sketch of this tie-break follows).
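
The following is a minimal sketch of that tie-break, under the assumption that the candidates have already passed the topology-security filter (the structures are illustrative):

```python
def pick_least_loaded(candidates: list[tuple[str, float]]) -> str:
    """Among topology-equivalent candidates, choose the least-loaded node.

    candidates: (node_id, allocated_capacity_ratio) pairs that already
    satisfy topology security; the pair structure is assumed for illustration.
    """
    return min(candidates, key=lambda c: c[1])[0]

pick_least_loaded([("n1", 0.81), ("n2", 0.42)])  # -> "n2"
```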

The smallest storage unit that ABS exposes externally is the volume. However, the preferred local nodes of the PExtents within a single volume are not necessarily the same. For example, when a volume is cloned, a PExtent shared with the source volume may be allocated to a new node after copy-on-write (COW), which establishes a new preferred local node. When the preferred local field in the PExtent data block table is set to 0, or when the preferred local node is in an unhealthy state, assigning shards to the preferred local node is meaningless; the local priority policy is therefore ignored, and shards are allocated based on the overall cluster load instead.
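
The following is a minimal sketch of that fallback rule; the field and parameter names are illustrative assumptions, not ABS's actual schema:

```python
def effective_preferred_local(preferred_local: int,
                              healthy: dict[int, bool]) -> int | None:
    """Return the preferred local node ID, or None if local priority is moot.

    preferred_local: the preferred local field from the PExtent data block
                     table (0 means unset); healthy maps node ID -> health.
    When None is returned, allocation falls back to overall cluster load.
    """
    if preferred_local == 0 or not healthy.get(preferred_local, False):
        return None
    return preferred_local
```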

Information:

  • To prevent node capacity from repeatedly hovering around the policy thresholds, which would trigger frequent priority adjustments and data migrations, the system applies a default tolerance threshold of 5%. For example, under low load, localized migration keeps node capacity below the lower limit of the medium-load range minus 5% (for the capacity tier, this works out to 70% by default); see the sketch at the end of this section.
  • The cache tier and capacity tier allocate space according to their respective space ratios, using different state transition thresholds.
  • An active-active cluster supports only replication as its redundancy policy, and its replica allocation policy differs slightly; refer to Replication mechanism of the active-active cluster for details.
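
The following is a minimal sketch of the tolerance calculation for the capacity tier; the constant names are illustrative, and the values are the defaults stated above:

```python
TOLERANCE = 0.05            # default tolerance threshold (5%)
MEDIUM_LOWER_BOUND = 0.75   # lower limit of the medium-load range (capacity tier)

def migration_target_ratio() -> float:
    """Capacity ratio that localized migration drives a node below.

    Keeping nodes below MEDIUM_LOWER_BOUND - TOLERANCE (0.70 for the
    capacity tier) leaves headroom so capacity does not oscillate around
    the 75% threshold and retrigger priority changes and migrations.
    """
    return MEDIUM_LOWER_BOUND - TOLERANCE   # 0.70
```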