Production clusters change frequently: physical disks fail, servers go down, clusters are expanded, and virtual machines migrate. After such changes, the original shard distribution may no longer suit the new operating conditions. To keep cluster performance optimal, Meta periodically checks the status of the cluster and initiates shard migration tasks. The main objectives are:
Data access localization
In a virtualized environment, virtual machines are often migrated on demand. After a virtual machine migrates, data access that used to be local becomes remote. To reduce the additional overhead of remote access, Meta migrates the data to the node that now hosts the virtual machine.
To avoid unnecessary migrations, Meta typically waits for a period after the virtual machine migration before triggering data migration. During migration, frequently accessed data is migrated first. If the target node is already full, the migration is not triggered.
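The localization rules above can be sketched as a small planning function. This is only an illustration under stated assumptions, not the actual Meta implementation: the `Shard` type, the `plan_localization` name, the `access_count` metric, and the 600-second settle time are all hypothetical (the text says only that Meta "waits a while"). Skipping a shard that does not fit in the remaining space is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    shard_id: int
    size_bytes: int
    access_count: int  # hypothetical measure of recent access frequency


def plan_localization(shards, seconds_since_vm_move, target_free_bytes,
                      settle_seconds=600):
    """Return the shards to move to the VM's new node, hottest first.

    settle_seconds is an assumed value, not a documented default.
    """
    # Wait after the VM migration so a short-lived move does not
    # trigger an unnecessary data migration.
    if seconds_since_vm_move < settle_seconds:
        return []
    # A full target node means no migration is triggered.
    if target_free_bytes <= 0:
        return []

    plan = []
    remaining = target_free_bytes
    # Frequently accessed data is migrated first.
    for shard in sorted(shards, key=lambda s: s.access_count, reverse=True):
        # Assumption: a shard that does not fit is skipped, not split.
        if shard.size_bytes <= remaining:
            plan.append(shard)
            remaining -= shard.size_bytes
    return plan
```

For example, with three 100-byte shards and 200 bytes free on the target node, only the two most frequently accessed shards are planned for migration.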
Topology rebalancing
When the cluster topology changes due to expansion or other reasons, Meta readjusts the data distribution based on the latest topology information to ensure optimal data placement within the cluster and maintain cluster health.
Capacity balancing
When the cluster is under medium to high load (the storage utilization of at least one node is ≥ 75%), Meta periodically migrates non-local hot data from high-load nodes to other nodes. This prevents data from concentrating on a few nodes and reduces the risk of access bottlenecks.
ABS does not aim for absolute capacity balance and tolerates a certain degree of imbalance. When the cluster is under low load (the storage utilization of every node is < 75%), the capacity balancing policy is not triggered, which avoids jitter caused by frequent data migration. Under high load, Meta automatically checks for migration more often to restore balance as quickly as possible.
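The trigger condition for capacity balancing can be sketched as follows. Only the 75% threshold comes from the text. The function names, the 85% "high load" cutoff, and both check intervals are hypothetical values chosen for illustration, and the code is not the actual Meta implementation.

```python
BALANCE_THRESHOLD = 0.75  # from the text: utilization >= 75% is medium to high load


def should_balance(utilization_by_node):
    """Capacity balancing runs only if at least one node reaches the threshold."""
    return any(u >= BALANCE_THRESHOLD for u in utilization_by_node.values())


def pick_sources_and_targets(utilization_by_node):
    """Split nodes into migration sources (fullest first) and targets (emptiest first)."""
    sources = sorted(
        (n for n, u in utilization_by_node.items() if u >= BALANCE_THRESHOLD),
        key=lambda n: utilization_by_node[n],
        reverse=True,
    )
    targets = sorted(
        (n for n, u in utilization_by_node.items() if u < BALANCE_THRESHOLD),
        key=lambda n: utilization_by_node[n],
    )
    return sources, targets


def check_interval_seconds(utilization_by_node, base_interval=300.0,
                           fast_interval=60.0, high_load=0.85):
    """Check more often under high load. All three defaults are assumed values."""
    if max(utilization_by_node.values()) >= high_load:
        return fast_interval
    return base_interval
```

With utilizations of 80%, 40%, and 60% on nodes `a`, `b`, and `c`, balancing is triggered, `a` is the only source, and `b` is preferred over `c` as a target. If every node is below 75%, `should_balance` returns `False` and no data is moved.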