ABS technical whitepaper

Basic concepts

This chapter explains some of the technical terms used in the documentation. If you encounter unfamiliar terms while reading, you may find their definitions and detailed explanations here.

Distributed block storage

Distributed block storage is a software-defined storage system that achieves high performance and reliability by splitting block-device data (LUNs) into smaller data units and distributing them across multiple nodes. Unlike traditional centralized storage, distributed block storage uses a distributed architecture to store data shards on multiple nodes, a design that keeps storage services continuous and stable even when hardware, network, or node failures occur. Distributed block storage is also highly scalable: growing data demands can be met by adding physical disks and nodes. It supports low-latency, high-throughput access and is widely used in areas such as cloud computing, big data processing, and AI training, providing flexible and efficient storage for virtual machines, databases, and containerized applications.
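
As a rough illustration of this idea (not ABS's actual placement algorithm), the sketch below splits a volume's address space into fixed-size shards and maps each shard to a node by hashing; the node names and the 4 MiB shard size are hypothetical.

    # Rough sketch (not ABS's actual algorithm): split a LUN's address space
    # into fixed-size shards and spread them across nodes by hashing.
    import hashlib

    NODES = ["node-a", "node-b", "node-c"]   # hypothetical node names
    SHARD_SIZE = 4 * 1024 * 1024             # illustrative 4 MiB shard size

    def shard_index(offset: int) -> int:
        """Return the shard that a byte offset within the LUN falls into."""
        return offset // SHARD_SIZE

    def node_for_shard(lun_id: str, shard: int) -> str:
        """Place a shard on a node deterministically by hashing its identity."""
        digest = int(hashlib.sha256(f"{lun_id}:{shard}".encode()).hexdigest(), 16)
        return NODES[digest % len(NODES)]

    # A 10 MiB write starting at offset 0 touches shards 0-2, which may land on
    # different nodes, so the I/O can be served by several nodes in parallel.
    for shard in range(shard_index(0), shard_index(10 * 1024 * 1024 - 1) + 1):
        print(shard, node_for_shard("lun-1", shard))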

Data shard

In distributed storage systems, depending on the redundancy policy adopted by the volume, data blocks may be replicated (replication policy) or split into multiple parts (erasure coding policy) and stored on different physical nodes. Each part is referred to as a data shard.
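
The sketch below shows in simplified form how a single data block becomes shards under the two policies; the replica count and the k+m layout are illustrative values, not ABS defaults, and the parity computation is omitted.

    # Simplified view of how one block becomes shards under each redundancy
    # policy (illustrative only; real parity math is omitted).

    def replicate(block: bytes, copies: int = 3) -> list[bytes]:
        """Replication policy: every shard is a full copy of the block."""
        return [block] * copies

    def erasure_code(block: bytes, k: int = 4, m: int = 2) -> list[bytes]:
        """Erasure coding policy: split the block into k data shards and add
        m parity shards (zero-filled placeholders stand in for real parity)."""
        part = (len(block) + k - 1) // k
        data_shards = [block[i * part:(i + 1) * part] for i in range(k)]
        parity_shards = [bytes(part)] * m
        return data_shards + parity_shards

    block = b"x" * 4096
    print(len(replicate(block)))       # 3 shards, each a full 4096-byte copy
    print(len(erasure_code(block)))    # 6 shards: 4 data + 2 parity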

COW (Copy-On-Write)

Copy-on-write is a technique that creates a data replica only when a write operation occurs. The core idea is not to copy data up front, but to generate a new replica only when the data is actually modified. For example, when taking a snapshot of a volume, the system does not immediately copy the original data; it only records the snapshot's metadata. When a write operation later occurs, the unmodified portion of the original data remains intact, while the affected data is first copied, then combined with the newly written data and stored in a separate location. This mechanism defers replication and avoids unnecessary data duplication: it guarantees data consistency while significantly saving storage space and improving the efficiency and flexibility of the storage system.
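
As a schematic illustration of the copy-on-write idea (not ABS's on-disk format), the sketch below takes a snapshot by recording metadata only and copies a block aside the first time it is overwritten.

    # Schematic copy-on-write snapshot (illustrative, not ABS's implementation).

    class CowVolume:
        def __init__(self, blocks: dict[int, bytes]):
            self.blocks = blocks      # live data: block number -> contents
            self.snapshot = None      # original blocks preserved for the snapshot

        def take_snapshot(self):
            # Only metadata is recorded; no data is copied at snapshot time.
            self.snapshot = {}

        def write(self, block_no: int, data: bytes):
            # On the first write after the snapshot, copy the original block
            # aside so the snapshot still sees the old contents.
            if self.snapshot is not None and block_no not in self.snapshot:
                self.snapshot[block_no] = self.blocks.get(block_no)
            self.blocks[block_no] = data   # then apply the new write

    vol = CowVolume({0: b"old"})
    vol.take_snapshot()
    vol.write(0, b"new")
    print(vol.blocks[0], vol.snapshot[0])   # b'new' b'old'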

ROW (Redirect-On-Write)

Redirect-on-write is a data-writing technique used to optimize storage system performance and resource utilization. Unlike COW, ROW does not duplicate the original data block during a write operation. Instead, it writes the new data to a new physical storage location and updates the corresponding metadata to point to that location. This mechanism leaves the original data unchanged, enabling features such as data protection and rapid rollback.
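
By contrast, a redirect-on-write sketch (again purely illustrative) never overwrites the original block; it writes new data to a fresh location and only updates the mapping metadata.

    # Schematic redirect-on-write (illustrative, not ABS's implementation).

    class RowVolume:
        def __init__(self):
            self.store = {}      # physical location -> contents
            self.mapping = {}    # logical block number -> physical location
            self.next_loc = 0

        def write(self, block_no: int, data: bytes):
            # New data always goes to a new physical location; the original
            # data is left untouched and only the metadata pointer moves.
            loc = self.next_loc
            self.next_loc += 1
            self.store[loc] = data
            self.mapping[block_no] = loc

        def read(self, block_no: int) -> bytes:
            return self.store[self.mapping[block_no]]

    vol = RowVolume()
    vol.write(0, b"v1")
    old_loc = vol.mapping[0]
    vol.write(0, b"v2")                        # redirected to a new location
    print(vol.read(0), vol.store[old_loc])     # b'v2' b'v1' (old data preserved)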

Log replay

Log replay is an essential mechanism that file systems and storage engines use to ensure data consistency. Before or while performing operations such as writes or creations, the system records a log entry describing the specific content of the update. If an unexpected event such as a power outage or system crash occurs, incomplete updates may leave data in an uncertain state. When the system recovers, it can replay the logged operations to redo pending updates or roll back partial ones, ensuring data consistency.
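
A minimal sketch of the idea, assuming a simple write-ahead log: every update is logged before it is applied, so after a crash the system replays complete entries and ignores incomplete ones. The entry format here is hypothetical.

    # Minimal write-ahead-log replay sketch (illustrative only).

    def apply_update(state: dict, entry: dict):
        state[entry["key"]] = entry["value"]

    def recover(log_entries: list[dict]) -> dict:
        """Rebuild a consistent state after a crash by replaying the log.
        Only entries marked committed are redone; partial ones are discarded."""
        state = {}
        for entry in log_entries:
            if entry.get("committed"):
                apply_update(state, entry)
        return state

    log = [
        {"key": "a", "value": 1, "committed": True},
        {"key": "b", "value": 2, "committed": False},   # crash before commit
    ]
    print(recover(log))   # {'a': 1} - the incomplete update is not applied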

Active-active clustering

Active-active clustering is a data protection and disaster recovery mechanism, offering site fault tolerance capabilities with a zero RPO (Recovery Point Objective) and minute-level RTO (Recovery Time Objective).

An active-active cluster is deployed in a stretched architecture, consisting of two availability zones and one witness node. The two availability zones and the witness node communicate with one another through networks. If one availability zone fails, the other can continue to provide storage services, thereby enabling disaster recovery at the availability zone level.

Availability zone

The two availability zones are typically located in the same geographical area, at different sites, with one serving as the primary availability zone and the other as the secondary availability zone. Each availability zone contains at least two nodes and can independently provide storage services. When the active-active cluster operates normally, the client primarily accesses storage services through the primary availability zone.

When both availability zones operate normally, the active-active cluster retains a replica of each data block in both availability zones. This ensures that if one availability zone fails, the client can access complete data from the other surviving availability zone to restore service.

  • Primary availability zone

    The availability zone where the business primarily runs under normal conditions. While the primary availability zone is active, the metadata management role of the active-active cluster runs in it, so services in the primary availability zone achieve optimal performance.

  • Secondary availability zone

    Typically used as a backup availability zone. If a failure occurs in the primary availability zone, the services will be restored in the secondary availability zone.

Witness node

A witness node is a special node in an active-active cluster that ensures the proper operation of cluster services and allows services to be switched automatically between the primary and secondary availability zones. The witness node participates only in cluster voting and retains a subset of metadata; it does not store any user data.

The witness node can be either a physical machine or a virtual machine and must be deployed in a physical fault domain separate from the primary and secondary availability zones. Generally, the witness node, the primary availability zone, and the secondary availability zone are located in different data centers. However, if the witness node and an availability zone share the same fault domain, such as the same IDC room, the active-active cluster cannot automatically recover services when that IDC goes offline.
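
The witness's role can be pictured as simple majority voting among three independent fault domains (the two availability zones and the witness). The logic below is a deliberate simplification, not ABS's arbitration protocol.

    # Simplified majority-vote arbitration (not ABS's actual protocol).

    VOTERS = ["primary-az", "secondary-az", "witness"]

    def has_quorum(alive: set[str]) -> bool:
        """Storage services continue only if a majority of voters is reachable."""
        return len(alive & set(VOTERS)) > len(VOTERS) // 2

    print(has_quorum({"primary-az", "witness"}))     # True: secondary AZ lost
    print(has_quorum({"secondary-az", "witness"}))   # True: fail over to secondary
    print(has_quorum({"secondary-az"}))              # False: witness and primary AZ
                                                     # lost together, no majority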

SSD (Solid State Disk/Drive)

A solid-state drive is a storage device that uses flash memory as its persistent storage medium.

HDD (Hard Disk Drive)

A hard disk drive is a storage device that uses rotating magnetic platters as its storage medium.

iSCSI

iSCSI (Internet Small Computer System Interface) is a network storage protocol developed from the SCSI protocol. It encapsulates SCSI commands in IP packets and transmits them over an IP network, allowing clients to access storage devices on remote servers as if they were local. This greatly enhances the accessibility and flexibility of storage devices. Because iSCSI runs over the existing IP network architecture, it reduces the cost of building and maintaining a storage network while providing performance comparable to directly attached SCSI devices, offering clients efficient storage access. iSCSI is particularly suitable for environments that require remote storage access, flexible expansion of storage capacity, and simplified storage management.

iSCSI initiator

An iSCSI initiator is the software or hardware component installed on a client host that initiates connections and data transfers with an iSCSI target. It can be either a software application installed in the operating system or dedicated hardware integrated into the network adapter. The initiator sends requests to the iSCSI target over an IP network to read and write data, enabling clients to access remote storage resources as if they were local storage devices.

iSCSI target

The iSCSI target is the storage device that an iSCSI initiator accesses; it responds to the initiator's storage access requests. The iSCSI target virtualizes storage resources into logical units (LUNs) and presents them to remote clients. To the initiator, a LUN provided by the iSCSI target appears as a locally connected storage device. An ABS cluster allows iSCSI initiators to access iSCSI targets through a unified virtual access IP, eliminating the need to configure a different IP address for each iSCSI target.

iSCSI LUN

An iSCSI LUN (Logical Unit Number) uniquely identifies a logical unit on a storage device within the iSCSI storage service. A LUN is a logical abstraction that corresponds to a volume in ABS and appears as a virtual disk on the client. The iSCSI initiator communicates with an iSCSI target via the iSCSI protocol to request access to a specific LUN. Upon receiving the request, the iSCSI target routes it to the corresponding storage resource based on the LUN identifier and returns the response to the initiator, allowing the initiator to use ABS block storage as if it were locally attached.
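
Conceptually, the target resolves a (target, LUN) pair to a backing volume before serving the I/O. The sketch below is a toy routing table with hypothetical IQN and volume names, not the iSCSI protocol or ABS internals.

    # Toy mapping from (iSCSI target, LUN) to backing volumes (conceptual only).

    TARGET_LUNS = {
        ("iqn.2024-01.com.example:target1", 0): "volume-a",   # hypothetical names
        ("iqn.2024-01.com.example:target1", 1): "volume-b",
    }

    def route_read(target_iqn: str, lun: int, offset: int, length: int) -> str:
        """Resolve the LUN identifier to a volume and describe the backend I/O."""
        volume = TARGET_LUNS[(target_iqn, lun)]
        return f"read {length} bytes at offset {offset} from {volume}"

    print(route_read("iqn.2024-01.com.example:target1", 0, 4096, 512))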

NFS

NFS (Network File System) is a distributed file system protocol used to share files over a network. The NFS protocol allows clients to access files and directories on a remote server as if they were accessing local files, using standard network protocols (TCP/IP). NFS supports file reading, writing, and sharing with high efficiency and compatibility, and it is widely supported across various operating systems.

vhost

vhost is a high-performance virtualized I/O protocol that establishes a direct communication path between virtual machines (Guest OS), QEMU, and storage services (such as ABS) through shared memory, reducing the performance overhead of the virtualization layer and network transmission. In the Boost mode of ACOS, the vhost protocol allows the storage service to access virtual machine memory directly, bypassing the QEMU main thread and the traditional network protocol stack. This improves I/O processing capability, reduces latency, and delivers more efficient virtualized I/O performance.

Volume pinning

Volume pinning is a storage optimization strategy available in clusters deployed with a tiered architecture.

By default, in clusters with tiered storage, data in a volume that is accessed frequently is kept in the faster cache tier, while data accessed less frequently sinks to the slower data tier. After volume pinning is enabled for a volume, all of its data remains in the cache tier, ensuring consistently high performance.
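
The effect of pinning can be sketched as an eviction rule that never demotes blocks belonging to pinned volumes; the policy below is a simplified assumption, not ABS's actual tiering algorithm.

    # Simplified cache-tier demotion rule with volume pinning (illustrative only).

    def demotion_candidates(cache_blocks, pinned_volumes, access_counts, threshold=1):
        """Pick cold blocks to move from the cache tier to the data tier,
        skipping every block that belongs to a pinned volume."""
        candidates = []
        for volume, block_no in cache_blocks:
            if volume in pinned_volumes:
                continue                      # pinned data always stays in cache
            if access_counts.get((volume, block_no), 0) <= threshold:
                candidates.append((volume, block_no))
        return candidates

    cache = [("vol-pinned", 0), ("vol-normal", 0), ("vol-normal", 1)]
    counts = {("vol-normal", 0): 0, ("vol-normal", 1): 5}
    print(demotion_candidates(cache, {"vol-pinned"}, counts))
    # [('vol-normal', 0)] - only the cold, unpinned block is demoted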