ABS technical whitepaper

Data storage structure

The data storage structure is shown in the following diagram.

Block

A block is the basic unit of data storage within LSM. The default block size is 256 KiB. Within each block, valid data is marked at a granularity of 4 KiB, so when data is exchanged between the capacity tier and the cache tier, only the valid data in the block is transferred. A data block (extent) consists of multiple blocks.
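The sizes above imply 64 tracking units of 4 KiB per 256 KiB block, and 1024 blocks per 256 MiB data block. The sketch below models the valid-data marking as a 64-bit bitmap and derives the byte ranges that would actually move between tiers. The bitmap representation and the `Block` class are assumptions made for illustration, not the documented LSM on-disk format.

```python
BLOCK_SIZE = 256 * 1024                        # default block size: 256 KiB
UNIT_SIZE = 4 * 1024                           # valid-data granularity: 4 KiB
UNITS_PER_BLOCK = BLOCK_SIZE // UNIT_SIZE      # 64 units per block
EXTENT_SIZE = 256 * 1024 * 1024                # data block (extent): 256 MiB
BLOCKS_PER_EXTENT = EXTENT_SIZE // BLOCK_SIZE  # 1024 blocks per extent


class Block:
    """Hypothetical model: one bit per 4 KiB unit marks whether it holds valid data."""

    def __init__(self):
        self.valid = 0  # bitmap with UNITS_PER_BLOCK bits

    def mark_written(self, offset, length):
        """Mark every 4 KiB unit touched by [offset, offset + length) as valid."""
        if offset < 0 or length <= 0 or offset + length > BLOCK_SIZE:
            raise ValueError("write must fall inside the block")
        first = offset // UNIT_SIZE
        last = (offset + length - 1) // UNIT_SIZE
        for unit in range(first, last + 1):
            self.valid |= 1 << unit

    def valid_ranges(self):
        """Return (offset, length) byte ranges of contiguous valid units,
        i.e. the only data that would be exchanged between tiers."""
        ranges = []
        unit = 0
        while unit < UNITS_PER_BLOCK:
            if (self.valid >> unit) & 1:
                start = unit
                while unit < UNITS_PER_BLOCK and (self.valid >> unit) & 1:
                    unit += 1
                ranges.append((start * UNIT_SIZE, (unit - start) * UNIT_SIZE))
            else:
                unit += 1
        return ranges
```

For example, after writing 8 KiB at offset 0 and 100 bytes at offset 12 KiB, only 12 KiB of the 256 KiB block is marked valid and would be transferred.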

Data block

A data block (extent) is the basic unit managed by Meta. A volume is divided into multiple fixed-size data blocks, which are stored in the LSM. In ABS, processes like data recovery, migration, and redirect-on-write are all performed at the data block level. The logical size of a data block is fixed at 256 MiB, while the actual space it occupies depends on the redundancy policy, provisioning method, and the actual data size it carries.
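Because every data block has the same 256 MiB logical size, locating the data block for a volume offset is plain integer arithmetic. This sketch assumes data blocks are numbered consecutively from the start of the volume; the function names are hypothetical.

```python
EXTENT_SIZE = 256 * 1024 * 1024  # fixed logical size of a data block (extent)


def locate(volume_offset):
    """Map a byte offset in a volume to (extent index, offset inside that extent)."""
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(volume_offset, EXTENT_SIZE)


def extent_count(volume_size):
    """Number of fixed-size extents needed to cover a volume (rounded up)."""
    return -(-volume_size // EXTENT_SIZE)
```

With these definitions, an offset of 300 MiB falls 44 MiB into extent 1, and a volume at the 64 TiB maximum spans 262,144 logical data blocks. The physical footprint of each block still varies with redundancy policy, provisioning method, and the data actually written.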

The types of data blocks include VExtent, LExtent, and PExtent. The data block table records attribute information for data blocks corresponding to logical areas within the volume.

  • VExtent (Volume Extent)

    A volume data block represents a specific data area within a volume and defines how this data area is accessed, such as whether copy-on-write operations are performed. Each VExtent is exclusive to its volume and is never shared with other volumes. In the ABS system, VExtent identifies data blocks within a volume and, through its mapping to an LExtent, indirectly links them to the underlying physically stored data blocks.

  • LExtent (Logical Extent)

    LExtent represents a data block with a size not exceeding 256 MiB. LExtent connects user I/O requests with the underlying physical storage. I/O requests from users are converted into access requests for LExtent. LExtent does not directly correspond to physical space; instead, it points to specific physical data blocks (PExtent) through its identifiers Lid and LEpoch.

  • PExtent (Physical Extent)

    PExtent refers to the data block actually stored on a physical disk. Depending on where it is stored, a PExtent is either a performance extent in the cache tier or a capacity extent in the capacity tier. These extents are the actual carriers of data in the underlying physical storage; they can be distributed independently across different nodes and can adopt different storage policies.

The relationship between VExtent, LExtent, and PExtent is illustrated in the following diagram:

VExtent is the logical partition unit of a volume, dividing user data into multiple fixed-size areas. LExtent associates user logical data with the underlying physical data blocks. One LExtent may be referenced by one or more VExtents: after cloning or snapshot creation, VExtents of different volumes can point to the same LExtent. Each LExtent in turn maps to PExtents, the physical data blocks stored in the cache tier or capacity tier. Together, VExtent, LExtent, and PExtent form a logical-to-physical mapping that enables efficient data sharing and flexible storage management.
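The three-level mapping can be sketched with plain data classes. In this sketch a clone receives its own, volume-exclusive VExtents that reference the same LExtent, and the LExtent reaches its PExtents through its (Lid, LEpoch) identifiers. The `pextent_table` lookup, the `node` field, and the class layout are hypothetical; they only illustrate the relationships described above.

```python
from dataclasses import dataclass


@dataclass
class PExtent:
    tier: str  # "cache" (performance extent) or "capacity" (capacity extent)
    node: str  # hypothetical placement field


@dataclass
class LExtent:
    lid: int     # Lid
    lepoch: int  # LEpoch


@dataclass
class VExtent:
    index: int        # position of this area inside its volume
    lextent: LExtent  # may be shared with VExtents of other volumes


@dataclass
class Volume:
    name: str
    vextents: list  # VExtent objects, one per fixed-size area

    def clone(self, name):
        """Give the clone its own VExtents that point at the same LExtents."""
        return Volume(name, [VExtent(v.index, v.lextent) for v in self.vextents])


def resolve(volume, index, pextent_table):
    """Follow VExtent -> LExtent -> PExtents for one area of the volume.
    pextent_table is a hypothetical map keyed by (Lid, LEpoch)."""
    lext = volume.vextents[index].lextent
    return pextent_table[(lext.lid, lext.lepoch)]
```

No data is copied when cloning in this model: both volumes resolve to the same PExtents until one of them is written.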

Striping

Striping divides continuous data into equal-sized stripes and distributes the stripes across different physical disks whenever possible, so as to fully utilize the I/O capacity of multiple disks. In ABS, the stripe number can be set to 1, 2, 4, or 8 (the default). The stripe size is fixed at 256 KiB and cannot be changed. Striping lets multiple physical disks work in parallel: the more stripes, the higher the degree of parallelism. In particular, sequential I/O performance improves significantly when the stripe number matches the average number of physical disks per node.
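To show why a higher stripe number helps sequential I/O, the sketch below uses classic round-robin striping with the fixed 256 KiB stripe size. It assumes each stripe column maps to a different disk and that placement restarts at the beginning of each data block; ABS's actual placement rules are not described here, so treat the layout and function names as hypothetical.

```python
STRIPE_SIZE = 256 * 1024        # fixed stripe size: 256 KiB
ALLOWED_STRIPE_NUMBERS = (1, 2, 4, 8)


def _check(stripe_number):
    if stripe_number not in ALLOWED_STRIPE_NUMBERS:
        raise ValueError("stripe number must be 1, 2, 4, or 8")


def stripe_of(offset, stripe_number=8):
    """Return (column, row, offset within the stripe) for a byte offset
    inside a data block, assuming round-robin placement across columns."""
    _check(stripe_number)
    unit, within = divmod(offset, STRIPE_SIZE)
    return unit % stripe_number, unit // stripe_number, within


def columns_touched(offset, length, stripe_number=8):
    """Columns (assumed disks) that a contiguous I/O of `length` bytes lands on."""
    _check(stripe_number)
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // STRIPE_SIZE
    last = (offset + length - 1) // STRIPE_SIZE
    return sorted({u % stripe_number for u in range(first, last + 1)})
```

Under these assumptions, a 2 MiB sequential read touches all 8 columns with the default stripe number but only 2 columns with a stripe number of 2, which is the parallelism gain described above.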

Volume

A volume is the most basic data structure provided by ABS for external access. It can correspond to an iSCSI LUN and serves as the basic unit for operations such as snapshotting, cloning, and rollback.

A volume consists of multiple fixed-size data blocks, and the ID of each data block in the volume is recorded in a data block table. When a client performs an I/O operation, the system looks up the data block table to find the ID of the corresponding data block and the location of the chunk that stores it.
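A single client I/O can cross a data block boundary, in which case it needs one table lookup per data block. This sketch models the data block table as a hypothetical dictionary from extent index to (data block ID, chunk location); the IDs and location strings are placeholders, not real ABS identifiers.

```python
EXTENT_SIZE = 256 * 1024 * 1024  # fixed logical size of a data block


def split_io(offset, length, table):
    """Split a volume I/O into per-data-block pieces and look each one up.

    `table` is a hypothetical data block table: extent index -> (extent ID, chunk).
    Returns a list of (extent_id, chunk, offset_in_extent, piece_length).
    """
    pieces = []
    end = offset + length
    while offset < end:
        index, inner = divmod(offset, EXTENT_SIZE)
        piece = min(end - offset, EXTENT_SIZE - inner)
        extent_id, chunk = table[index]  # KeyError if the table has no entry
        pieces.append((extent_id, chunk, inner, piece))
        offset += piece
    return pieces
```

An 8 KiB write that starts 4 KiB before the end of the first data block is split into two 4 KiB pieces, each routed to the chunk recorded for its data block.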

Since a volume is composed of multiple data blocks distributed across different nodes, the capacity of the volume can exceed the total storage capacity of a single physical server. Volumes support thin provisioning and online expansion, with a maximum capacity of 64 TiB. Volumes can be configured with parameters such as redundancy policy and striping to meet different storage needs.
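Thin provisioning and online expansion can be illustrated together: data blocks receive backing only when first written, and expansion changes only the logical size, capped at 64 TiB. The `ThinVolume` class below is a hypothetical sketch of that behavior, not the ABS implementation, and it tracks allocation per whole data block for simplicity.

```python
EXTENT_SIZE = 256 * 1024 ** 2     # 256 MiB data block
MAX_VOLUME_SIZE = 64 * 1024 ** 4  # 64 TiB maximum volume capacity


class ThinVolume:
    """Hypothetical thin volume: data blocks are allocated on first write."""

    def __init__(self, size):
        if not 0 < size <= MAX_VOLUME_SIZE:
            raise ValueError("volume size must be in (0, 64 TiB]")
        self.size = size
        self.allocated = set()  # indices of data blocks that have been written

    def write(self, offset, length):
        if offset < 0 or length <= 0 or offset + length > self.size:
            raise ValueError("write outside the volume")
        first = offset // EXTENT_SIZE
        last = (offset + length - 1) // EXTENT_SIZE
        self.allocated.update(range(first, last + 1))

    def expand(self, new_size):
        # Online expansion: only the logical size grows; no data is moved.
        if not self.size < new_size <= MAX_VOLUME_SIZE:
            raise ValueError("new size must exceed the current size and be at most 64 TiB")
        self.size = new_size
```

In this model, a 1 TiB volume that has received two small writes 300 MiB apart has only two data blocks allocated, and it can grow to 2 TiB but not to 65 TiB.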

Datastore

A datastore is a collection of volumes and may correspond to an iSCSI target. Volumes within a datastore typically share the same redundancy policy and provisioning method. Each datastore belongs to a specific storage pool, and all data of the volumes created in that datastore is stored in that pool.
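The datastore relationships can be sketched as defaults that new volumes inherit: the storage pool is always the datastore's pool, while the redundancy policy defaults to the datastore's but may be overridden (volumes only "typically" share it). The class, field names, and policy labels such as "2-replica" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    name: str
    pool: str                  # storage pool this datastore belongs to
    redundancy: str            # hypothetical default redundancy policy label
    thin: bool = True          # default provisioning method
    volumes: list = field(default_factory=list)

    def create_volume(self, name, size, redundancy=None):
        """Create a volume whose data is placed in the datastore's storage pool."""
        vol = {
            "name": name,
            "size": size,
            "pool": self.pool,  # always the datastore's pool
            "redundancy": redundancy or self.redundancy,
            "thin": self.thin,
        }
        self.volumes.append(vol)
        return vol
```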