ABS snapshots combine redirect-on-write (ROW) and copy-on-write (COW) techniques so that a snapshot can be taken within seconds.
The data structure for snapshot metadata is similar to that of volume metadata, as both contain a table of data blocks. The main difference between snapshots and volumes is that the metadata and data of a snapshot cannot be modified.
Data block table example:
Using volume A as an example, its data block representation is shown as follows:
| VExtent | LExtent | Share |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 3 | 0 |
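In this table, each row maps one VExtent of the volume to an LExtent, and the Share column holds the Share mark bit of that data block. All Share bits are 0 here because no snapshot has been taken yet.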
When a snapshot of a volume is taken, Meta duplicates the volume's data block table to create the data block table of the new snapshot, so the snapshot's contents are preserved through metadata replication alone. In the data block table of the existing volume, the Share mark bit is set on every data block. This indicates that a copy-on-write operation must be executed before the content of that data block is modified.
Here is the data block table for volume A after the snapshot is taken:
| VExtent | LExtent | Share |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
Here is the data block table for snapshot A:
| VExtent | LExtent | Share |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
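
The snapshot step above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ABS code: the `Row` type and the `take_snapshot` function are hypothetical names, and the data block table is modeled as a plain list of rows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    """One row of a data block table (hypothetical model)."""
    vextent: int
    lextent: int
    share: int  # 1 = block is shared with a snapshot, 0 = not shared


def take_snapshot(volume_table):
    """Return (updated volume table, snapshot table).

    Only metadata is copied; no data block is touched.
    """
    # Set the Share mark bit on every row of the volume's table.
    shared_rows = [replace(row, share=1) for row in volume_table]
    # The snapshot gets its own copy of the table; a tuple of frozen
    # rows models the rule that snapshot metadata cannot be modified.
    snapshot_table = tuple(shared_rows)
    return shared_rows, snapshot_table


# Volume A before the snapshot: four extents, none shared.
volume_a = [Row(v, v, 0) for v in range(4)]
volume_a, snapshot_a = take_snapshot(volume_a)
```

After the call, both tables list the same four LExtents with the Share bit set, matching the two tables above.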
Copy-on-write operations are divided into the Meta end and the Chunk end, corresponding to the metadata ROW mode and data COW mode respectively.
Assuming VExtent 1 has the Share mark bit set, the Meta end must allocate a new LExtent (LExtent 4 in this example) before the content of VExtent 1 is modified. VExtent 1 is then redirected to LExtent 4 and its Share mark bit is cleared. This is the ROW mode.
The data block table of Volume A before VExtent 1 is written and the data block table of Volume A after VExtent 1 is written are shown below:
| VExtent | LExtent | Share | | VExtent | LExtent | Share |
|---|---|---|---|---|---|---|
| 0 | 0 | 1 | | 0 | 0 | 1 |
| 1 | 1 | 1 | -> | 1 | 4 | 0 |
| 2 | 2 | 1 | | 2 | 2 | 1 |
| 3 | 3 | 1 | | 3 | 3 | 1 |
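
The Meta-end redirect shown in the table can be sketched as follows. This is a simplified illustration, not the ABS implementation: `write_vextent` and the `next_lextent` counter are hypothetical, and the table is modeled as a dictionary that maps each VExtent to `[LExtent, Share]`.

```python
def write_vextent(table, vextent, next_lextent):
    """Resolve the LExtent that a write to `vextent` should target.

    Returns (target LExtent, next free LExtent id).
    """
    lextent, share = table[vextent]
    if share:
        # Redirect-on-write: point the VExtent at a freshly allocated
        # LExtent and clear the Share mark bit. The old LExtent stays
        # untouched, so the snapshot still sees its original data.
        table[vextent] = [next_lextent, 0]
        return next_lextent, next_lextent + 1
    # Not shared: write in place, nothing to allocate.
    return lextent, next_lextent


# Volume A and snapshot A right after the snapshot was taken.
volume_a = {0: [0, 1], 1: [1, 1], 2: [2, 1], 3: [3, 1]}
snapshot_a = {v: tuple(entry) for v, entry in volume_a.items()}

# First write to VExtent 1 is redirected to the new LExtent 4.
target, next_free = write_vextent(volume_a, 1, next_lextent=4)

# A second write to VExtent 1 finds Share = 0 and reuses LExtent 4.
target_again, next_free_again = write_vextent(volume_a, 1, next_free)
```

Only the first write to a shared VExtent allocates a new LExtent; the snapshot's table still maps VExtent 1 to LExtent 1.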
LExtent 4 allocates a new child PExtent and points the parent of the child PExtent to the PExtent of LExtent 1 (referred to as the parent PExtent). On the Chunk (LSM) end, a child PExtent is also generated. The Chunk end maintains a mapping between a PExtent and a set of blocks (data shards on physical disks, typically ranging from 16 to 256 KiB). It also marks both the child PExtent and parent PExtent to hold the blocks in either shared or read-only mode.
After a snapshot or clone, when the Chunk receives the first write request for a block that belongs to a child PExtent, it handles the request at block granularity. If the request does not align with block boundaries, the Chunk first slices it along those boundaries into multiple requests and processes each one separately. If a sliced request covers less than one block, the Chunk reads the data from the parent block and combines it with the request's data to form a new block. This new block is then written to the physical disk, and the extent's block reference relationship is updated. If a sliced request is exactly the size of a block, no data needs to be read from the block previously held in shared mode, and the data is written directly to the new block.

Once a block has gone through a copy-on-write operation, subsequent reads and writes on it are the same as normal reads and writes. Therefore, apart from this first write to each shared block, snapshots do not affect the write performance of the volume. This is the block-granularity COW mode.
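
The block-granularity COW path described above can be sketched as follows. This is a toy model under stated assumptions, not the Chunk implementation: `ChildExtent`, its fields, and the 16-byte `BLOCK_SIZE` are hypothetical and chosen only to keep the example small (the text gives 16 to 256 KiB as the typical block size).

```python
BLOCK_SIZE = 16  # bytes; deliberately tiny for illustration


class ChildExtent:
    """Child PExtent that shares blocks with a read-only parent."""

    def __init__(self, parent_blocks):
        self.parent = parent_blocks  # block index -> bytes, never modified
        self.own = {}                # blocks already copied on write

    def read_block(self, idx):
        # Prefer the child's own copy, fall back to the parent block.
        if idx in self.own:
            return self.own[idx]
        return self.parent.get(idx, bytes(BLOCK_SIZE))

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            # Slice the request along block boundaries.
            idx, start = divmod(offset + pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - start, len(data) - pos)
            piece = data[pos:pos + n]
            if n == BLOCK_SIZE:
                # Full-block slice: no read from the parent is needed.
                self.own[idx] = bytes(piece)
            else:
                # Partial slice: read the current block, merge, write new.
                block = bytearray(self.read_block(idx))
                block[start:start + n] = piece
                self.own[idx] = bytes(block)
            pos += n


parent = {0: b"a" * 16, 1: b"b" * 16, 2: b"c" * 16}
child = ChildExtent(parent)
# Unaligned 24-byte write: partial block 0, full block 1, partial block 2.
child.write(12, b"X" * 24)
```

In this run, blocks 0 and 2 are merged with parent data while block 1 is written without any read. Later writes to the same blocks find them in `own` and no longer involve the parent.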