ACOS 6.2.0 · Deploying an Arcfra Cloud Operating System cluster

Basic concepts

Before installing and deploying Arcfra Cloud Operating System (ACOS), you should first familiarize yourself with the relevant concepts. This will help you better understand the installation and deployment process, as well as the functionalities of ACOS.

SCVM

A Storage Controller Virtual Machine (SCVM) is a virtual machine on which ACOS is installed when ACOS is deployed with VMware ESXi in a hyperconverged architecture. It manages the physical disks on each host, including SSDs and HDDs, which are then centrally managed by Arcfra Block Storage (ABS). SCVMs are organized into a distributed storage cluster over the network, providing storage services to virtual machines on VMware ESXi hosts.

Hyperconvergence

Hyperconvergence is an IT infrastructure approach that streamlines the deployment, management, and maintenance of resources by integrating compute virtualization, distributed storage, and networking.

VMware ESXi

ESXi is a key component of vSphere, VMware's server virtualization solution (VMware is now part of Broadcom). It can be independently installed and run on physical servers and serves as VMware's bare-metal hypervisor. VMware ESXi specifically refers to the virtualization platform that runs the ESXi program.

ACOS active-active cluster

An ACOS cluster with the active-active feature enabled is referred to as an ACOS active-active cluster. An ACOS active-active cluster is deployed in a stretched architecture, consisting of two availability zones and one witness node. The two availability zones are typically located in two sites within the same city, with one serving as the primary availability zone and the other as the secondary availability zone. Each availability zone contains at least three nodes and can independently provide compute and storage services. When the cluster operates normally, workloads mainly run in the primary availability zone.

To ensure the proper operation of services that require leader election, and to enable automatic service switchover to the secondary availability zone in the event of a primary availability zone failure, you need to deploy an additional witness node, which will only participate in leader election and will not store any data. The witness node can be either a physical machine or a virtual machine and must be deployed in a separate physical fault domain from the primary and secondary availability zones.

The availability zones and the witness node communicate over the network. If one availability zone fails, the other takes over to provide services, ensuring disaster recovery at the zone level.

Master node

In an ACOS cluster, a master node is a node that runs metadata services.

Storage node

In an ACOS cluster, a storage node is a node that does not run metadata services.

Replication factor

The replication factor defines how many replicas of your data are stored. This redundancy policy helps improve data reliability and security by distributing replicas across multiple hosts.
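As a rough illustration (not ACOS tooling), the standard arithmetic behind a replication-factor policy can be sketched as follows; the function name and capacity figures are hypothetical, and the formulas are the generic ones (usable capacity = raw capacity / replication factor; RF copies tolerate RF − 1 lost replicas):

```python
# Hypothetical sketch: usable capacity and fault tolerance under a
# replication-factor redundancy policy. Not part of any ACOS API.

def replication_overview(raw_capacity_tb: float, replication_factor: int):
    # Each piece of data is stored replication_factor times,
    # so usable capacity shrinks by that factor.
    usable_tb = raw_capacity_tb / replication_factor
    # Data survives as long as at least one replica remains.
    tolerated_replica_failures = replication_factor - 1
    return usable_tb, tolerated_replica_failures

# 120 TB raw with 3 replicas yields 40 TB usable,
# tolerating the loss of 2 replicas of any piece of data.
usable, tolerated = replication_overview(120, 3)
```

The trade-off is direct: a higher replication factor improves fault tolerance linearly but reduces usable capacity by the same factor.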

Erasure coding

Erasure coding is a technique that computes M parity blocks from K original data blocks using specific algorithms. In the event of a node or disk failure, any K of the K + M data and parity blocks can be used to rebuild the original data, so the scheme tolerates the loss of up to M blocks. Erasure coding is available only when the cluster has tiered storage enabled.
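The space efficiency and fault tolerance of a K+M scheme follow directly from the definition above. A minimal sketch of that arithmetic (the function name and the 4+2 example are illustrative, not ACOS defaults):

```python
# Hypothetical sketch: properties of a K+M erasure-coding scheme,
# where M parity blocks are computed from K data blocks and any K
# of the K + M blocks suffice to rebuild the data.

def ec_overview(k: int, m: int):
    efficiency = k / (k + m)       # fraction of raw space holding real data
    tolerated_failures = m         # blocks that may be lost simultaneously
    overhead = (k + m) / k         # raw bytes stored per byte of data
    return efficiency, tolerated_failures, overhead

# A 4+2 scheme stores data at ~67% efficiency while tolerating
# 2 failed blocks -- versus 33% efficiency for 3-way replication
# with the same fault tolerance.
eff, tol, ovh = ec_overview(4, 2)
```

This is why erasure coding is attractive for cold data in the capacity layer: it matches replication's fault tolerance at a much lower storage cost, at the price of more computation during writes and rebuilds.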

Capacity specification

ACOS offers two capacity specifications based on the physical disk capacity per host. The normal capacity specification supports hosts with up to 128 TB of physical disks, while the large capacity specification supports hosts with up to 256 TB.

Tiered storage

Tiered storage separates storage into two layers: cache and capacity. The capacity layer holds all data across nodes, storing cold data using replicas or erasure codes based on redundancy settings.

When a node has a hybrid-flash configuration, or an all-flash configuration with multiple types of SSDs, cache disks and data disks are deployed separately: the high-speed media serve as cache and the low-speed media provide capacity. In this case, the cache layer is divided into a write cache and a read cache. The write cache stores newly written data, while the read cache stores frequently accessed data that has already been tiered to the capacity layer. The capacity ratio of the write cache to the read cache is 8:2. Data is first stored in the write cache in the form of replicas and is tiered to the capacity layer once it turns cold.

When all nodes are in all-flash configurations with a single type of SSD, cache and data share all physical disks. Part of each physical disk's capacity is used for caching, with a cache-to-data partition ratio of 1:9. In this case, the cache layer consists only of a write cache, which stores newly written data. Replicated volumes use only the capacity layer, not the cache layer.
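The two capacity ratios stated above (write:read cache of 8:2 with separate cache disks; cache:data partitions of 1:9 on shared disks) can be sketched numerically; the function names and disk sizes are hypothetical:

```python
# Hypothetical sketch of the tiered-storage capacity ratios
# described above. Not part of any ACOS API.

def split_cache_layer(cache_tb: float):
    """Separate cache disks: cache layer split write:read = 8:2."""
    return cache_tb * 0.8, cache_tb * 0.2

def split_shared_disk(disk_tb: float):
    """Shared disks (single-SSD-type all-flash): each disk split
    cache:data = 1:9, and the cache is write cache only."""
    return disk_tb * 0.1, disk_tb * 0.9

# 10 TB of dedicated cache disks -> 8 TB write cache, 2 TB read cache.
write_cache, read_cache = split_cache_layer(10)

# A shared 4 TB SSD -> 0.4 TB cache partition, 3.6 TB data partition.
cache_part, data_part = split_shared_disk(4)
```

Note the structural difference the text describes: with dedicated cache disks the cache layer has both a write and a read cache, whereas on shared disks the cache partition serves writes only.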

Non-tiered storage

When non-tiered storage is enabled, there will be no cache layer. Apart from the physical disks containing the system partitions, all remaining physical disks will be used as data disks. Nodes can only be in all-flash configurations.

Boost mode

Designed for higher performance, the Boost mode in ACOS enables memory sharing between the Guest OS, QEMU, and ABS through the vhost protocol, which can improve the I/O performance of virtual machines.

RDMA

RDMA (Remote Direct Memory Access) is a direct memory access technology that allows data transfers between nodes to bypass the OS kernels of both parties, enabling direct access to their memory over the network. In this way, RDMA significantly saves CPU resources, improves system throughput, and reduces network latency. ACOS supports RDMA to minimize write latency between remote nodes, reducing latency and improving throughput for optimal cluster performance.

SR-IOV

Single Root I/O Virtualization (SR-IOV) is a hardware-based virtualization solution that enhances performance and scalability.

  • SR-IOV allows efficient sharing of PCIe (Peripheral Component Interconnect Express) devices among virtual machines. Since it is implemented in hardware, it can deliver I/O performance comparable to that of the physical host.

  • A PCIe device with SR-IOV enabled, along with appropriate hardware and OS support, can appear as multiple separate physical devices, allowing multiple virtual machines to share a single I/O resource.

In ACOS, the SR-IOV passthrough feature virtualizes a single SR-IOV-capable port into multiple virtual NICs, which can be assigned to virtual machines, thereby significantly reducing network latency.

Volume pinning

ACOS supports volume pinning, a feature that can keep data of virtual volumes (with AVE virtualization) or NFS files (with VMware ESXi virtualization) in the cache layer to achieve more stable and higher performance.

LACP dynamic link aggregation

LACP (Link Aggregation Control Protocol) is a protocol based on the IEEE 802.3ad standard that implements dynamic link aggregation and disaggregation. It is one of the common protocols for link aggregation. In a link aggregation group, LACP-enabled member ports interact by exchanging LACP PDUs (Protocol Data Units) to explicitly agree on which ports can send and receive packets, ultimately determining the links that will carry the application traffic. In addition, when the aggregation conditions change, such as when a link fails, the LACP mode will automatically adjust the links in the aggregation group, allowing other available member links in the group to take over the failed link, thereby maintaining load balancing. Therefore, LACP can also increase the logical bandwidth between devices and improve network reliability without the need for hardware upgrades.

VAAI-NAS plugin

VAAI-NAS is a plugin from Arcfra that enables thick provisioning and fast cloning of files when ACOS is integrated with VMware ESXi in a hyperconverged architecture.

Deployment control node

When deploying an active-active cluster, you need to access the management IP address of a node in either the primary availability zone or the secondary availability zone through a browser to enter the cluster deployment page. This node serves as the control node during cluster deployment, and is therefore referred to as the deployment control node.