| Main type | Subtype and description |
|---|---|
| Unhealthy disk | The physical disk is damaged. |
| | Read or write operations time out or receive no response from the physical disk. |
| | The physical disk experiences an I/O block. |
| Failing disk | The physical disk shows increased I/O latency that has not yet reached a timeout state. |
| | Errors occur during read or write operations on the physical disk (fewer than 100 times). |
| | Errors occur during physical disk verification (fewer than 100 times). |
| Disk failing the S.M.A.R.T. test | The S.M.A.R.T. test failed. |
| Software RAID failure | The physical disk in a software RAID group experiences read or write errors or high latency, and has been marked as a failed component by the software RAID. |
| Short lifespan | The physical disk shows no signs of read or write timeout, high I/O latency, or damage, but is determined by the system to have an insufficient lifespan and may soon pose a risk. |
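
The criteria in the table above can be read as a set of independent checks on a disk's symptoms. The following Python sketch encodes them, assuming a hypothetical `DiskObservation` record; it is an illustration only, not an ACOS or AOC data source or API. It returns every main type a disk matches rather than choosing one, because the table does not say how overlapping conditions are prioritized. It also leaves 100 or more errors unclassified, since the table only defines the "fewer than 100" case.

```python
from dataclasses import dataclass

# From the table: failing-disk error counts are "fewer than 100".
ERROR_LIMIT = 100


@dataclass
class DiskObservation:
    """Hypothetical snapshot of the symptoms seen on one physical disk."""
    damaged: bool = False
    io_timeout: bool = False          # read/write timed out or got no response
    io_blocked: bool = False          # the disk experiences an I/O block
    latency_elevated: bool = False    # I/O latency has increased
    rw_errors: int = 0                # read/write error count
    verify_errors: int = 0            # error count from disk verification
    smart_failed: bool = False        # the S.M.A.R.T. test failed
    raid_marked_failed: bool = False  # software RAID marked the disk as failed
    lifespan_insufficient: bool = False


def matching_types(obs: DiskObservation) -> list[str]:
    """Return every main type from the table whose subtype matches `obs`."""
    types = []
    if obs.damaged or obs.io_timeout or obs.io_blocked:
        types.append("Unhealthy disk")
    # Failing disk: latency is up but has not reached a timeout, or there are
    # between 1 and 99 read/write or verification errors.
    if ((obs.latency_elevated and not obs.io_timeout)
            or 0 < obs.rw_errors < ERROR_LIMIT
            or 0 < obs.verify_errors < ERROR_LIMIT):
        types.append("Failing disk")
    if obs.smart_failed:
        types.append("Disk failing the S.M.A.R.T. test")
    if obs.raid_marked_failed:
        types.append("Software RAID failure")
    # Short lifespan: no timeout, high latency, or damage, but lifespan is too short.
    if obs.lifespan_insufficient and not (
            obs.io_timeout or obs.latency_elevated or obs.damaged):
        types.append("Short lifespan")
    return types
```

For example, `matching_types(DiskObservation(rw_errors=3))` returns `["Failing disk"]`, while `DiskObservation(rw_errors=150)` matches nothing because the table does not define that case.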

When any of the following alerts appears on the main Alert page in AOC, a physical disk on a cluster node has failed. In the alert messages, the placeholder {XXXX} represents the actual information displayed by the system. Follow the instructions in each alert message to take further action.

| Alert message | Default alert level |
|---|---|
| Host/SCVM | Critical |
| Host/SCVM | Critical |
| Host/SCVM | Critical |
| Host/SCVM | Critical |
| The physical disk | Critical |
| Hardware fault: the physical disk | Critical |
| Hardware fault: the physical disk | Critical |
| Hardware fault: the physical disk | Critical |
| Host/SCVM | Notice |
| Host/SCVM | Notice |
| Host/SCVM | Notice |
| Host/SCVM | Notice |
| Host/SCVM | Info |

When the ACOS cluster detects and reports any of the physical disk failures described above, the system handles the disk according to the failure type. Follow the corresponding recommendation below as your next step.

Software RAID failure

When a software RAID failure occurs on a physical disk, that disk no longer handles read or write operations for the operating system partition or the metadata partition. Another physical disk in the software RAID group handles these operations instead, which leaves the group with insufficient redundancy. Contact technical support for troubleshooting.

Unhealthy disks or failing HDDs

The system automatically isolates the problematic disk and starts data recovery. AOC displays the isolation status on the Alert page when isolation starts, while it is in progress, and when it completes. Remove the disk only after the system indicates that isolation is complete and the disk can be safely removed. For details, refer to Replacing a physical disk.

Physical disk with I/O blocking

The system automatically takes the disk with I/O blocking offline and starts data recovery. Once offline, the disk stops handling all I/O requests, and it cannot be unmounted in the AOC interface until its offline status is cleared. Contact technical support for troubleshooting.

Failing SSD; physical disk with short lifespan; physical disk failing the S.M.A.R.T. test

The system does not automatically take the problematic physical disk offline. Manually unmount the disk in the AOC interface as described in Replacing a physical disk, and then replace it with a new disk on the host.
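
The recommendations above amount to a lookup from failure type to what the system does automatically and what the operator does next. The Python sketch below records that mapping as data; `FailureGroup`, `Handling`, and `may_proceed_with_removal` are hypothetical names for illustration and are not part of AOC or ACOS. Two points are assumptions rather than documented behavior: the removal gate returns `False` for a software RAID failure, because this section only says to contact technical support, and it treats a disk with I/O blocking as unmountable once its offline status has been cleared.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FailureGroup(Enum):
    """Failure groups as used in the recommendations (names are illustrative)."""
    SOFTWARE_RAID = auto()
    UNHEALTHY_OR_FAILING_HDD = auto()
    IO_BLOCKING = auto()
    SSD_LIFESPAN_OR_SMART = auto()  # failing SSD, short lifespan, S.M.A.R.T. failure


@dataclass(frozen=True)
class Handling:
    system_action: str  # what the system does automatically
    next_step: str      # what the operator should do next


HANDLING = {
    FailureGroup.SOFTWARE_RAID: Handling(
        "another disk in the software RAID group serves OS/metadata I/O",
        "contact technical support",
    ),
    FailureGroup.UNHEALTHY_OR_FAILING_HDD: Handling(
        "isolate the disk and start data recovery",
        "remove the disk after AOC reports that isolation is complete",
    ),
    FailureGroup.IO_BLOCKING: Handling(
        "take the disk offline and start data recovery",
        "contact technical support",
    ),
    FailureGroup.SSD_LIFESPAN_OR_SMART: Handling(
        "none; the disk is not taken offline automatically",
        "unmount the disk in AOC, then replace it on the host",
    ),
}


def may_proceed_with_removal(group: FailureGroup, *,
                             isolation_complete: bool = False,
                             offline_cleared: bool = False) -> bool:
    """Hypothetical gate: may the operator start unmounting/removing the disk?"""
    if group is FailureGroup.UNHEALTHY_OR_FAILING_HDD:
        return isolation_complete  # wait for the "can be safely removed" prompt
    if group is FailureGroup.IO_BLOCKING:
        # Assumption: AOC allows unmounting once the offline status is cleared.
        return offline_cleared
    if group is FailureGroup.SSD_LIFESPAN_OR_SMART:
        return True  # the operator unmounts the disk manually in AOC
    return False  # software RAID failure (assumption): defer to technical support
```

Keeping the mapping as data rather than branching logic makes it easy to compare against this section when the documented recommendations change.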