Viewing physical disk information

Viewing physical disk information under the specified path

Procedure

Run the following command on a cluster node to obtain the physical disk information under the specified path:

zbs-chunk query disk <path>

Output example

{ 'exist': True,
  'formatted': False,
  'in_use': False,
  'instance_id': 0}
Query Success

Output note

Parameter	Description
`exist`	Whether the physical disk exists.
`formatted`	Whether the physical disk is formatted by a chunk.
`in_use`	Whether the physical disk is used by a chunk.
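
When this check is scripted, the output can be read programmatically. The following is a minimal sketch, assuming the command prints a Python-style dictionary followed by a status line, exactly as in the output example above. The function name `parse_query_disk_output` is hypothetical and is not part of any ZBS tool.

```python
import ast


def parse_query_disk_output(text):
    """Split the sample-style output into a dict and the trailing status line.

    Assumption: the output is a Python-literal dict followed by a status
    line, as in the documented example.
    """
    start = text.index("{")
    end = text.rindex("}") + 1
    info = ast.literal_eval(text[start:end])  # safe parsing of literals only
    status = text[end:].strip()
    return info, status


# Sample text copied from the output example.
sample = """{ 'exist': True,
  'formatted': False,
  'in_use': False,
  'instance_id': 0}
Query Success"""

info, status = parse_query_disk_output(sample)
print(info["exist"], info["formatted"], info["in_use"], status)
```

`ast.literal_eval` is used instead of `eval` so that only literal values are accepted from the command output.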

Querying physical disk health records

Use the disk name or serial number of the physical disk to query its health details.

Procedure

Querying by physical disk name

On the node where the physical disk is located, run the following command to query the health details of the physical disk:

zbs-node show_disk_status [-p] [/dev/]<disk_name> [--with_rawdata]

Here, disk_name indicates the device name of the physical disk. If the disk has been detected and marked as a high-latency disk, add the --with_rawdata option to also output the raw status data of the disk.

Operation example

The following three commands return the same result:

zbs-node show_disk_status -p /dev/sdc
zbs-node show_disk_status /dev/sdc
zbs-node show_disk_status sdc

Querying by physical disk serial number

On any node in the cluster, run the following command to query the health details of the physical disk:

zbs-node show_disk_status -s <disk_serial>

Here, disk_serial indicates the serial number of the physical disk.

Operation example

zbs-node show_disk_status -s 9XG6GTFN

Output example

== Base Information ==
is healthy                            : True
device name                           : /dev/sdc
bus type                              : ata
model                                 : ST91000640NS
firmware                              : SN03
disk serial                           : 9XG6GTFN
last belong to                        : 10.0.67.212
== Fault Detection ==
chunk errflag detected                : False
chunk warnflag detected               : False
chunk io error count overflow         : False
chunk checksum error count overflow   : False
iostat latency detected               : False
smart error detected                  : False
software raid faulty detected         : False
offline due to io timeout             : False
offline due to cmd abort              : False
offline due to error queue            : False
reallocated sectors count overflow    : False
== Extra Fault Detection ==
chunk io errors count                 : -
chunk checksum errors count           : -
io latency (ms)                       : -
iops                                  : -
sectors per second                    : -
bandwidth (MiB/s)                     : -
smartctl hang process                 : -
S.M.A.R.T. assessment error           : -
reallocated sectors count             : -
== S.M.A.R.T. Attributes ==
ID#   ATTRIBUTE_NAME            VALUE    THRESH   RAW             CHECK_FIELD     CHECK_THRESH    CHECK_RES
5     Reallocated_Sector_Ct     100      036      1               raw             10              True
187   Reported_Uncorrect        100      000      0               raw             0               True
188   Command_Timeout           100      000      0               value           10              True
194   Temperature_Celsius       018      000      18              raw             45              True
197   Current_Pending_Sector    100      000      0               raw             0               True
198   Offline_Uncorrectable     100      000      0               raw             0               True

Output note

Parameter	Description
`is healthy`	Whether the physical disk is healthy.
`bus type`	The bus interface type, e.g., `ata`, `scsi`.
`model`	The physical disk model.
`firmware`	The firmware version.
`disk path`	The physical disk path.
`disk serial`	The physical disk serial number.
`trace id`	The system identifier for tracking physical disk status (typically the serial number, but can also be NGUID or other information).
`controller`	The driver type used by the physical disk controller.
`last belong to`	The IP address of the chunk to which the disk last belonged.
`chunk errflag detected`	Whether the physical disk is detected to be in an error state.
`chunk warnflag detected`	Whether the physical disk is detected to be in a sub-healthy state.
`chunk io error count overflow`	Whether an I/O error is detected (I/O error count exceeds the threshold).
`chunk checksum error count overflow`	Whether checksum errors are detected by LSM.
`iostat latency detected`	Whether slow disk anomalies are detected through iostat output.
`smart error detected`	Whether an error is detected through SMART information.
`software raid faulty detected`	Whether a software RAID failure is detected.
`offline due to io timeout`	Whether the disk goes offline due to an I/O timeout.
`offline due to cmd abort`	Whether the disk goes offline due to a command abort.
`offline due to error queue`	Whether the disk goes offline due to an error queue.
`reallocated sectors count overflow`	Whether the reallocated sectors count exceeds the threshold.
`Reallocated_Sector_Ct`	The number of reallocated sectors, which roughly corresponds to the number of failed sectors.
`Reported_Uncorrect`	The number of errors that hardware ECC could not correct.
`Command_Timeout`	The number of commands that timed out while communicating with the hard disk.
`Temperature_Celsius`	The disk temperature, in degrees Celsius.
`Current_Pending_Sector`	The current pending sector count, indicating the number of unstable sectors.
`Offline_Uncorrectable`	The count of offline uncorrectable sectors.
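
For automated monitoring, the health report can be grouped by its section headings. The following is a minimal sketch, assuming the output keeps the `== Section ==` headings and `name : value` layout shown in the output example. The function name `parse_disk_status` is hypothetical, and the S.M.A.R.T. attribute table rows are deliberately not parsed.

```python
def parse_disk_status(text):
    """Group 'name : value' lines under their '== Section ==' headings.

    Assumption: the layout matches the documented output example.
    Lines without a colon (such as S.M.A.R.T. table rows) are skipped.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            current = line.strip("= ")
            sections[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            # Convert the literal strings True/False; keep everything else as text.
            sections[current][key.strip()] = {"True": True, "False": False}.get(value, value)
    return sections


# Excerpt copied from the output example.
sample = """== Base Information ==
is healthy                            : True
device name                           : /dev/sdc
disk serial                           : 9XG6GTFN
== Fault Detection ==
chunk errflag detected                : False
iostat latency detected               : False
smart error detected                  : False
== Extra Fault Detection ==
io latency (ms)                       : -
"""

report = parse_disk_status(sample)
# Names of the fault checks that reported True.
tripped = [name for name, value in report["Fault Detection"].items() if value is True]
print(report["Base Information"]["is healthy"], tripped)
```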

Resetting physical disk health status

After resolving a physical disk failure, reset the health status of the physical disk. Otherwise, the disk cannot be mounted and used again.

Procedure

On the node where the physical disk is located, run the following command to reset the health status of the physical disk:

zbs-node set_disk_healthy [/dev/]<disk_name>

Here, disk_name indicates the device name of the physical disk.

Output example

2024-06-18 14:07:43,908 node.py 769 [13302] [INFO] set chunk partition healthy.
2024-06-18 14:07:45,591 node.py 781 [13302] [INFO] set tuna disk record healthy.
2024-06-18 14:07:46,770 node.py 787 [13302] [INFO] clean from slow disk record.
2024-06-18 14:07:46,800 node.py 793 [13302] [INFO] Setting disk sdc healthy succeed.
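
A script that runs the reset can confirm the result by looking for the final success line. The following is a minimal sketch, assuming the log format and the success message match the output example above. The pattern `LOG_LINE` and the function `reset_succeeded` are hypothetical names, and the message wording may differ between versions.

```python
import re

# Assumed layout, taken from the output example:
# <date> <time>,<ms> <file> <line> [<pid>] [<level>] <message>
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<file>\S+) (?P<line>\d+) \[(?P<pid>\d+)\] \[(?P<level>\w+)\] "
    r"(?P<message>.*)$"
)


def reset_succeeded(log_text, disk_name):
    """Return True if the log contains the success message for disk_name."""
    expected = f"Setting disk {disk_name} healthy succeed."
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") == "INFO" and match.group("message") == expected:
            return True
    return False


# Sample text copied from the output example.
sample = """2024-06-18 14:07:43,908 node.py 769 [13302] [INFO] set chunk partition healthy.
2024-06-18 14:07:45,591 node.py 781 [13302] [INFO] set tuna disk record healthy.
2024-06-18 14:07:46,770 node.py 787 [13302] [INFO] clean from slow disk record.
2024-06-18 14:07:46,800 node.py 793 [13302] [INFO] Setting disk sdc healthy succeed."""

print(reset_succeeded(sample, "sdc"))
```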
