Procedure
Run the following command on a cluster node to view performance information for all volumes:
zbs-perf-tools volume list [--chunk-addr <ip>] [--sort_by <sort_by>] [-A]
| Parameter | Description |
|---|---|
| --chunk-addr <ip> | The zbs-chunkd RPC server address. Default: 127.0.0.1:10200, the chunk service on the node where the command is run. |
| --sort_by <sort_by> | The field used to sort the performance information of all volumes. Only fields related to iops, bw, or latency can be specified. Default: total_iops. |
| -A, --ascending | Sorts the performance information of all volumes in ascending order of the sort_by field. If not specified, results are sorted in descending order. |
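The documented sorting rules can be modeled with a short Python sketch. It assumes each volume's metrics have already been parsed into a dictionary keyed by the output field names (for example, `total_iops`); the `sort_volumes` helper and its field check are hypothetical and are not part of zbs-perf-tools.

```python
from typing import Any, Dict, List

# Hypothetical model of --sort_by / -A: only iops, bw, or latency fields qualify.
SORTABLE_KEYWORDS = ("iops", "bw", "latency")


def sort_volumes(volumes: List[Dict[str, Any]],
                 sort_by: str = "total_iops",
                 ascending: bool = False) -> List[Dict[str, Any]]:
    """Return volumes ordered by sort_by: descending unless ascending is set."""
    if not any(keyword in sort_by for keyword in SORTABLE_KEYWORDS):
        raise ValueError(f"unsupported sort field: {sort_by}")
    return sorted(volumes, key=lambda v: v[sort_by], reverse=not ascending)


if __name__ == "__main__":
    volumes = [
        {"volume_id": "vol-a", "total_iops": 12.0},
        {"volume_id": "vol-b", "total_iops": 601.0},
    ]
    print([v["volume_id"] for v in sort_volumes(volumes)])  # ['vol-b', 'vol-a']
    print([v["volume_id"] for v in sort_volumes(volumes, ascending=True)])  # ['vol-a', 'vol-b']
```

Python's `sorted` is stable, so in this sketch volumes with equal values keep their input order; the tool's own tie-breaking is not documented.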
Output example
$ zbs-perf-tools volume list
---------------------------------------------------------------------
volume_id dc656bde-8095-4a58-938b-00018a951190
read_iops 601.00
read_avgrq 262.14 KB(256.00 KiB)
read_bw 157.55 MB/s(150.25 MiB/s)
read_latency 242.12 US
splited_read_iops 601.00
splited_read_latency 208.46 US
splited_local_read_ratio 1.00 (601.00 / 601.00)
splited_local_read_bw 157.55 MB/s(150.25 MiB/s)
splited_local_read_latency 208.46 US
write_iops 0.00
write_avgrq 0.00 B(0.00 B)
write_bw 0.00 B/s(0.00 B/s)
write_latency 0.00 NS
splited_write_iops 0.00
splited_write_latency 0.00 NS
splited_local_write_ratio 0.00
splited_local_write_bw 0.00 B/s(0.00 B/s)
splited_local_write_latency 0.00 NS
total_iops 601.00
total_avgrq 262.14 KB(256.00 KiB)
total_bw 157.55 MB/s(150.25 MiB/s)
total_latency 242.12 US
total_iop30s 18011.00
unmap_iops 0.00
unmap_total 0
unmap_unaligned_iops 0.00
unmap_unaligned_total 0
---------------------------------------------------------------------
---------------------------------------------------------------------
volume_id ae12b673-8bca-4118-a9b3-45db8f60f945
read_iops 0.00
read_avgrq 0.00 B(0.00 B)
read_bw 0.00 B/s(0.00 B/s)
read_latency 0.00 NS
splited_read_iops 0.00
splited_read_latency 0.00 NS
splited_local_read_ratio 0.00
splited_local_read_bw 0.00 B/s(0.00 B/s)
splited_local_read_latency 0.00 NS
write_iops 601.00
write_avgrq 262.14 KB(256.00 KiB)
write_bw 157.55 MB/s(150.25 MiB/s)
write_latency 108.41 US
splited_write_iops 601.00
splited_write_latency 76.36 US
splited_local_write_ratio 1.00 (601.00 / 601.00)
splited_local_write_bw 157.55 MB/s(150.25 MiB/s)
splited_local_write_latency 76.36 US
total_iops 601.00
total_avgrq 262.14 KB(256.00 KiB)
total_bw 157.55 MB/s(150.25 MiB/s)
total_latency 108.41 US
total_iop30s 18008.00
unmap_iops 0.00
unmap_total 0
unmap_unaligned_iops 0.00
unmap_unaligned_total 0
---------------------------------------------------------------------
Output note
| Parameter | Description |
|---|---|
| read_iops | The read IOPS in the last 1 second. |
| read_avgrq | The average read request size in the last 1 second. |
| read_bw | The read bandwidth in the last 1 second. |
| read_latency | The average read request latency in the last 1 second. |
| splited_read_iops | The split read IOPS in the last 1 second. For example, when the stripe size is 256 KiB, a 512 KiB read request is split into two 256 KiB requests before being sent to access. |
| splited_read_latency | The average latency of split read requests in the last 1 second. |
| splited_local_read_ratio | The ratio of split read IOPS served by local access to all split read IOPS in the last 1 second. |
| splited_local_read_bw | The bandwidth of split read requests served by local access in the last 1 second. |
| splited_local_read_latency | The average latency of split read requests served by local access in the last 1 second. |
| write_iops | The write IOPS in the last 1 second. |
| write_avgrq | The average write request size in the last 1 second. |
| write_bw | The write bandwidth in the last 1 second. |
| write_latency | The average write request latency in the last 1 second. |
| splited_write_iops | The split write IOPS in the last 1 second. For example, when the stripe size is 256 KiB, a 512 KiB write request is split into two 256 KiB requests before being sent to access. |
| splited_write_latency | The average latency of split write requests in the last 1 second. |
| splited_local_write_ratio | The ratio of split write IOPS served by local access to all split write IOPS in the last 1 second. |
| splited_local_write_bw | The bandwidth of split write requests served by local access in the last 1 second. |
| splited_local_write_latency | The average latency of split write requests served by local access in the last 1 second. |
| total_iops | The total IOPS in the last 1 second. |
| total_avgrq | The average size of all requests in the last 1 second. |
| total_bw | The total bandwidth in the last 1 second. |
| total_latency | The average latency of all requests in the last 1 second. |
| total_iop30s | The total number of I/Os in the last 30 seconds. |
| unmap_iops | The UNMAP IOPS in the last 1 second. |
| unmap_total | The total number of UNMAP I/Os. |
| unmap_unaligned_iops | The unaligned UNMAP IOPS in the last 1 second. |
| unmap_unaligned_total | The total number of unaligned UNMAP I/Os. |
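The splitting described for `splited_read_iops` and `splited_write_iops` can be modeled as cutting each request at stripe-size boundaries. The Python sketch below is a simplified model, not the zbs-chunkd implementation: the `split_request` helper is hypothetical, and it assumes stripes are aligned to offset 0 of the volume.

```python
from typing import List, Tuple

KiB = 1024


def split_request(offset: int, length: int,
                  stripe_size: int = 256 * KiB) -> List[Tuple[int, int]]:
    """Split one request into (offset, length) pieces that never cross a
    stripe boundary (hypothetical model of the documented behavior)."""
    pieces = []
    end = offset + length
    while offset < end:
        # First stripe boundary strictly after the current offset.
        boundary = (offset // stripe_size + 1) * stripe_size
        piece_end = min(end, boundary)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


if __name__ == "__main__":
    # An aligned 512 KiB request becomes two 256 KiB pieces.
    print(split_request(0, 512 * KiB))  # [(0, 262144), (262144, 262144)]
```

Under this model, an aligned 512 KiB request counts once in `read_iops` (or `write_iops`) and twice in `splited_read_iops` (or `splited_write_iops`), while an aligned 256 KiB request is not split at all.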
Procedure
Run the following command on a cluster node to view performance information for the volume with the specified ID:
zbs-perf-tools volume show <volume_id> [--chunk-addr <ip>] [-L] [-A]
| Parameter | Description |
|---|---|
| volume_id | The volume ID. |
| --chunk-addr <ip> | The zbs-chunkd RPC server address. Default: 127.0.0.1:10200, the chunk service on the node where the command is run. |
| -L | Displays data only from the local chunk server. |
| -A | Displays all properties in the output table. |
Output example
$ zbs-perf-tools volume show e5a1d376-7d14-4c44-82b4-f2bc2a2334ee
Aggregated Data:
---------------------------------------------------------------------
volume_id f618b4b1-c0c9-4b93-8dc1-9ffdcd086679
read_iops 0.00
read_avgrq 0.00 B(0.00 B)
read_bw 0.00 B/s(0.00 B/s)
read_latency 0.00 NS
splited_read_iops 0.00
splited_read_latency 0.00 NS
splited_local_read_ratio 0.00
splited_local_read_bw 0.00 B/s(0.00 B/s)
splited_local_read_latency 0.00 NS
write_iops 0.00
write_avgrq 0.00 B(0.00 B)
write_bw 0.00 B/s(0.00 B/s)
write_latency 0.00 NS
splited_write_iops 0.00
splited_write_latency 0.00 NS
splited_local_write_ratio 0.00
splited_local_write_bw 0.00 B/s(0.00 B/s)
splited_local_write_latency 0.00 NS
total_iops 0.00
total_avgrq 0.00 B(0.00 B)
total_bw 0.00 B/s(0.00 B/s)
total_latency 0.00 NS
total_iop30s 0.00
unmap_iops 0.00
unmap_total 0
unmap_unaligned_iops 0.00
unmap_unaligned_total 0
---------------------------------------------------------------------
chunk-Specific Data:
--------------------------------------------------------------------------------
CHUNK IP TOTAL IOPS TOTAL AVGRQ TOTAL BW TOTAL LATENCY
--------------------------------------------------------------------------------
10.213.141.86 0.00 0.00 B(0.00 B) 0.00 B/s(0.00 B/s) 0.00 NS
10.213.141.88 0.00 0.00 B(0.00 B) 0.00 B/s(0.00 B/s) 0.00 NS
10.213.141.87 0.00 0.00 B(0.00 B) 0.00 B/s(0.00 B/s) 0.00 NS
10.213.141.89 0.00 0.00 B(0.00 B) 0.00 B/s(0.00 B/s) 0.00 NS
--------------------------------------------------------------------------------
Output note
chunk-Specific Data: per-chunk performance data, collected from each chunk service in the cluster.
Aggregated Data: the performance data collected from all chunk services in the cluster and combined into overall metrics for the volume. Latency and other average values are weighted averages; counts are summed.
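The aggregation rule can be modeled with a short Python sketch. The weight used for each weighted average is not specified here; the sketch assumes latency is weighted by each chunk's IOPS and derives the average request size from the summed bandwidth and summed IOPS. The `ChunkStats` type and `aggregate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ChunkStats:
    iops: float        # requests per second on one chunk service
    bw: float          # bytes per second on one chunk service
    latency_ns: float  # average request latency on that chunk service

    @property
    def avgrq(self) -> float:
        # Average request size in bytes; 0 when there was no I/O.
        return self.bw / self.iops if self.iops else 0.0


def aggregate(chunks: Iterable[ChunkStats]) -> ChunkStats:
    chunks = list(chunks)
    # Counts and rates are summed across all chunk services.
    iops = sum(c.iops for c in chunks)
    bw = sum(c.bw for c in chunks)
    # Assumed weighting: each chunk's latency counts in proportion to its IOPS,
    # so an idle chunk reporting 0 NS does not drag the average down.
    latency = sum(c.iops * c.latency_ns for c in chunks) / iops if iops else 0.0
    return ChunkStats(iops, bw, latency)


if __name__ == "__main__":
    total = aggregate([ChunkStats(100.0, 100 * 4096.0, 200_000.0),
                       ChunkStats(300.0, 300 * 4096.0, 100_000.0),
                       ChunkStats(0.0, 0.0, 0.0)])
    print(total.iops, total.avgrq, total.latency_ns)  # 400.0 4096.0 125000.0
```

A plain mean of the three latencies would give about 100,000 ns here; weighting by IOPS gives 125,000 ns because the busier chunks dominate.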
| Parameter | Description |
|---|---|
| read_iops | The read IOPS in the last 1 second. |
| read_avgrq | The average read request size in the last 1 second. |
| read_bw | The read bandwidth in the last 1 second. |
| read_latency | The average read request latency in the last 1 second. |
| splited_read_iops | The split read IOPS in the last 1 second. For example, when the stripe size is 256 KiB, a 512 KiB read request is split into two 256 KiB requests before being sent to access. |
| splited_read_latency | The average latency of split read requests in the last 1 second. |
| splited_local_read_ratio | The ratio of split read IOPS served by local access to all split read IOPS in the last 1 second. |
| splited_local_read_bw | The bandwidth of split read requests served by local access in the last 1 second. |
| splited_local_read_latency | The average latency of split read requests served by local access in the last 1 second. |
| write_iops | The write IOPS in the last 1 second. |
| write_avgrq | The average write request size in the last 1 second. |
| write_bw | The write bandwidth in the last 1 second. |
| write_latency | The average write request latency in the last 1 second. |
| splited_write_iops | The split write IOPS in the last 1 second. For example, when the stripe size is 256 KiB, a 512 KiB write request is split into two 256 KiB requests before being sent to access. |
| splited_write_latency | The average latency of split write requests in the last 1 second. |
| splited_local_write_ratio | The ratio of split write IOPS served by local access to all split write IOPS in the last 1 second. |
| splited_local_write_bw | The bandwidth of split write requests served by local access in the last 1 second. |
| splited_local_write_latency | The average latency of split write requests served by local access in the last 1 second. |
| total_iops | The total IOPS in the last 1 second. |
| total_avgrq | The average size of all requests in the last 1 second. |
| total_bw | The total bandwidth in the last 1 second. |
| total_latency | The average latency of all requests in the last 1 second. |
| total_iop30s | The total number of I/Os in the last 30 seconds. |
| unmap_iops | The UNMAP IOPS in the last 1 second. |
| unmap_total | The total number of UNMAP I/Os. |
| unmap_unaligned_iops | The unaligned UNMAP IOPS in the last 1 second. |
| unmap_unaligned_total | The total number of unaligned UNMAP I/Os. |
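Bandwidth, IOPS, and average request size are related: because the average request size is total bytes divided by total requests, bandwidth equals IOPS multiplied by the average request size. The first volume in the `volume list` example illustrates this: 601 requests per second of 256 KiB each is 157,548,544 bytes per second, printed as 157.55 MB/s in decimal units and 150.25 MiB/s in binary units. The Python sketch below reproduces that arithmetic; the `fmt_bw` helper is hypothetical and only imitates the two-unit style of the output for rates in the megabyte range.

```python
KIB = 1024


def fmt_bw(bytes_per_sec: float) -> str:
    """Format a rate as decimal MB/s followed by binary MiB/s in parentheses.
    Only the megabyte range is handled in this sketch."""
    mb = bytes_per_sec / 1000**2
    mib = bytes_per_sec / 1024**2
    return f"{mb:.2f} MB/s({mib:.2f} MiB/s)"


if __name__ == "__main__":
    iops = 601
    avgrq = 256 * KIB          # 262144 bytes = 262.14 KB = 256.00 KiB
    bw = iops * avgrq          # bandwidth = IOPS x average request size
    print(bw)                  # 157548544
    print(fmt_bw(bw))          # 157.55 MB/s(150.25 MiB/s)
```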
By enabling the probe mode for a volume, you can periodically collect the volume's I/O latency distribution, I/O request size distribution, and I/O access heatmap. The results are displayed in histogram form.
Procedure
Run the following command on a cluster node to enable probe mode. Press Ctrl + C to exit. On exit, zbs-chunkd automatically clears the metrics collected for that probe session.
zbs-perf-tools volume probe <volume_id> [--chunk-addr <ip>] [--meta-addr <ip>] [--distribution {lat | rqsz | logical_offset}] [--interval <int>] [--readwrite {read | write | readwrite}]
| Parameter | Description |
|---|---|
| volume_id | The ABS volume ID. |
| --chunk-addr <ip> | The zbs-chunkd RPC server address. Default: 127.0.0.1:10200, the chunk service on the node where the command is run. |
| --meta-addr <ip> | The zbs-meta RPC server address. Default: 127.0.0.1:10206. |
| --distribution | The I/O distribution type. Options: lat (I/O latency distribution), rqsz (I/O request size distribution), logical_offset (logical region access heatmap). Default: lat. |
| --interval <int> | The probe interval, that is, the time window over which distribution data is collected. Unit: seconds. Default: 1. |
| --readwrite | The I/O type to probe. Valid values: read, write, readwrite. Default: readwrite. |
Output example
Bins that contain no values are omitted from the output. For example, if all latency values fall in the [0, 64.00) us bin, only that bin is shown.
Probe the latency distribution of read or write requests on a volume
The output histogram has three columns: the first column is the latency bin (unit: us), the second column is the count for the bin, and the third column shows the distribution.
$ zbs-perf-tools volume probe e5a1d376-7d14-4c44-82b4-f2bc2a2334ee --distribution lat
readwrite lat(us) : count distribution
[0,64.00) : 57 |* |
[64.00,128.00) : 6172 |********** |
[128.00,256.00) : 15473 |************************* |
[256.00,512.00) : 20443 |********************************|
[512.00,1024.00) : 10467 |***************** |
[1024.00,2048.00) : 1361 |*** |
[2048.00,4096.00) : 337 |* |
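The latency histogram can be approximated with the Python sketch below. It assumes power-of-two bins starting at 64 us, as in the example above, omits empty bins, and scales bars so the largest bin is 32 characters wide, rounding up. Rounding up appears consistent with the bar lengths in the examples in this section, but the `latency_bins` and `render` helpers are hypothetical, not the tool's code.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Bin = Tuple[float, float, int]  # (lower edge, upper edge, count), in us
BAR_WIDTH = 32                  # assumed width of the longest bar


def latency_bins(samples_us: Iterable[float], first_edge: float = 64.0) -> List[Bin]:
    """Count samples in [0, 64), [64, 128), [128, 256), ...; empty bins are omitted."""
    counts: Counter = Counter()
    for s in samples_us:
        lo, hi = 0.0, first_edge
        while s >= hi:          # double the bin until it contains the sample
            lo, hi = hi, hi * 2
        counts[(lo, hi)] += 1
    return [(lo, hi, n) for (lo, hi), n in sorted(counts.items())]


def render(bins: List[Bin], width: int = BAR_WIDTH) -> List[str]:
    """Format bins like the probe output; bar length is rounded up."""
    if not bins:
        return []
    peak = max(n for _, _, n in bins)
    lines = []
    for lo, hi, n in bins:
        stars = -(-n * width // peak)   # integer ceil(n * width / peak)
        lo_txt = "0" if lo == 0 else f"{lo:.2f}"
        lines.append(f"[{lo_txt},{hi:.2f}) : {n} |{'*' * stars:<{width}}|")
    return lines


if __name__ == "__main__":
    for line in render(latency_bins([10, 70, 70, 200, 300, 300, 300, 5000])):
        print(line)
```

In this sample, no values fall between 512 us and 4096 us, so those bins are skipped, matching the documented behavior of omitting empty bins.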
Probe the I/O request size distribution of read or write requests on a volume
The output histogram has three columns: the first column is the I/O request size bin (unit: KB), the second column is the count for the bin, and the third column shows the distribution.
$ zbs-perf-tools volume probe e5a1d376-7d14-4c44-82b4-f2bc2a2334ee --distribution rqsz
readwrite size(KB) : count distribution
[4.00,8.00) : 71 |********************************|
[16.00,32.00) : 58 |*************************** |
[256.00,512.00) : 3 |** |
Probe the hot region distribution of read or write operations on a volume
The output histogram has three columns: the first column is the logical offset bin (unit: GB; 1 GB per bin), the second column is the count for the bin, and the third column shows the distribution.
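A request can be assigned to a logical-offset bin by integer division of its offset by the bin size. The Python sketch below assumes that 1 GB means 2^30 bytes and that each request is counted once, in the bin containing its starting offset; how the tool treats requests that cross a bin boundary is not documented, and `offset_heatmap` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable

BIN_SIZE = 1 << 30  # assumed: one 1 GB bin = 2**30 bytes


def offset_heatmap(offsets: Iterable[int], bin_size: int = BIN_SIZE) -> Dict[int, int]:
    """Map bin index i (covering [i, i + 1) GB) to the number of requests whose
    starting offset falls in it. Empty bins are simply absent."""
    return dict(sorted(Counter(off // bin_size for off in offsets).items()))


if __name__ == "__main__":
    gb = BIN_SIZE
    hits = offset_heatmap([0, 4096, gb - 1, gb, 3 * gb + 512])
    for idx, count in hits.items():
        print(f"[{idx:.2f},{idx + 1:.2f}) : {count}")
```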
$ zbs-perf-tools volume probe e5a1d376-7d14-4c44-82b4-f2bc2a2334ee --distribution logical_offset
readwrite logical offset(GB) : count distribution
[0,1.00) : 29248 |********************************|
[1.00,2.00) : 16579 |******************* |
[2.00,3.00) : 16763 |******************* |
[3.00,4.00) : 16320 |****************** |
Procedure
Run the following command on any node in the cluster to create a session named after the volume ID and start collecting data:
zbs-perf-tools trace volume <volume_id> --trace_time <time>
| Parameter | Description |
|---|---|
| volume_id | The target volume ID. |
| --trace_time <time> | The duration of data collection. Default: 10s. Maximum: 600s. |
Data collection stops automatically when the trace time elapses. To stop it earlier, press Ctrl + C. In either case, the tool then analyzes the collected data automatically.
The analysis results include a statistics table and statistics charts. The statistics table is displayed directly in the console. The statistics charts are saved as static HTML files in the trace data directory on each node; download the files and open them in a web browser.
Output example
$ zbs-perf-tools trace volume 5be99682-cdc9-4516-b262-3bfe44379207
session started...
Trace time is over, sending interrupt signal...
stopping tracing...
/root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-1-10-0-18-31 parse succeed. wrote to /root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-1-10-0-18-31/2023-07-28-165212+0800/parsed_data
/root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-2-10-0-18-34 parse succeed. wrote to /root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-2-10-0-18-34/2023-07-28-165211+0800/parsed_data
/root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-4-10-0-18-32 parse succeed. wrote to /root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-4-10-0-18-32/2023-07-28-165212+0800/parsed_data
The report of directory: /root/zbs-trace/5be99682-cdc9-4516-b262-3bfe44379207/trace-data/cid-1-10-0-18-31/2023-07-28-165212+0800/parsed_data
ACCESS
------------------------------------------------------------------------
AVG P50 P95 P99 MAX N
------------------------------------------------------------------------
read 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
write 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
readwrite 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
sync_gen 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
wait_recover 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
caw 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
replica_io_read 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
replica_io_write 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
replica_io_readwrite 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0.00 NS 0
------------------------------------------------------------------------
……
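Each row of the ACCESS table summarizes the latencies of one operation type: the average, the 50th, 95th, and 99th percentiles, the maximum, and the sample count (N). The percentile method used by the tool is not documented here; the Python sketch below assumes the nearest-rank method, and the `summarize` function is hypothetical.

```python
from typing import Dict, Sequence


def percentile(sorted_ns: Sequence[float], p: int) -> float:
    """Nearest-rank percentile (assumed method) of an ascending, non-empty list."""
    rank = -(-p * len(sorted_ns) // 100)  # integer ceil(p / 100 * n)
    return sorted_ns[max(rank, 1) - 1]


def summarize(latencies_ns: Sequence[float]) -> Dict[str, float]:
    """Build one row of an ACCESS-style table from raw latency samples."""
    n = len(latencies_ns)
    if n == 0:
        # An operation type with no samples yields an all-zero row, as in the example.
        return {"AVG": 0.0, "P50": 0.0, "P95": 0.0, "P99": 0.0, "MAX": 0.0, "N": 0}
    data = sorted(latencies_ns)
    return {
        "AVG": sum(data) / n,
        "P50": percentile(data, 50),
        "P95": percentile(data, 95),
        "P99": percentile(data, 99),
        "MAX": data[-1],
        "N": n,
    }


if __name__ == "__main__":
    print(summarize([30_000, 10_000, 20_000]))
    # {'AVG': 20000.0, 'P50': 20000, 'P95': 30000, 'P99': 30000, 'MAX': 30000, 'N': 3}
```

Integer arithmetic is used for the rank so that, for example, the 95th percentile of 20 samples is exactly the 19th value, without floating-point rounding at the boundary.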