After configuring flow control on the switches and hosts, you can run the automated test script data_channel_bench.py, which is built into the ACOS system, on any SCVM. The script generates high I/O throughput so that you can verify the flow control configuration.
Verification principle
The following explanation uses a three-node ACOS cluster as an example.
Assume an ACOS cluster has three nodes: A, B, and C. If flow control is configured properly on all three nodes, each node should behave like Node B below when running the three typical network I/O models. Node B's expected network I/O performance is as follows (the -> symbol indicates the direction of data transfer):
A -> B -> C: Node B receives data from Node A while sending data to Node C concurrently. Both the receiving and sending links should operate at full bandwidth.
A <- B -> C: Node B sends data to Nodes A and C concurrently. Each sending link should run at half of the full bandwidth.
A -> B <- C: Node B receives data from Nodes A and C concurrently. Each receiving link should run at half of the full bandwidth.
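The three models can be summarized in a short Python sketch. The helper below is hypothetical and is not part of data_channel_bench.py: it represents each transfer as a (sender, receiver) tuple, classifies two concurrent transfers through a center node into one of the three models, and returns the share of full bandwidth that each link is expected to reach.

```python
from typing import Tuple

Link = Tuple[str, str]  # (sender, receiver), e.g. ("A", "B") means A -> B


def expected_share(link1: Link, link2: Link) -> float:
    """Return the expected fraction of full bandwidth for each of two concurrent links.

    Hypothetical helper that illustrates the three I/O models for a center node B.
    """
    (s1, r1), (s2, r2) = link1, link2
    if r1 == s2 or r2 == s1:
        # A -> B -> C: B receives and sends at the same time; each direction
        # has its own full bandwidth.
        return 1.0
    if s1 == s2:
        # A <- B -> C: B's sending bandwidth is split between two receivers.
        return 0.5
    if r1 == r2:
        # A -> B <- C: B's receiving bandwidth is split between two senders.
        return 0.5
    raise ValueError("the two links do not share a center node")
```

Multiplying the returned share by the full link bandwidth gives the expected per-link rate. For example, expected_share(("A", "B"), ("B", "C")) returns 1.0, while the A <- B -> C and A -> B <- C cases return 0.5.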
When running the automated test script data_channel_bench.py on any SCVM in the cluster, specify the SCVM management IP addresses of all nodes and the network type to be tested. During execution, the script obtains the RDMA NIC name of each node based on the test network, and then runs the three network I/O models described above on each node in turn to verify whether each node's flow control configuration is as expected.
Precautions
The I/O traffic generated by this automated test script is heavy and may affect normal business operations. Do not run this script to verify the flow control configuration in a customer's production environment.
To run this automated test script, you must log in to the SCVM on the node with the arcfra account.
This automated test script also applies to clusters with more than three nodes. During execution, the script automatically groups the nodes three at a time and runs the tests described above on each group. The total test time is about round_up(N / 3) × (3 × 3) × 5 seconds, or roughly N × 15 seconds, where N is the number of nodes in the cluster.
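The time estimate above can be reproduced with a few lines of Python. This is a sketch based on our own reading of the formula: round_up(N / 3) is the number of three-node groups, 3 × 3 is the three I/O models run with each of the three nodes as the center node, and each run takes about 5 seconds. The function name is hypothetical.

```python
import math


def estimated_test_seconds(n_nodes: int, seconds_per_run: int = 5) -> int:
    """Estimate the data_channel_bench.py run time: round_up(N / 3) x (3 x 3) x 5 s.

    Assumes nodes are tested in groups of three, with 3 x 3 test runs per group
    (three I/O models for each of the three nodes) of about 5 seconds each.
    """
    if n_nodes < 3:
        raise ValueError("the script needs at least 3 nodes")
    groups = math.ceil(n_nodes / 3)  # round_up(N / 3)
    return groups * (3 * 3) * seconds_per_run
```

For example, estimated_test_seconds(12) returns 180, which matches 12 × 15, while estimated_test_seconds(4) returns 90 rather than 60 because a partial group is rounded up to a full one.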
Verification method
The following shows an example of the flow control test result for the storage network of a three-node ACOS cluster. All bandwidth values are in MB/s.
To test the storage network, run the following command: python /usr/share/zbs/bin/data_channel_bench.py "192.168.57.[85-87]" --mode data
/usr/share/zbs/bin/data_channel_bench.py is the default path of the automated test script on each node.
"192.168.57.[85-87]" specifies the management IP addresses of three nodes with consecutive addresses. Replace them with the actual management IP addresses. For non-consecutive IP addresses, separate them with spaces and remove the quotation marks, for example, 192.168.57.85 192.168.57.87 192.168.57.89.
--mode data sets the storage network as the test network.
If the output shows results like the following, with bandwidth deviations not exceeding 100 MB/s, the flow control configuration for the storage network is normal. Otherwise, the flow control configuration has failed. In that case, refer to the Configuring flow control on the switch and Configuring flow control on the ESXi host sections to make sure all configurations are complete, and then verify the flow control configuration again.
$ python /usr/share/zbs/bin/data_channel_bench.py -h
usage: data_channel_bench.py [-h] [--mode MODE] nodes [nodes ...]
positional arguments:
nodes node ips, at least 3 ips should be given. E.g. 192.168.1.1
192.168.1.2 192.168.1.3 or 192.168.1.[1-3]
optional arguments:
-h, --help show this help message and exit
--mode MODE test mode, can be data or access. Default is data.
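The nodes argument accepts either individual IP addresses or a range in the last octet, such as 192.168.1.[1-3]. The sketch below shows how such a range expands into individual addresses. expand_nodes is a hypothetical helper for illustration only; it is not the script's own argument parser and handles only the forms shown in the help text. Quoting the range on the command line also keeps the shell from interpreting the square brackets as a filename pattern.

```python
import ipaddress
import re
from typing import List

# "a.b.c.[start-end]": a range in the last octet, as in the help text.
RANGE_RE = re.compile(r"(\d+\.\d+\.\d+\.)\[(\d+)-(\d+)\]")


def expand_nodes(args: List[str]) -> List[str]:
    """Expand node arguments such as "192.168.57.[85-87]" into individual IPs.

    Hypothetical helper that mirrors the documented notation, not the script's parser.
    """
    ips: List[str] = []
    for arg in args:
        m = RANGE_RE.fullmatch(arg)
        if m:
            prefix, start, end = m.group(1), int(m.group(2)), int(m.group(3))
            ips.extend(f"{prefix}{i}" for i in range(start, end + 1))
        else:
            ips.append(arg)
    for ip in ips:
        ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    if len(ips) < 3:
        raise ValueError("at least 3 node IPs should be given")
    return ips
```

For example, expand_nodes(["192.168.57.[85-87]"]) yields the same three addresses as passing 192.168.57.85, 192.168.57.86, and 192.168.57.87 separately.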
$ python /usr/share/zbs/bin/data_channel_bench.py "192.168.57.[85-87]" --mode data
================================================================================
[Node Info]
Node ID: 0, IP: 192.168.57.85, IBDev: rocexb8cef6030013125c, Polling CPU: set([8, 2, 3, 6]), Total CPU: 64
Node ID: 1, IP: 192.168.57.86, IBDev: rocexb8cef603001313d4, Polling CPU: set([8, 2, 3, 6]), Total CPU: 64
Node ID: 2, IP: 192.168.57.87, IBDev: rocex043f720300ffb1d8, Polling CPU: set([8, 2, 3, 6]), Total CPU: 64
--------------------------------------------------------------------------------
[Run two way test: 0 1 2]
--------------------------------------------------------------------------------
1 -> 0: 1391.75 2 -> 0: 1392
0 -> 1: 1411.5 0 -> 2: 1406
1 -> 0: 2732 0 -> 2: 2722.75
--------------------------------------------------------------------------------
0 -> 1: 1367.5 2 -> 1: 1363
1 -> 0: 1412.25 1 -> 2: 1391.5
0 -> 1: 2727 1 -> 2: 2737.25
--------------------------------------------------------------------------------
0 -> 2: 1370 1 -> 2: 1356.75
2 -> 0: 1406 2 -> 1: 1395.75
0 -> 2: 2711.25 2 -> 1: 2720.25
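The pass criterion can be checked mechanically. The following Python sketch parses result lines like those above and flags readings that deviate by more than 100 MB/s from their expected value. It relies on assumptions that this section does not state: the deviation is measured against an expected value, full bandwidth is estimated as the mean of the A -> B -> C readings, and the other two models are expected to reach half of that. check_flow_control is a hypothetical name.

```python
import re
from statistics import mean
from typing import List, Tuple

# Matches one reading such as "1 -> 0: 1391.75" (sender, receiver, MB/s).
LINK_RE = re.compile(r"(\d+)\s*->\s*(\d+):\s*([\d.]+)")


def check_flow_control(output: str, tolerance: float = 100.0) -> List[Tuple[str, float, float]]:
    """Return (link, measured MB/s, expected MB/s) for readings outside the tolerance.

    Hypothetical checker. Assumes full bandwidth is the mean of the
    A -> B -> C readings and that the other two models should reach half of it.
    """
    samples: List[Tuple[str, float, float]] = []  # (link, measured, expected share)
    for line in output.splitlines():
        links = LINK_RE.findall(line)
        if len(links) != 2:
            continue  # skip node info, separators, and headers
        (s1, r1, v1), (s2, r2, v2) = links
        # A -> B -> C: the first receiver is the second sender (or vice versa);
        # otherwise the two links share a sender or a receiver (half bandwidth).
        share = 1.0 if (r1 == s2 or r2 == s1) else 0.5
        samples.append((f"{s1} -> {r1}", float(v1), share))
        samples.append((f"{s2} -> {r2}", float(v2), share))

    full_readings = [value for _, value, share in samples if share == 1.0]
    if not full_readings:
        raise ValueError("no A -> B -> C readings found in the output")
    full_bw = mean(full_readings)

    return [
        (link, value, full_bw * share)
        for link, value, share in samples
        if abs(value - full_bw * share) > tolerance
    ]
```

Pass the captured script output as a string; an empty list means that no reading exceeded the tolerance. For the example output above, the estimated full bandwidth is about 2725 MB/s and every reading falls within 100 MB/s of its expected value.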