When upgrading the firmware of a physical disk, follow the procedure below that matches your situation. The applicable procedure depends on whether the upgrade solution provided by the hardware vendor requires a node restart, and on whether the disk to be upgraded is a system disk (including the Arcfra system disk):
Log in to AOC and set the node whose disk firmware is to be upgraded to maintenance mode.
Shut down the node in AOC.
Upgrade the firmware. After confirming that the firmware version has been updated to the target version, start the server.
Log in to AOC and exit maintenance mode for this node.
Log in to this node using SSH and run the following command to check whether all services are running properly.
sudo /usr/share/tuna/script/control_all_services.sh --action=status --group=role
On any node in the cluster, run the following command to check whether data recovery is complete:
sudo zbs-meta pextent find need_recover
If the message "No PExtents found." is returned, data recovery is complete. Otherwise, wait a while and check again.
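The "wait for a while and check again" step above can be automated with a small polling loop. The following is a minimal sketch, not part of the vendor procedure: the function name wait_for_recovery and the MAX_ATTEMPTS and INTERVAL_S values are illustrative assumptions, and only the zbs-meta command and the "No PExtents found." message come from the procedure itself.

```shell
#!/usr/bin/env bash
# Poll the recovery check until it reports completion.
# MAX_ATTEMPTS and INTERVAL_S are illustrative values, not documented defaults.
MAX_ATTEMPTS="${MAX_ATTEMPTS:-60}"
INTERVAL_S="${INTERVAL_S:-30}"

wait_for_recovery() {
    local attempt output
    for ((attempt = 1; attempt <= MAX_ATTEMPTS; attempt++)); do
        # Capture stdout and stderr; a failed command counts as "not done yet".
        output="$(sudo zbs-meta pextent find need_recover 2>&1)" || true
        if [[ "$output" == *"No PExtents found."* ]]; then
            echo "Data recovery is complete (attempt ${attempt})."
            return 0
        fi
        echo "Recovery still in progress (attempt ${attempt}/${MAX_ATTEMPTS}); retrying in ${INTERVAL_S}s."
        sleep "$INTERVAL_S"
    done
    echo "Data recovery did not complete within ${MAX_ATTEMPTS} attempts." >&2
    return 1
}
```

Calling wait_for_recovery on any node in the cluster returns 0 once the message appears and 1 if the attempt limit is reached, so it can gate later steps in a script.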
Log in to AOC and set the node whose disk firmware is to be upgraded to maintenance mode.
On the command-line terminal of the node, run the following command to stop the Chunk service:
sudo systemctl stop zbs-chunkd
Then run the following command to verify that the Chunk service has been stopped:
sudo systemctl status zbs-chunkd
The output below indicates that the Chunk service is stopped. Ensure that the value of Active is inactive (dead):
● zbs-chunkd.service - Chunk service
Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/zbs-chunkd.service.d
└─cgroup.conf, delegate.conf
Active: inactive (dead) since Fri 2024-12-20 12:08:24 CST; 42s ago
Process: 180462 ExecStart=/usr/share/zbs/bin/zbs_run_service.sh zbs/others /usr/sbin/zbs-chunkd --foreground (code=exited, status=0/SUCCESS)
Main PID: 180462 (code=exited, status=0/SUCCESS)
Status: "Starting event loop..."
Information: At this point, AOC displays an alert indicating that the storage service health status is abnormal. This is expected behavior.
Upgrade the firmware. After the upgrade is complete, confirm that the firmware version has been updated to the target version.
On the command-line terminal of the node, run the following command to start the Chunk service:
sudo systemctl start zbs-chunkd
Then run the following command to verify that the Chunk service has been started as expected:
sudo systemctl status zbs-chunkd
The output below indicates that the Chunk service has started as expected. Ensure that the value of Active is active (running):
● zbs-chunkd.service - Chunk service
Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/zbs-chunkd.service.d
└─cgroup.conf, delegate.conf
Active: active (running) since Fri 2024-12-20 12:45:12 CST; 3s ago
Process: 132708 ExecStartPre=/usr/share/zbs/bin/zbs_config_rdma_qos.sh dscp $CHUNK_SERVER_ACCESS_QOS_MODE (code=exited, status=0/SUCCESS)
Main PID: 132712 (zbs-chunkd)
Status: "Starting event loop..."
Tasks: 21
Memory: 112.0M
CGroup: /smtx.slice/smtx-zbs.slice/smtx-zbs-chunkd.slice/zbs-chunkd.service
└─132712 /usr/sbin/zbs-chunkd --foreground
Wait until the alert indicating that the storage service health status is abnormal in AOC clears.
Exit maintenance mode for this node in AOC.
On any node in the cluster, run the following command to check whether data recovery is complete:
sudo zbs-meta pextent find need_recover
If the message "No PExtents found." is returned, data recovery is complete. Otherwise, wait a while and check again.
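The stop-and-verify and start-and-verify steps in the procedure above can also be scripted. The sketch below is one possible approach under stated assumptions: it uses systemctl is-active, which prints a single state word such as active or inactive, instead of reading the full status output by eye. The function names are hypothetical and are not part of the product tooling.

```shell
#!/usr/bin/env bash
# Stop or start the Chunk service and confirm the resulting state.
SERVICE="zbs-chunkd"

# Print the unit's activity state ("active", "inactive", "failed", ...).
chunk_state() {
    # is-active exits non-zero for any state other than active, so ignore the exit code.
    systemctl is-active "$SERVICE" || true
}

# Fail with a message unless the service is in the expected state.
expect_chunk_state() {
    local expected="$1" state
    state="$(chunk_state)"
    if [[ "$state" != "$expected" ]]; then
        echo "Expected ${SERVICE} to be ${expected}, but it is ${state:-unknown}." >&2
        return 1
    fi
}

stop_chunk() {
    sudo systemctl stop "$SERVICE" || return 1
    expect_chunk_state inactive || return 1
    echo "${SERVICE} is stopped."
}

start_chunk() {
    sudo systemctl start "$SERVICE" || return 1
    expect_chunk_state active || return 1
    echo "${SERVICE} is running."
}
```

Run stop_chunk before the firmware upgrade and start_chunk afterward. A non-zero return value means the service did not reach the expected state and the full systemctl status output should be inspected.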
On the command-line terminal of the node where the physical disk firmware needs to be upgraded, run the following command to put the node into storage maintenance mode:
zbs-meta chunk set_maintenance <cid> true [--expire_duration_s <EXPIRE_DURATION_S>]
<cid>: Replace it with the actual Chunk ID.
[--expire_duration_s <EXPIRE_DURATION_S>]: An optional parameter, which refers to the expiration time of the maintenance mode. If not specified, it defaults to a maximum of 43200 seconds (12 hours).
If the output displays that both the ID and IP match the host whose disk firmware is to be upgraded, and the value of Maintenance Mode is True, the host has successfully entered storage maintenance mode. Wait for 10 seconds after the storage maintenance mode is enabled.
On the command-line terminal of the node, run the following command to stop the Chunk service:
sudo systemctl stop zbs-chunkd
Then run the following command to verify that the Chunk service has been stopped:
sudo systemctl status zbs-chunkd
The output below indicates that the Chunk service is stopped. Ensure that the value of Active is inactive (dead):
● zbs-chunkd.service - Chunk service
Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/zbs-chunkd.service.d
└─cgroup.conf, delegate.conf
Active: inactive (dead) since Fri 2024-12-20 12:08:24 CST; 42s ago
Process: 180462 ExecStart=/usr/share/zbs/bin/zbs_run_service.sh zbs/others /usr/sbin/zbs-chunkd --foreground (code=exited, status=0/SUCCESS)
Main PID: 180462 (code=exited, status=0/SUCCESS)
Status: "Starting event loop..."
Information: At this point, AOC displays an alert indicating that the storage service health status is abnormal. This is expected behavior.
Upgrade the firmware. After the upgrade is complete, confirm that the firmware version has been updated to the target version.
On the command-line terminal of the node, run the following command to start the Chunk service:
sudo systemctl start zbs-chunkd
Then run the following command to verify that the Chunk service has been started as expected:
sudo systemctl status zbs-chunkd
The output below indicates that the Chunk service has started as expected. Ensure that the value of Active is active (running):
● zbs-chunkd.service - Chunk service
Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/zbs-chunkd.service.d
└─cgroup.conf, delegate.conf
Active: active (running) since Fri 2024-12-20 12:45:12 CST; 3s ago
Process: 132708 ExecStartPre=/usr/share/zbs/bin/zbs_config_rdma_qos.sh dscp $CHUNK_SERVER_ACCESS_QOS_MODE (code=exited, status=0/SUCCESS)
Main PID: 132712 (zbs-chunkd)
Status: "Starting event loop..."
Tasks: 21
Memory: 112.0M
CGroup: /smtx.slice/smtx-zbs.slice/smtx-zbs-chunkd.slice/zbs-chunkd.service
└─132712 /usr/sbin/zbs-chunkd --foreground
Wait until the alert indicating that the storage service health status is abnormal in AOC clears.
Run the following command on the command-line terminal of the node to exit storage maintenance mode for the node:
zbs-meta chunk set_maintenance <cid> false
Replace <cid> with the actual Chunk ID.
On any node in the cluster, run the following command to check whether data recovery is complete:
sudo zbs-meta pextent find need_recover
If the message "No PExtents found." is returned, data recovery is complete. Otherwise, wait a while and check again.
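The two set_maintenance commands in the procedure above can be wrapped so that the arguments are checked before anything is sent to the cluster. This is a hedged sketch: the function names are hypothetical, and treating 43200 seconds as an upper bound on --expire_duration_s is an assumption drawn from the parameter description, not a confirmed limit.

```shell
#!/usr/bin/env bash
# Wrap the storage maintenance mode commands for a given Chunk ID.
# Assumption: 43200 s (12 hours) is treated here as the largest accepted duration.
MAX_EXPIRE_S=43200

# Usage: enter_storage_maintenance <cid> [expire_duration_s]
enter_storage_maintenance() {
    local cid="${1:-}" expire="${2:-}"
    if [[ -z "$cid" ]]; then
        echo "A Chunk ID is required." >&2
        return 2
    fi
    local cmd=(zbs-meta chunk set_maintenance "$cid" true)
    if [[ -n "$expire" ]]; then
        # Accept only a positive whole number of seconds within the assumed limit.
        if [[ ! "$expire" =~ ^[1-9][0-9]*$ ]] || (( expire > MAX_EXPIRE_S )); then
            echo "expire_duration_s must be between 1 and ${MAX_EXPIRE_S}." >&2
            return 2
        fi
        cmd+=(--expire_duration_s "$expire")
    fi
    "${cmd[@]}"
}

# Usage: exit_storage_maintenance <cid>
exit_storage_maintenance() {
    local cid="${1:-}"
    if [[ -z "$cid" ]]; then
        echo "A Chunk ID is required." >&2
        return 2
    fi
    zbs-meta chunk set_maintenance "$cid" false
}
```

For example, enter_storage_maintenance 3 3600 would request one hour of storage maintenance mode for Chunk ID 3 (an example value). The output still needs to be checked for the matching ID, IP, and Maintenance Mode value, followed by the 10-second wait, as described in the procedure.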