Upgrading physical disk firmware

Before upgrading the firmware of a physical disk, choose a procedure based on two factors: whether the upgrade method provided by the hardware vendor requires a node restart, and whether the disk to be upgraded is a system disk (including the Arcfra system disk):

- If a node restart is required (for example, the vendor provides LiveOS or the upgrade runs from a USB drive), refer to Upgrading physical disk firmware offline.
- If a node restart is not required:
  - If the disk to be upgraded is a system disk, refer to Upgrading system disk firmware online.
  - If the disk to be upgraded is not a system disk, refer to Upgrading non-system disk firmware online.
  - If the disks to be upgraded include both system and non-system disks, refer to Upgrading system disk firmware online.
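
The decision rules above can be sketched as a small shell function. This is only an illustration: the function name select_procedure and its yes/no arguments are hypothetical and are not part of any Arcfra tool.

```shell
# Hypothetical helper: maps the two questions above to a procedure name.
# $1: "yes" if the vendor's upgrade method requires a node restart, otherwise "no"
# $2: "yes" if any disk to be upgraded is a system disk, otherwise "no"
select_procedure() {
    local restart_required="$1" includes_system_disk="$2"
    if [ "$restart_required" = "yes" ]; then
        echo "Upgrading physical disk firmware offline"
    elif [ "$includes_system_disk" = "yes" ]; then
        echo "Upgrading system disk firmware online"
    else
        echo "Upgrading non-system disk firmware online"
    fi
}

# Example: no restart needed, mix of system and non-system disks
select_procedure no yes
```

A restart requirement takes precedence over the disk type, and a mix of system and non-system disks is handled like the system disk case.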

Upgrading physical disk firmware offline

Log in to AOC and set the node whose disk firmware is to be upgraded to maintenance mode.
Shut down the node in AOC.
Upgrade the firmware. After confirming that the firmware version has been updated to the target version, start the server.
Log in to AOC and exit maintenance mode for this node.
Log in to this node using SSH and run the following command to check whether all services are running properly:

sudo /usr/share/tuna/script/control_all_services.sh --action=status --group=role

On any node in the cluster, run the following command to check whether data recovery is complete:

sudo zbs-meta pextent find need_recover

If the command returns the message "No PExtents found.", data recovery is complete. Otherwise, wait a while and check again.
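
The wait-and-recheck step can be automated with a polling loop. The following is a minimal sketch, not an official script: the function names and the 60-second default interval are assumptions, and the only things taken from this procedure are the command and the completion message.

```shell
# Returns 0 when the given command output reports that no extents need recovery.
# $1: output text of "zbs-meta pextent find need_recover"
recovery_complete() {
    printf '%s\n' "$1" | grep -qF "No PExtents found."
}

# Polls until data recovery is complete.
# $1: seconds between checks (assumed default: 60)
wait_for_recovery() {
    local interval="${1:-60}" output
    while true; do
        output="$(sudo zbs-meta pextent find need_recover 2>&1)"
        if recovery_complete "$output"; then
            echo "Data recovery is complete."
            return 0
        fi
        echo "Recovery still in progress; checking again in ${interval}s."
        sleep "$interval"
    done
}
```

For example, wait_for_recovery 120 checks every two minutes. The loop has no timeout, so it keeps polling if the command itself fails; interrupt it with Ctrl+C in that case.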

Upgrading system disk firmware online

Log in to AOC and set the node whose disk firmware is to be upgraded to maintenance mode.
On the command-line terminal of the node, run the following command to stop the Chunk service:

sudo systemctl stop zbs-chunkd

Then run the following command to verify that the Chunk service has been stopped:

sudo systemctl status zbs-chunkd

The output below indicates that the Chunk service is stopped. Ensure that the value of Active is inactive (dead):

  ● zbs-chunkd.service - Chunk service
     Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/zbs-chunkd.service.d
             └─cgroup.conf, delegate.conf
     Active: inactive (dead) since Fri 2024-12-20 12:08:24 CST; 42s ago
    Process: 180462 ExecStart=/usr/share/zbs/bin/zbs_run_service.sh zbs/others /usr/sbin/zbs-chunkd --foreground (code=exited, status=0/SUCCESS)
   Main PID: 180462 (code=exited, status=0/SUCCESS)
     Status: "Starting event loop..."

Information:

At this point, AOC displays an alert indicating that the storage service health status is abnormal. This is expected behavior.

Upgrade the firmware. After the upgrade is complete, confirm that the firmware version has been updated to the target version.
On the command-line terminal of the node, run the following command to start the Chunk service:

sudo systemctl start zbs-chunkd

Then run the following command to verify that the Chunk service has been started as expected:

sudo systemctl status zbs-chunkd

The output below indicates that the Chunk service has started as expected. Ensure that the value of Active is active (running):

    ● zbs-chunkd.service - Chunk service
       Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/zbs-chunkd.service.d
               └─cgroup.conf, delegate.conf
       Active: active (running) since Fri 2024-12-20 12:45:12 CST; 3s ago
      Process: 132708 ExecStartPre=/usr/share/zbs/bin/zbs_config_rdma_qos.sh dscp $CHUNK_SERVER_ACCESS_QOS_MODE (code=exited, status=0/SUCCESS)
     Main PID: 132712 (zbs-chunkd)
       Status: "Starting event loop..."
        Tasks: 21
       Memory: 112.0M
       CGroup: /smtx.slice/smtx-zbs.slice/smtx-zbs-chunkd.slice/zbs-chunkd.service
               └─132712 /usr/sbin/zbs-chunkd --foreground

Wait until AOC clears the alert indicating that the storage service health status is abnormal.
Exit maintenance mode for this node in AOC.
On any node in the cluster, run the following command to check whether data recovery is complete:

sudo zbs-meta pextent find need_recover

If the command returns the message "No PExtents found.", data recovery is complete. Otherwise, wait a while and check again.
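
The two checks of the Active value in this procedure can also be scripted instead of read by eye. The sketch below assumes the systemctl status output has the same layout as the samples shown in the steps; the function names active_state and verify_chunk_state are hypothetical.

```shell
# Extracts the Active value, such as "inactive (dead)" or "active (running)",
# from "systemctl status" output passed on standard input.
active_state() {
    sed -n 's/^ *Active: \([a-z]* ([^)]*)\).*/\1/p'
}

# Compares the current state of the Chunk service with the expected one.
# $1: expected Active value, for example "inactive (dead)"
verify_chunk_state() {
    local expected="$1" actual
    actual="$(sudo systemctl status zbs-chunkd | active_state)"
    if [ "$actual" = "$expected" ]; then
        echo "zbs-chunkd is ${actual}, as expected."
    else
        echo "zbs-chunkd is '${actual}', expected '${expected}'." >&2
        return 1
    fi
}
```

Call verify_chunk_state "inactive (dead)" after stopping the service and verify_chunk_state "active (running)" after starting it. A non-zero return value means the service is not in the expected state and the procedure should not continue.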

Upgrading non-system disk firmware online

On the command-line terminal of the node where the physical disk firmware needs to be upgraded, run the following command to put the node into storage maintenance mode:

zbs-meta chunk set_maintenance <cid> true [--expire_duration_s <EXPIRE_DURATION_S>]

<cid>: Replace it with the actual Chunk ID.
[--expire_duration_s <EXPIRE_DURATION_S>]: Optional. The time, in seconds, after which maintenance mode expires. If not specified, it defaults to the maximum value of 43200 seconds (12 hours).

If the output shows that both the ID and the IP address match the node whose disk firmware is to be upgraded, and that the value of Maintenance Mode is True, the node has entered storage maintenance mode. After storage maintenance mode is enabled, wait 10 seconds before continuing.

On the command-line terminal of the node, run the following command to stop the Chunk service:

sudo systemctl stop zbs-chunkd

Then run the following command to verify that the Chunk service has been stopped:

sudo systemctl status zbs-chunkd

The output below indicates that the Chunk service is stopped. Ensure that the value of Active is inactive (dead):

  ● zbs-chunkd.service - Chunk service
     Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/zbs-chunkd.service.d
             └─cgroup.conf, delegate.conf
     Active: inactive (dead) since Fri 2024-12-20 12:08:24 CST; 42s ago
    Process: 180462 ExecStart=/usr/share/zbs/bin/zbs_run_service.sh zbs/others /usr/sbin/zbs-chunkd --foreground (code=exited, status=0/SUCCESS)
   Main PID: 180462 (code=exited, status=0/SUCCESS)
     Status: "Starting event loop..."

Information:

At this point, AOC displays an alert indicating that the storage service health status is abnormal. This is expected behavior.

Upgrade the firmware. After the upgrade is complete, confirm that the firmware version has been updated to the target version.
On the command-line terminal of the node, run the following command to start the Chunk service:

sudo systemctl start zbs-chunkd

Then run the following command to verify that the Chunk service has been started as expected:

sudo systemctl status zbs-chunkd

The output below indicates that the Chunk service has started as expected. Ensure that the value of Active is active (running):

    ● zbs-chunkd.service - Chunk service
       Loaded: loaded (/usr/lib/systemd/system/zbs-chunkd.service; enabled; vendor preset: disabled)
      Drop-In: /etc/systemd/system/zbs-chunkd.service.d
               └─cgroup.conf, delegate.conf
       Active: active (running) since Fri 2024-12-20 12:45:12 CST; 3s ago
      Process: 132708 ExecStartPre=/usr/share/zbs/bin/zbs_config_rdma_qos.sh dscp $CHUNK_SERVER_ACCESS_QOS_MODE (code=exited, status=0/SUCCESS)
     Main PID: 132712 (zbs-chunkd)
       Status: "Starting event loop..."
        Tasks: 21
       Memory: 112.0M
       CGroup: /smtx.slice/smtx-zbs.slice/smtx-zbs-chunkd.slice/zbs-chunkd.service
               └─132712 /usr/sbin/zbs-chunkd --foreground

Wait until AOC clears the alert indicating that the storage service health status is abnormal.

On the command-line terminal of the node, run the following command to exit storage maintenance mode for the node:

zbs-meta chunk set_maintenance <cid> false

Replace <cid> with the actual Chunk ID.

On any node in the cluster, run the following command to check whether data recovery is complete:

sudo zbs-meta pextent find need_recover

If the command returns the message "No PExtents found.", data recovery is complete. Otherwise, wait a while and check again.
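
As a recap, the whole sequence for a non-system disk can be written as a dry-run sketch that only prints the commands in order and executes nothing. The function name print_upgrade_plan and the example Chunk ID 3 are hypothetical; the commands are the ones shown in the steps above, and the firmware upgrade itself still depends on the vendor's tool.

```shell
# Prints the command sequence for one node without executing anything.
# $1: Chunk ID of the node (the <cid> placeholder in the steps above)
print_upgrade_plan() {
    local cid="$1"
    cat <<EOF
zbs-meta chunk set_maintenance ${cid} true
sleep 10
sudo systemctl stop zbs-chunkd
sudo systemctl status zbs-chunkd
# upgrade the firmware with the vendor tool and confirm the target version
sudo systemctl start zbs-chunkd
sudo systemctl status zbs-chunkd
# wait for the AOC storage service health alert to clear
zbs-meta chunk set_maintenance ${cid} false
sudo zbs-meta pextent find need_recover
EOF
}

# Example with an assumed Chunk ID of 3
print_upgrade_plan 3
```

Printing the plan first makes it easy to review the order of operations and the Chunk ID before running each command by hand.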
