
Entering maintenance mode

A host can enter maintenance mode only after the system runs a precheck confirming that the status of the cluster and the host meets the requirements.

Precaution

While a host is in maintenance mode, you cannot perform any virtual machine scheduling operations on that host. This includes starting, creating, or migrating a virtual machine on the host, and rebuilding a virtual machine on the host through HA.

Procedure

  1. In the AOC host list, click the ellipsis (...) to the right of the target host or SCVM and select Enter maintenance mode.

  2. In the Enter maintenance mode dialog box that appears, the system runs the prechecks listed below. If any check fails, make the manual adjustments suggested for it, and then try entering maintenance mode again.

    Precheck item: No business virtual machines on the host need to be evacuated, or none of the virtual machines to be evacuated are associated with enabled placement group rules of the Must type.
    Suggested operation upon failure:
    • If you select Ignore rule, the system ignores the placement group rules when migrating the virtual machines. The cluster may then generate an alert that virtual machines do not comply with placement group rules; this is expected behavior.
    • If you select Follow rule, the system migrates the virtual machines in compliance with the placement group rules.

    Precheck item: (AVE only) The host has no running virtual machines, or all running virtual machines can be evacuated.
    Suggested operation upon failure:
    • Manually adjust cluster resources.
    • Shut down the Running virtual machines that cannot be evacuated.
    • Select Shut down VMs that cannot be migrated.

    Precheck item: (AVE only) The host has no virtual machines with HA enabled, or all HA-enabled virtual machines can be migrated.
    Suggested operation upon failure:
    • Manually adjust cluster resources.
    • Manually evacuate the virtual machines.

    Precheck item: (AVE only) The host has no virtual machines in the Suspended or Unknown state.
    Suggested operation upon failure: Manually adjust the virtual machines.

    Precheck item: (AVE only) The host has no more than 150 virtual machines, including those in the recycle bin.
    Suggested operation upon failure: Evacuate some virtual machines from the host.

    Precheck item: The host has no data with only one replica.
    Suggested operation upon failure:
    • Adjust the replication policy.
    • Wait until the data recovery is complete.

    Precheck item: The storage network connection of the host is normal.
    Suggested operation upon failure: Fix the faults.

    Precheck item: The cluster has no host or SCVM in the Entering maintenance mode or Maintenance mode state.
    Suggested operation upon failure: Wait for the other host or SCVM to complete its maintenance task and exit maintenance mode.

    Precheck item: No data recovery is in progress in the cluster.
    Suggested operation upon failure: Wait for the data recovery to complete.

    Precheck item: The other nodes in the cluster have sufficient free storage space, that is, both of the following conditions are met:
    • The available storage capacity on the remaining nodes in the cluster is greater than the used storage capacity on the current node.
    • The available write cache capacity on the remaining nodes in the cluster is greater than the used write cache capacity on the current node.
    Suggested operation upon failure:
    • Expand the cluster.
    • Delete some snapshots or virtual machines to free up cluster capacity.
    • If you are certain that the offline maintenance can be completed within 12 hours, you can confirm the risk and enter maintenance mode anyway.

    Precheck item: All of the following service requirements are met:
    • Excluding the current node, neither the zookeeper service nor the mongodb service in the cluster has any exceptions.
    • At least one other node in the cluster is running the zbs-meta service properly.
    • If any virtual volume or NFS export in the cluster uses erasure coding as its redundancy policy, then, excluding the current node, the number of nodes that have a healthy Chunk service and are not being removed is at least K + M for every such virtual volume or NFS export, where K is the number of data blocks and M is the number of check blocks in its erasure coding configuration.
    • The job-center-worker service on the current node is running properly.
    • The libvirt service on the current node is running properly.
    Suggested operation upon failure: Fix the faults.

    Note:

    • If the host has virtual machines with DRS automatic migration enabled, DRS may relocate these virtual machines while the host is in maintenance mode, in which case they cannot be automatically migrated back to the original host after the host exits maintenance mode. Click Edit DRS setting to go to the dynamic resource scheduler page and adjust the DRS automatic migration settings as needed. For details, refer to the Dynamic resource scheduler configuration section.

    • If the ACOS (AVE) cluster has a file storage cluster deployed and the file controller is located on the host, click the file controller name to go to the file controller page and take the file controller offline. For file storage cluster version 1.2.1 or later, refer to the chapter on taking the file controller offline. For version 1.1.2 or earlier, contact technical support for assistance.

    • If the ACOS (AVE) cluster has the ANS or AKE service deployed and its system service virtual machines are located on the host, these system service virtual machines automatically ignore enabled placement group rules of the Must type when they are migrated for maintenance mode. This migration does not affect business operations.

    • If a virtual machine fails to migrate because the target host does not have enough CPU available for its CPU reservation, adjust the virtual machine's CPU reservation to fit the resources available on the target host.

  3. If all check items meet the requirements, click Enter maintenance mode in the lower right corner of the dialog box to place the host in maintenance mode.

  4. To place an ESXi host in an ACOS (VMware ESXi) cluster into maintenance mode, refer to the official VMware documentation for instructions.
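The storage space precheck in step 2 compares what the current node uses with what the remaining nodes still have free. The following Python sketch restates those two conditions so the comparison is explicit. It is an illustration only: the NodeUsage record, its field names, and the use of bytes as the unit are hypothetical assumptions and do not correspond to any AOC interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Hypothetical per-node storage figures in bytes (not an AOC API)."""
    name: str
    used_storage: int
    available_storage: int
    used_write_cache: int
    available_write_cache: int


def has_enough_free_space(current: NodeUsage, others: list[NodeUsage]) -> bool:
    """Return True if the remaining nodes can absorb the current node's data.

    Both conditions from the precheck must hold, and both use a strict
    "greater than" comparison:
      1. total available storage on the other nodes > used storage on this node
      2. total available write cache on the other nodes > used write cache on this node
    """
    free_storage = sum(n.available_storage for n in others)
    free_cache = sum(n.available_write_cache for n in others)
    return (free_storage > current.used_storage
            and free_cache > current.used_write_cache)


if __name__ == "__main__":
    TiB = 1024 ** 4
    current = NodeUsage("node-1", 6 * TiB, 10 * TiB, 1 * TiB, 2 * TiB)
    others = [
        NodeUsage("node-2", 2 * TiB, 4 * TiB, 0, 1 * TiB),
        NodeUsage("node-3", 2 * TiB, 3 * TiB, 0, 1 * TiB),
    ]
    # 7 TiB free storage > 6 TiB used, 2 TiB free cache > 1 TiB used.
    print(has_enough_free_space(current, others))
```

Because the comparison is strict, remaining free capacity that exactly equals the current node's usage still fails the check; in that case the suggested operations for this precheck item apply.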
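The erasure coding requirement in the service precheck of step 2 can be read as a node-count comparison: after excluding the node entering maintenance mode, the nodes that have a healthy Chunk service and are not being removed must number at least K + M for every erasure-coded virtual volume or NFS export. A minimal Python sketch of that reading follows; the Node and EcPolicy records and the ec_precheck_passes function are hypothetical names chosen for illustration, not part of any AOC interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node state (not an AOC API)."""
    name: str
    chunk_healthy: bool
    being_removed: bool


@dataclass
class EcPolicy:
    """Hypothetical erasure coding settings of one virtual volume or NFS export."""
    target: str
    k: int  # number of data blocks
    m: int  # number of check blocks


def ec_precheck_passes(current: str, nodes: list[Node], policies: list[EcPolicy]) -> bool:
    """Return True if every erasure-coded target keeps enough eligible nodes.

    A node is eligible if it is not the node entering maintenance mode,
    its Chunk service is healthy, and it is not being removed.
    """
    if not policies:
        # No erasure-coded virtual volumes or NFS exports: the condition does not apply.
        return True
    eligible = sum(
        1 for n in nodes
        if n.name != current and n.chunk_healthy and not n.being_removed
    )
    return all(eligible >= p.k + p.m for p in policies)


if __name__ == "__main__":
    nodes = [Node(f"node-{i}", True, False) for i in range(1, 8)]
    policies = [EcPolicy("volume-a", k=4, m=2)]
    # 6 eligible nodes remain after excluding node-1, and 4 + 2 = 6.
    print(ec_precheck_passes("node-1", nodes, policies))
```

In this example a 7-node cluster with a 4+2 policy passes, but marking any other node as being removed (or giving it an unhealthy Chunk service) leaves only 5 eligible nodes, so the check fails.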