Viewing monitoring information

Procedure

  1. In the workload cluster list, click the display name of the target workload cluster to enter the Overview page of the workload cluster.

  2. Click Monitoring in the sidebar to view detailed monitoring information for the workload cluster.

    • Click the calendar icon at the top of the page to set the desired time range for the query.

      • Absolute time range: You can customize the start and end times. Once set, click Apply time range.
      • Quick time range: You can select Last 10 minutes, Last 1 hour, Last 3 hours, Last 24 hours, or Last 3 days.
    • Click the refresh icon at the top right of the page to manually refresh the monitoring information. Click the dropdown menu to the right of the icon to set the automatic refresh frequency.

      • Off: Disables automatic refresh.
      • 10s/30s/1m/5m: Automatically refreshes the monitoring information every 10 seconds, 30 seconds, 1 minute, or 5 minutes.
    • Click the search box at the top of the page to choose different views for monitoring information. Some views allow you to set one or more filter conditions to quickly narrow down the monitoring information you want to view.
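
As a rough illustration of how the two time range modes relate, the sketch below resolves a quick time range label into the equivalent absolute start and end times. It assumes a quick range is measured backward from the current time; `QUICK_RANGES` and `resolve_quick_range` are hypothetical names used only for this example and are not part of the product.

```python
from datetime import datetime, timedelta, timezone

# Quick time range labels as listed in the UI, mapped to their durations.
# The mapping itself is illustrative, not taken from the product.
QUICK_RANGES = {
    "Last 10 minutes": timedelta(minutes=10),
    "Last 1 hour": timedelta(hours=1),
    "Last 3 hours": timedelta(hours=3),
    "Last 24 hours": timedelta(hours=24),
    "Last 3 days": timedelta(days=3),
}


def resolve_quick_range(label, now=None):
    """Return the (start, end) pair covered by a quick time range (hypothetical helper)."""
    # Assumption: the range ends at the current time and extends backward.
    end = now or datetime.now(timezone.utc)
    return end - QUICK_RANGES[label], end


# Example with a fixed "now" so the result is reproducible.
fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = resolve_quick_range("Last 3 hours", now=fixed_now)
print(start.isoformat(), end.isoformat())
# prints: 2024-01-01T09:00:00+00:00 2024-01-01T12:00:00+00:00
```

Under this assumption, an absolute time range with the same start and end would cover the same data, but it would stay fixed instead of moving forward each time the page refreshes.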

Monitoring chart description

Each view contains multiple charts, as shown in the table below.

Note:

You can view GPU-related charts only when GPU devices are assigned to the workload cluster. If the TimeSlicing feature is enabled for the cluster, GPU-related charts are available only in the following views:

  • Node Exporter / USE Method / Cluster
  • Node Exporter / USE Method / Node
  • Node Exporter / Nodes
  • Kubernetes / Compute Resources / GPU

View | Chart
Kubernetes / API server
  • Availability (30d) > 99.000%
  • ErrorBudget (30d) > 99.000%
  • Read Availability (30d)
  • Read SLI - Requests
  • Read SLI - Errors
  • Read SLI - Duration
  • Write Availability (30d)
  • Write SLI - Requests
  • Write SLI - Errors
  • Write SLI - Duration
  • Work Queue Add Rate
  • Work Queue Depth
  • Work Queue Latency
  • Memory
  • CPU usage
  • Goroutines
  • Notice
Kubernetes / Compute Resources / Cluster
  • CPU Utilisation
  • CPU Requests Commitment
  • CPU Limits Commitment
  • Memory Utilisation
  • Memory Requests Commitment
  • Memory Limits Commitment
  • GPU Utilization
  • GPU Memory Utilization
  • GPU Limits Commitment
  • GPU Nodes
  • Total GPUs
  • CPU Usage
  • CPU Quota
  • Memory Usage (w/o cache)
  • Requests by Namespace
  • GPU Memory Usage
  • GPU Quota
  • Current Network Usage
  • Receive Bandwidth
  • Transmit Bandwidth
  • Average Container Bandwidth by Namespace: Received
  • Average Container Bandwidth by Namespace: Transmitted
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
  • IOPS(Reads+Writes)
  • ThroughPut(Read+Write)
  • Current Storage IO
Kubernetes / Compute Resources / GPU
  • GPU Utilization
  • GPU Power Total
  • GPU Avg. Temp
  • Tensor Core Utilization
  • GPU Framebuffer Mem Used
  • GPU Temperature
  • GPU Power Usage
  • GPU SM Clocks
Kubernetes / Compute Resources / Namespace (Pods)
  • CPU Utilisation (from requests)
  • CPU Utilisation (from limits)
  • Memory Utilisation (from requests)
  • Memory Utilisation (from limits)
  • GPU Utilization
  • GPU Memory Utilization
  • CPU Usage
  • CPU Quota
  • Memory Usage (w/o cache)
  • Memory Quota
  • GPU Memory Usage
  • GPU Quota
  • Current Network Usage
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
  • IOPS(Reads+Writes)
  • ThroughPut(Read+Write)
  • Current Storage IO
Kubernetes / Compute Resources / Namespace (Workloads)
  • CPU Usage
  • CPU Quota
  • Memory Usage
  • Memory Quota
  • GPU Memory Usage
  • GPU Quota
  • Current Network Usage
  • Receive Bandwidth
  • Transmit Bandwidth
  • Average Container Bandwidth by Workload: Received
  • Average Container Bandwidth by Workload: Transmitted
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Compute Resources / Node (Pods)
  • CPU Usage
  • CPU Quota
  • Memory Usage (w/o cache)
  • Memory Quota
  • GPU Memory Usage
  • GPU Quota
Kubernetes / Compute Resources / Pod
  • CPU Usage
  • CPU Throttling
  • CPU Quota
  • Memory Usage (WSS)
  • Memory Quota
  • GPU Memory Usage
  • GPU Quota
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
  • IOPS
  • ThroughPut
  • IOPS(Reads+Writes)
  • ThroughPut(Read+Write)
  • Current Storage IO
Kubernetes / Compute Resources / Workload
  • CPU Usage
  • CPU Quota
  • Memory Usage
  • Memory Quota
  • GPU Memory Usage
  • GPU Quota
  • Current Network Usage
  • Receive Bandwidth
  • Transmit Bandwidth
  • Average Container Bandwidth by Pod: Received
  • Average Container Bandwidth by Pod: Transmitted
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Controller Manager
  • Up
  • Work Queue Add Rate
  • Work Queue Depth
  • Work Queue Latency
  • Kube API Request Rate
  • Post Request Latency 99th Quantile
  • Get Request Latency 99th Quantile
  • Memory
  • CPU usage
  • Goroutines
Kubernetes / Kubelet
  • Running Kubelets
  • Running Pods
  • Running Containers
  • Actual Volume Count
  • Desired Volume Count
  • Config Error Count
  • Operation Rate
  • Operation Error Rate
  • Operation duration 99th quantile
  • Pod Start Rate
  • Pod Start Duration
  • Storage Operation Rate
  • Storage Operation Error Rate
  • Storage Operation Duration 99th quantile
  • Cgroup manager operation rate
  • Cgroup manager 99th quantile
  • PLEG relist rate
  • PLEG relist interval
  • PLEG relist duration
  • RPC Rate
  • Request duration 99th quantile
  • Memory
  • CPU usage
  • Goroutines
Kubernetes / Networking / Cluster
  • Current Rate of Bytes Received
  • Current Rate of Bytes Transmitted
  • Current Status
  • Average Rate of Bytes Received
  • Average Rate of Bytes Transmitted
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
  • Rate of TCP Retransmits out of all sent segments
  • Rate of TCP SYN Retransmits out of all retransmits
Kubernetes / Networking / Namespace (Pods)
  • Current Rate of Bytes Received
  • Current Rate of Bytes Transmitted
  • Current Status
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Networking / Namespace (Workload)
  • Current Rate of Bytes Received
  • Current Rate of Bytes Transmitted
  • Current Status
  • Average Rate of Bytes Received
  • Average Rate of Bytes Transmitted
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Networking / Pod
  • Current Rate of Bytes Received
  • Current Rate of Bytes Transmitted
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Networking / Workload
  • Current Rate of Bytes Received
  • Current Rate of Bytes Transmitted
  • Average Rate of Bytes Received
  • Average Rate of Bytes Transmitted
  • Receive Bandwidth
  • Transmit Bandwidth
  • Rate of Received Packets
  • Rate of Transmitted Packets
  • Rate of Received Packets Dropped
  • Rate of Transmitted Packets Dropped
Kubernetes / Persistent Volumes
  • Volume Space Usage
  • Volume Space Usage
  • Volume inodes Usage
  • Volume inodes Usage
Kubernetes / Proxy
  • Up
  • Rules Sync Rate
  • Rule Sync Latency 99th Quantile
  • Network Programming Rate
  • Network Programming Latency 99th Quantile
  • Kube API Request Rate
  • Post Request Latency 99th Quantile
  • Get Request Latency 99th Quantile
  • Memory
  • CPU usage
  • Goroutines
Kubernetes / Scheduler
  • Up
  • Scheduling Rate
  • Scheduling latency 99th Quantile
  • Kube API Request Rate
  • Post Request Latency 99th Quantile
  • Get Request Latency 99th Quantile
  • Memory
  • CPU usage
  • Goroutines
Node Exporter / Nodes
  • CPU Usage
  • Load Average
  • Memory Usage
  • Memory Usage
  • GPU Utilization
  • GPU Memory Utilization
  • Disk I/O
  • Disk Space Usage
  • Network Received
  • Network Transmitted
Node Exporter / USE Method / Cluster
  • CPU Utilization
  • CPU Saturation (Load1 per CPU)
  • Memory Utilization
  • Memory Saturation (Major Page Faults)
  • GPU Utilization
  • GPU Memory Utilization
  • Network Utilization (Bytes Receive/Transmit)
  • Network Saturation (Drops Receive/Transmit)
  • Disk IO Utilization
  • Disk IO Saturation
  • Disk Space Utilization
Node Exporter / USE Method / Node
  • CPU Utilization
  • CPU Saturation (Load1 per CPU)
  • Memory Utilization
  • Memory Saturation (Major Page Faults)
  • GPU Utilization
  • GPU Memory Utilization
  • Network Utilization (Bytes Receive/Transmit)
  • Network Saturation (Drops Receive/Transmit)
  • Disk IO Utilization
  • Disk IO Saturation
  • Disk Space Utilization