Kubernetes / API server |
- Availability (30d) > 99.000%
- ErrorBudget (30d) > 99.000%
- Read Availability (30d)
- Read SLI - Requests
- Read SLI - Errors
- Read SLI - Duration
- Write Availability (30d)
- Write SLI - Requests
- Write SLI - Errors
- Write SLI - Duration
- Work Queue Add Rate
- Work Queue Depth
- Work Queue Latency
- Memory
- CPU usage
- Goroutines
- Notice
|
Kubernetes / Compute Resources / Cluster |
- CPU Utilisation
- CPU Requests Commitment
- CPU Limits Commitment
- Memory Utilisation
- Memory Requests Commitment
- Memory Limits Commitment
- GPU Utilization
- GPU Memory Utilization
- GPU Limits Commitment
- GPU Nodes
- Total GPUs
- CPU Usage
- CPU Quota
- Memory Usage (w/o cache)
- Requests by Namespace
- GPU Memory Usage
- GPU Quota
- Current Network Usage
- Receive Bandwidth
- Transmit Bandwidth
- Average Container Bandwidth by Namespace: Received
- Average Container Bandwidth by Namespace: Transmitted
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
- IOPS(Reads+Writes)
- ThroughPut(Read+Write)
- Current Storage IO
|
Kubernetes / Compute Resources / GPU |
- GPU Utilization
- GPU Power Total
- GPU Avg. Temp
- Tensor Core Utilization
- GPU Framebuffer Mem Used
- GPU Temperature
- GPU Power Usage
- GPU SM Clocks
|
Kubernetes / Compute Resources / Namespace (Pods) |
- CPU Utilisation (from requests)
- CPU Utilisation (from limits)
- Memory Utilisation (from requests)
- Memory Utilisation (from limits)
- GPU Utilization
- GPU Memory Utilization
- CPU Usage
- CPU Quota
- Memory Usage (w/o cache)
- Memory Quota
- GPU Memory Usage
- GPU Quota
- Current Network Usage
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
- IOPS(Reads+Writes)
- ThroughPut(Read+Write)
- Current Storage IO
|
Kubernetes / Compute Resources / Namespace (Workloads) |
- CPU Usage
- CPU Quota
- Memory Usage
- Memory Quota
- GPU Memory Usage
- GPU Quota
- Current Network Usage
- Receive Bandwidth
- Transmit Bandwidth
- Average Container Bandwidth by Workload: Received
- Average Container Bandwidth by Workload: Transmitted
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Compute Resources / Node (Pods) |
- CPU Usage
- CPU Quota
- Memory Usage (w/o cache)
- Memory Quota
- GPU Memory Usage
- GPU Quota
|
Kubernetes / Compute Resources / Pod |
- CPU Usage
- CPU Throttling
- CPU Quota
- Memory Usage (WSS)
- Memory Quota
- GPU Memory Usage
- GPU Quota
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
- IOPS
- ThroughPut
- IOPS(Reads+Writes)
- ThroughPut(Read+Write)
- Current Storage IO
|
Kubernetes / Compute Resources / Workload |
- CPU Usage
- CPU Quota
- Memory Usage
- Memory Quota
- GPU Memory Usage
- GPU Quota
- Current Network Usage
- Receive Bandwidth
- Transmit Bandwidth
- Average Container Bandwidth by Pod: Received
- Average Container Bandwidth by Pod: Transmitted
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Controller Manager |
- Up
- Work Queue Add Rate
- Work Queue Depth
- Work Queue Latency
- Kube API Request Rate
- Post Request Latency 99th Quantile
- Get Request Latency 99th Quantile
- Memory
- CPU usage
- Goroutines
|
Kubernetes / Kubelet |
- Running Kubelets
- Running Pods
- Running Containers
- Actual Volume Count
- Desired Volume Count
- Config Error Count
- Operation Rate
- Operation Error Rate
- Operation duration 99th quantile
- Pod Start Rate
- Pod Start Duration
- Storage Operation Rate
- Storage Operation Error Rate
- Storage Operation Duration 99th quantile
- Cgroup manager operation rate
- Cgroup manager 99th quantile
- PLEG relist rate
- PLEG relist interval
- PLEG relist duration
- RPC Rate
- Request duration 99th quantile
- Memory
- CPU usage
- Goroutines
|
Kubernetes / Networking / Cluster |
- Current Rate of Bytes Received
- Current Rate of Bytes Transmitted
- Current Status
- Average Rate of Bytes Received
- Average Rate of Bytes Transmitted
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
- Rate of TCP Retransmits out of all sent segments
- Rate of TCP SYN Retransmits out of all retransmits
|
Kubernetes / Networking / Namespace (Pods) |
- Current Rate of Bytes Received
- Current Rate of Bytes Transmitted
- Current Status
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Networking / Namespace (Workload) |
- Current Rate of Bytes Received
- Current Rate of Bytes Transmitted
- Current Status
- Average Rate of Bytes Received
- Average Rate of Bytes Transmitted
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Networking / Pod |
- Current Rate of Bytes Received
- Current Rate of Bytes Transmitted
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Networking / Workload |
- Current Rate of Bytes Received
- Current Rate of Bytes Transmitted
- Average Rate of Bytes Received
- Average Rate of Bytes Transmitted
- Receive Bandwidth
- Transmit Bandwidth
- Rate of Received Packets
- Rate of Transmitted Packets
- Rate of Received Packets Dropped
- Rate of Transmitted Packets Dropped
|
Kubernetes / Persistent Volumes |
- Volume Space Usage
- Volume Space Usage
- Volume inodes Usage
- Volume inodes Usage
|
Kubernetes / Proxy |
- Up
- Rules Sync Rate
- Rule Sync Latency 99th Quantile
- Network Programming Rate
- Network Programming Latency 99th Quantile
- Kube API Request Rate
- Post Request Latency 99th Quantile
- Get Request Latency 99th Quantile
- Memory
- CPU usage
- Goroutines
|
Kubernetes / Scheduler |
- Up
- Scheduling Rate
- Scheduling latency 99th Quantile
- Kube API Request Rate
- Post Request Latency 99th Quantile
- Get Request Latency 99th Quantile
- Memory
- CPU usage
- Goroutines
|
Node Exporter / Nodes |
- CPU Usage
- Load Average
- Memory Usage
- Memory Usage
- GPU Utilization
- GPU Memory Utilization
- Disk I/O
- Disk Space Usage
- Network Received
- Network Transmitted
|
Node Exporter / USE Method / Cluster |
- CPU Utilization
- CPU Saturation (Load1 per CPU)
- Memory Utilization
- Memory Saturation (Major Page Faults)
- GPU Utilization
- GPU Memory Utilization
- Network Utilization (Bytes Receive/Transmit)
- Network Saturation (Drops Receive/Transmit)
- Disk IO Utilization
- Disk IO Saturation
- Disk Space Utilization
|
Node Exporter / USE Method / Node |
- CPU Utilization
- CPU Saturation (Load1 per CPU)
- Memory Utilization
- Memory Saturation (Major Page Faults)
- GPU Utilization
- GPU Memory Utilization
- Network Utilization (Bytes Receive/Transmit)
- Network Saturation (Drops Receive/Transmit)
- Disk IO Utilization
- Disk IO Saturation
- Disk Space Utilization
|