When creating a VM-based workload cluster on the UI, you can modify the default parameters for the Calico CNI, AVE CSI, node group autoscaling, and GPU addons. You can also enable and configure the Observability, Log, External load balancer, and Ingress controller addons as needed.
After configuration, click Create to initiate the VM-based workload cluster creation process.
Note:
Default parameters cannot be modified for AIC CNI and ABS CSI addons.
You can configure the Calico CNI addon through the UI or using YAML configuration.
Parameters for this addon can only be modified on the UI while creating the VM-based workload cluster; once the cluster is created, they cannot be modified through the UI.
UI Configuration
Select Network encapsulation.
Set whether to Enable BGP.
YAML configuration
The following is a YAML example. You can modify the encapsulation and cidr parameters based on your needs:
```yaml
installation:
  calicoNetwork:
    bgp: Enabled # Specifies whether BGP is enabled; cannot be modified.
    ipPools: # Only one IP pool can be configured.
    - encapsulation: IPIP # Specifies the network encapsulation method.
      cidr: 172.16.0.0/16 # Specifies the CIDR for the IP pool. It must be within the pod IP CIDR range, must not overlap with the network used by the cluster, and must not use public addresses. Different clusters can use the same network segment.
```
You can configure the AVE CSI addon using YAML.
Parameter description
The parameters that can be configured are as follows:
| Parameter | | | Description |
|---|---|---|---|
| driver | maxSnapshotsPerVolume | | Maximum number of snapshots per volume. Default is 3. |
| | preferredVolumeBusType | | Bus type to which volumes are preferentially mounted. Default is VIRTIO. |
| storageClass | parameters | storagePolicy | The volume storage policy configured by the default-created StorageClass. Default is REPLICA_2_THIN_PROVISION, supporting REPLICA_2_THIN_PROVISION, REPLICA_3_THIN_PROVISION, REPLICA_2_THICK_PROVISION, or REPLICA_3_THICK_PROVISION. |
| | | csi.storage.k8s.io/fstype | Default file system type for volumes configured by the default-created StorageClass. Default is ext4, supporting ext2, ext3, ext4, or xfs. |
| | reclaimPolicy | | Volume deletion policy configured by the default-created StorageClass. Default is Delete. |
YAML example
```yaml
driver:
  maxSnapshotsPerVolume: 3
  preferredVolumeBusType: VIRTIO
storageClass:
  reclaimPolicy: Delete
  parameters:
    storagePolicy: REPLICA_2_THIN_PROVISION
    csi.storage.k8s.io/fstype: ext4
```
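For reference, volumes provisioned through this addon are requested with standard PersistentVolumeClaims. The sketch below assumes the default-created AVE CSI StorageClass is the cluster default, so storageClassName can be omitted; the claim name is hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume   # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi   # provisioned with the storagePolicy and fstype configured above
```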
Once you have enabled node group autoscaling on the Node configuration page, you can configure the parameters for the cluster-autoscaler addon using YAML.
Parameter description
The parameters that can be configured are as follows:
| Parameter | | Description |
|---|---|---|
| extraArgs | skip-nodes-with-local-storage | Specifies whether nodes with local storage are skipped. Default is false. When set to true, Cluster Autoscaler will not consider nodes with local storage as candidates for scale-down. |
| | new-pod-scale-up-delay | Specifies the delay between discovering a new pod and scaling up nodes. Default is 1m. |
| | scale-down-enabled | Specifies whether cluster scale-down is enabled. Default is true. |
| | scale-down-delay-after-add | Specifies how long after a scale-up before scale-down evaluation resumes. Default is 10m. |
| | scale-down-delay-after-delete | Specifies how long after node deletion before scale-down evaluation resumes. Default is 10s. |
| | scale-down-unneeded-time | Specifies how long a node must be marked as unneeded before it can be deleted. Default is 10m. |
| | scale-down-utilization-threshold | Specifies the resource utilization threshold for scaling down nodes; when a node's utilization falls below this threshold, it becomes a scale-down candidate. Default is 0.5 (50%). |
| | scan-interval | Specifies how frequently the cluster is evaluated for scaling up or down. Default is 10s. |
YAML example
The following is a YAML example. You can modify relevant parameters as needed.
```yaml
extraArgs:
  skip-nodes-with-local-storage: false
  new-pod-scale-up-delay: 1m
  scale-down-enabled: true
  scale-down-delay-after-add: 10m
  scale-down-delay-after-delete: 10s
  scale-down-unneeded-time: 10m
  scale-down-utilization-threshold: 0.5
  scan-interval: 10s
```
When you mount a GPU device for any worker node group on the Node configuration page, the configuration options for the NVIDIA GPU Operator addon will be displayed and enabled automatically. You can also configure the parameters of this addon through YAML. If you don't configure it at this step, you can do so after the cluster is created.
Note:
- If you disable this addon, you need to manually install the NVIDIA GPU Operator in the cluster; otherwise, GPU-related functionality will not work properly.
- If the cluster has vGPU configured, you must configure the `licensingConfig` parameter to set up the NVIDIA vGPU license; otherwise, vGPU functionality will be affected.
Parameter description
The parameters that can be configured are as follows:
| Parameter | | | Description |
|---|---|---|---|
| devicePlugin | config | | Configuration options for enabling the TimeSlicing feature in the cluster. For detailed configuration, refer to the YAML example below. Risk warning: enabling the TimeSlicing feature allows a single passthrough GPU/vGPU to be allocated to multiple pods simultaneously. If the monitoring addon is also enabled, the GPU monitoring collector will not be able to identify the pods using the GPU resources, causing GPU charts in the monitoring view that rely on pod labels to be unavailable. |
| driver | licensingConfig | nlsEnabled | Specifies whether to enable the NVIDIA vGPU licensing system. The default value is false, and AKE will automatically set this to true if the cluster uses vGPU. |
| | | clientConfigurationToken | NVIDIA vGPU license token, not configured by default. If it needs to be configured, refer to the NVIDIA official documentation to apply for a license and enter it in this parameter. |
YAML example
The following is a YAML example. You can modify relevant parameters as needed.
```yaml
devicePlugin:
  config:
    create: true
    name: time-slicing
    default: any
    data:
      any: |-
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 4
driver:
  licensingConfig:
    nlsEnabled: true
    clientConfigurationToken: "test-fake-token"
```
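With the TimeSlicing configuration above (replicas: 4), up to four pods can share one physical GPU, each requesting nvidia.com/gpu: 1. A minimal sketch of such a pod follows; the pod name and container image are illustrative, not taken from this document:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-test              # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1      # one time-sliced replica of a shared GPU
```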
To enable monitoring and alerting functions, it is recommended to associate the VM-based workload cluster with the observability service of version 1.4.0 or later in the 1.x.x series.
Note:
Before associating the cluster with the observability service, make sure that the VM network where the business NICs of the cluster nodes reside is L3-reachable from the VM network where the virtual NIC of the observability virtual machine resides, and that the AKE system service has been associated with the observability service so that alerting works properly.
Enable the option Associate observability service, then select the observability service you want to associate.
Once the observability service is associated, the monitoring addon will be automatically enabled. You can also configure this addon's parameters obs-monitoring-agent and Prometheus via YAML.
Configuring the obs-monitoring-agent parameter
Parameter description
| Parameter | | | Description |
|---|---|---|---|
| metrics | remote_write_global | labels | Custom key-value pairs used to add specific labels to the metrics sent to external systems, typically to identify the data source. Not configured by default. |
| | remote_write | url | The target address that receives the monitoring data. Not configured by default. |
| | | tls_insecure_skip_verify | Whether to skip TLS certificate verification. Default is false. Note: if the URL uses the HTTPS protocol and the certificate is self-signed, this must be set to true. |
| | | send_timeout | The timeout for sending requests. Default is 1m. |
| | | headers | HTTP request headers. Not configured by default. |
| | | bearer_auth.token | The token used for Bearer authentication. Not configured by default. |
| | | basic_auth.username | BasicAuth username. Not configured by default. |
| | | basic_auth.password | BasicAuth password. Not configured by default. |
| resources | requests | cpu | CPU request. Default is 250m. |
| | | memory | Memory request. Default is 300Mi. |
| | limits | cpu | CPU limit. Default is "2". |
YAML example
```yaml
agent:
  # Used to send metrics to external systems.
  metrics:
    remote_write_global:
      labels:
        # Adds specific labels to the metrics sent to remote systems, typically used to identify the data source.
        # Optional
        source: ake-monitor
        my_key: my_value
    remote_write:
    - url: https://remote-prometheus.example.com/api/v1/write
      tls_insecure_skip_verify: true
      send_timeout: 30s
      # Multiple authentication methods are available. Choose as needed; none is required if not needed.
      # Insert fields directly into the header.
      headers:
        Authorization: 'Bearer eyJhbGciOiJ...'
      bearer_auth:
        token: 'eyJhbGciOiJ...'
      basic_auth:
        username: remote
        password: my_secret
  # Adjust agent resource settings.
  resources:
    requests:
      cpu: 250m
      memory: 300Mi
    limits:
      cpu: "2"
```
Configuring Prometheus parameters
Parameter description
| Parameter | | | | Description |
|---|---|---|---|---|
| grafana | resources | limits | cpu | CPU limit. Default is 500m. |
| | | | memory | Memory limit. Default is 300Mi. |
| | | requests | cpu | CPU request. Default is 10m. |
| | | | memory | Memory request. Default is 100Mi. |
| | persistent | enable | | Whether to enable persistent storage. Default is true. |
| | | storageClassName | | Storage class name used for persistent storage. |
| | | accessModes | | Access modes of the persistent volume. Default is ReadWriteOnce. |
| | | size | | Size of the persistent volume. Default is 1Gi. |
| | config | [auth.anonymous] | enabled | Whether to enable anonymous login. Default is true. |
| | | [security] | admin_password | Password for the admin account of the Grafana Web UI. For the default value, refer to the official Grafana documentation. |
| prometheus | scrapeInterval | | | Time interval for scraping metric data from targets. Default is 30s. |
| | scrapeTimeout | | | Maximum timeout for a single scrape. Default is 10s. |
| prometheusOverrides | monitorExtraNamespaces | | | Specifies a list of additional namespaces (besides those defined by AKE) from which Prometheus CR resources, such as ServiceMonitors in custom namespaces, are collected. Not configured by default. |
YAML example
The following is a YAML example. You can modify relevant parameters as needed.
```yaml
grafana:
  persistent:
    size: 1Gi
  config: |
    [security]
    admin_password = *** # Specifies the password for the Grafana Web UI admin account.
    [auth.anonymous]
    enabled = true
    [date_formats]
    default_timezone = Asia/Singapore
    use_browser_locale = true
    default_timezone = browser
    [dashboards]
    # Path to the default home dashboard. If this value is empty, then Grafana uses StaticRootPath + "dashboards/home.json".
    default_home_dashboard_path = /grafana-dashboard-definitions/0/k8s-resources-cluster/k8s-resources-cluster.json
prometheus:
  scrapeInterval: 30s
  scrapeTimeout: 10s
prometheusOverrides:
  monitorExtraNamespaces:
  - my-ns1
  - my-ns2
```
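A ServiceMonitor placed in one of the namespaces listed under monitorExtraNamespaces would then be picked up by Prometheus. A minimal sketch, assuming the Prometheus Operator ServiceMonitor CRD and a Service exposing a named metrics port (the names here are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app        # hypothetical name
  namespace: my-ns1   # must be listed in monitorExtraNamespaces above
spec:
  selector:
    matchLabels:
      app: my-app     # matches the labels on the target Service
  endpoints:
  - port: metrics     # named port exposed by the Service
    interval: 30s
```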
Enable Alert.
Once enabled, you can not only view alert information, but also view and edit alert rules and configure alert notifications on the Alert main page in AOC.
You can enable the EFK (i.e., Elasticsearch, FluentBit, and Kibana) addons as needed. Set the log Retention time and Storage capacity through the UI, and you can also configure other parameters via YAML.
Note:
- You need to enable the monitoring addon simultaneously to view log information after enabling the logging addon.
- The Kibana addon is not enabled by default. If needed, you can contact after-sales engineers to enable Kibana through the command line.
Parameter description
You can configure the elasticsearch, elasticcurator, and fluent-bit parameters as shown in the tables below.
elasticsearch
| Parameter | | | Description |
|---|---|---|---|
| resources | limits | cpu | CPU limit. Default is 2000m. |
| | | memory | Memory limit. Default is 3584Mi. |
| | requests | cpu | CPU request. Default is 100m. |
| | | memory | Memory request. Default is 1638Mi. |
| volumeClaimTemplate | accessModes | | Access modes for the persistent volume. |
| | resources | requests.storage | Storage space requested for the persistent volume. Default is 40Gi. |
| service | type | | Service type; options include ClusterIP, NodePort, LoadBalancer, etc. Default is ClusterIP. |
elasticcurator
| Parameter | Description |
|---|---|
| logRetentionDays | Log retention days, default is 30. |
| schedule | Cron expression for the scheduled cleanup task. Default is '30 0 * * *'. |
fluent-bit
| Parameter | | | Description |
|---|---|---|---|
| resources | limits | cpu | CPU limit. Default is 500m. |
| | | memory | Memory limit. Default is 300Mi. |
| | requests | cpu | CPU request. Default is 10m. |
| | | memory | Memory request. Default is 50Mi. |
YAML example
The following is a YAML example. You can modify relevant parameters as needed.
```yaml
elasticsearch:
  volumeClaimTemplate:
    accessModes: ["ReadWriteOnce"]
    # Optional
    # storageClassName: ""
    resources:
      requests:
        storage: 40Gi
  resources: # Adjust the resources configuration to cope with higher data volumes.
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      # Make sure that the node has sufficient resources to avoid deployment failures.
      cpu: 2000m
      memory: 4Gi
elasticcurator:
  logRetentionDays: 30
fluentbit:
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 20m
      memory: 80Mi
```
An external load balancer is used to provide an externally accessible IP address for LoadBalancer type Kubernetes service objects and to forward external traffic to the correct ports on cluster nodes.
AKE provides the MetalLB addon as an external load balancer. You can enable the MetalLB addon as needed and configure one or more IP ranges through the UI or YAML.
Note:
- The IP ranges should be in the same network segment as the network where the Kubernetes cluster node's business NIC resides and should not be in use. MetalLB will allocate IP addresses for load balancer services from within this range.
- When adding multiple IP ranges, the IP addresses between multiple ranges should not overlap.
- Because the `avoidBuggyIPs` parameter of the MetalLB addon defaults to `true`, IP addresses ending in `.0` and `.255` will not be allocated.
The IP range input format supports IP ranges or CIDR blocks:
- `<start IP>-<end IP>`, for example, 192.168.2.1-192.168.3.255. When adding a single IP address, you must still use the IP range format, for example, 192.168.1.1-192.168.1.1.
- `<IP address>/<subnet mask bits>`, for example, 192.168.1.1/32.
YAML example
The following is a YAML example. You can modify the IP range as needed.
```yaml
layer2IPAddressPools:
- 10.0.0.5/24
```
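Once the addon is enabled, MetalLB assigns an address from the configured range to any Service of type LoadBalancer. A minimal sketch of such a service (the name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # hypothetical name
spec:
  type: LoadBalancer   # MetalLB assigns the external IP from the pool above
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```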
Ingress is an API object that manages external access to services in a cluster, typically over HTTP. Ingress can provide load balancing, SSL termination, and name-based virtual hosting.
AKE provides the Contour addon as an ingress controller. You can enable the Contour addon as needed and adjust its parameters via YAML.
Note:
- When the service type is LoadBalancer, the IP address of ingress will be automatically allocated from the IP range of the external load balancer. Before using it, you need to deploy an external load balancer in advance. You can directly enable the built-in MetalLB addon or install other external load balancers as needed.
- When the workload cluster uses the AIC CNI addon, configuring the `externalTrafficPolicy` attribute is not supported.
Parameter description
The parameters that can be configured are as follows:
| Parameter | | | | Description |
|---|---|---|---|---|
| envoy | resources | limits | cpu | CPU limit. Default is 500m. |
| | | | memory | Memory limit. Default is 256Mi. |
| | | requests | cpu | CPU request. Default is 10m. |
| | | | memory | Memory request. Default is 64Mi. |
| | service | type | | Service type; options include NodePort and LoadBalancer. |
| contour | resources | limits | cpu | CPU limit. Default is 500m. |
| | | | memory | Memory limit. Default is 256Mi. |
| | | requests | cpu | CPU request. Default is 10m. |
| | | | memory | Memory request. Default is 64Mi. |
| | configInline | timeouts | request-timeout | Sets the default timeout for ingress requests. |
YAML example
The following is a YAML example. You can modify relevant parameters as needed.
```yaml
# Ensure nodes have sufficient resources to avoid deployment failure.
envoy:
  service:
    type: LoadBalancer
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 64Mi
contour:
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 64Mi
  configInline:
    timeouts:
      request-timeout: 60s
```
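Once Contour is enabled, HTTP routing is defined with standard Ingress objects. A minimal sketch, assuming an ingress class named contour and an existing Service named web on port 80 (the host and all names here are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web                   # hypothetical name
spec:
  ingressClassName: contour   # route through the Contour addon
  rules:
  - host: web.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web         # existing backend Service
            port:
              number: 80
```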