
Configuring cluster addons

When creating a VM-based workload cluster on the UI, you can modify the default parameters for Calico CNI, AVE CSI, node group autoscaling, and GPU addons. You can also enable and configure Monitoring, Logging, External load balancer, and Ingress controller addons based on your requirements.

After configuration, click Create to initiate the VM-based workload cluster creation process.

Note:

Default parameters cannot be modified for AIC CNI and ABS CSI addons.

Configuring Calico CNI

You can configure the Calico CNI addon through the UI or using YAML configuration.

Parameters for this addon can be modified only while creating the VM-based workload cluster on the UI. After the cluster is created, they cannot be changed through the UI.

  • UI Configuration

    1. Select Network encapsulation.

      • None: No encapsulation mode, allowing direct communication between containers using standard IP routing.
      • IPIP: IP-in-IP encapsulation, enabling cross-node communication for containers by encapsulating IP packets within another IP packet.
      • VXLAN: VXLAN encapsulation, encapsulating IP packets within UDP packets, facilitating cross-node communication for containers.
    2. Set whether to Enable BGP.

      • When the Network encapsulation option is None or IPIP, BGP is enabled by default and cannot be modified.
      • When the Network encapsulation option is VXLAN, BGP is disabled by default and cannot be modified.
  • YAML configuration

    The following is a YAML example. You can modify the encapsulation and cidr parameters based on your requirements:

    installation:
      calicoNetwork:
        bgp: Enabled   # Whether BGP is enabled; determined by the Network encapsulation setting and cannot be modified.
        ipPools:    # Can configure only one IP pool.
          - encapsulation: IPIP  # Specify the network encapsulation method.
            cidr: 172.16.0.0/16   # CIDR for the IP pool. It must be within the Pod IP CIDR range,
                                  # must not overlap with the network used by the cluster, and
                                  # must not use public addresses. Different clusters can use the
                                  # same network segment.
    
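    For comparison, a VXLAN-based variant might look like the following sketch. Per the encapsulation rules above, BGP is disabled when VXLAN is selected; the CIDR shown is only an illustrative value:

    installation:
      calicoNetwork:
        bgp: Disabled   # With VXLAN encapsulation, BGP is disabled and cannot be modified.
        ipPools:
          - encapsulation: VXLAN   # VXLAN encapsulation instead of IPIP.
            cidr: 172.16.0.0/16    # Illustrative value; must be within the Pod IP CIDR range.
    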

Configuring AVE CSI

You can configure the AVE CSI addon using YAML.

Parameter description

The parameters that can be configured are described below:

  • driver.maxSnapshotsPerVolume: Maximum number of snapshots per volume. Default is 3.
  • driver.preferredVolumeBusType: Preferred bus type for attaching volumes. Default is VIRTIO.
  • storageClass.parameters.storagePolicy: Default storage policy for volumes created by the StorageClass. Set to REPLICA_2_THIN_PROVISION by default; supported values are REPLICA_2_THIN_PROVISION, REPLICA_3_THIN_PROVISION, REPLICA_2_THICK_PROVISION, and REPLICA_3_THICK_PROVISION.
  • storageClass.parameters."csi.storage.k8s.io/fstype": Default filesystem type for volumes created by the StorageClass. Set to ext4 by default; supported values are ext2, ext3, ext4, and xfs.
  • storageClass.reclaimPolicy: Default reclaim policy for volumes created by the StorageClass. Default is Delete.

YAML example

driver:
  maxSnapshotsPerVolume: 3
  preferredVolumeBusType: VIRTIO
storageClass:
  reclaimPolicy: Delete
  parameters:
    storagePolicy: REPLICA_2_THIN_PROVISION
    csi.storage.k8s.io/fstype: ext4
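Volumes are then requested through the StorageClass created by the addon. The following PersistentVolumeClaim is a minimal sketch; the StorageClass name ave-csi and the claim name are hypothetical, so substitute the names actually used in your cluster:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume          # Hypothetical claim name.
spec:
  storageClassName: ave-csi  # Hypothetical; use the StorageClass created by the AVE CSI addon.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi          # Requested volume size.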

Configuring node group autoscaling

Once you have enabled node group autoscaling on the Node configuration page, you can configure the parameters for the cluster-autoscaler addon using YAML.

Parameter description

The parameters that can be configured are described below:

  • extraArgs.skip-nodes-with-local-storage: Whether Cluster Autoscaler skips nodes with local storage. Default is false. When set to true, nodes with local storage are not considered as scale-down candidates.
  • extraArgs.new-pod-scale-up-delay: Delay between discovering a new pod and scaling up nodes. Default is 1m.
  • extraArgs.scale-down-enabled: Whether cluster scale-down is enabled. Default is true.
  • extraArgs.scale-down-delay-after-add: How long after a scale-up before scale-down evaluation resumes. Default is 10m.
  • extraArgs.scale-down-delay-after-delete: How long after a node deletion before scale-down evaluation resumes. Default is 10s.
  • extraArgs.scale-down-unneeded-time: How long a node must be marked as unneeded before it can be deleted. Default is 10m.
  • extraArgs.scale-down-utilization-threshold: Resource utilization threshold below which a node is considered for scale-down. Default is 0.5 (50%).
  • extraArgs.scan-interval: How often the cluster is evaluated for scale-up or scale-down. Default is 10s.

YAML example

The following is a YAML example. You can configure parameters as needed.

extraArgs:
  skip-nodes-with-local-storage: false
  new-pod-scale-up-delay: 1m
  scale-down-enabled: true
  scale-down-delay-after-add: 10m
  scale-down-delay-after-delete: 10s
  scale-down-unneeded-time: 10m 
  scale-down-utilization-threshold: 0.5
  scan-interval: 10s

Configuring GPU

When you mount a GPU device for any worker node group on the Node configuration page, the configuration options for the NVIDIA GPU Operator addon will be displayed and enabled automatically. You can also configure the parameters of this addon through YAML. If you don't configure it at this step, you can do so after the cluster is created.

Note:

  • If you disable this addon, you must manually install the NVIDIA GPU Operator in the cluster; otherwise, GPU-related functionality will not work properly.
  • If the cluster uses vGPU, you must configure the licensingConfig parameter to set up the NVIDIA vGPU license; otherwise, vGPU functionality will be affected.

Parameter description

The parameters that can be configured are described below.

  • devicePlugin.config: Configuration for enabling the TimeSlicing feature in the cluster. For details, refer to the YAML example below.

    Risk warning: Enabling the TimeSlicing feature allows a single passthrough GPU/vGPU to be allocated to multiple pods simultaneously. If the monitoring addon is also enabled, the GPU monitoring collector cannot identify which pods are using the GPU resources, so GPU charts in the monitoring view that rely on pod labels become unavailable.

  • driver.licensingConfig.nlsEnabled: Whether to enable the NVIDIA vGPU licensing system. Default is false; AKE automatically sets it to true if the cluster uses vGPU.
  • driver.licensingConfig.clientConfigurationToken: NVIDIA vGPU license token, not configured by default. If it is needed, refer to the NVIDIA official documentation to apply for a license and enter it in this parameter.

YAML example

The following is a YAML example. You can modify the parameter values as needed.

devicePlugin:
  config:
    create: true
    name: time-slicing
    default: any
    data:
      any: |-
        sharing:
          timeSlicing:
            resources:
            - name: nvidia.com/gpu
              replicas: 4
driver:
  licensingConfig:
    nlsEnabled: true
    clientConfigurationToken: "test-fake-token"
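With time slicing set to replicas: 4 as above, each GPU is advertised as four allocatable nvidia.com/gpu resources, so up to four pods can share one GPU. A minimal pod spec requesting one such share might look like the following sketch; the pod name and container image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                # Placeholder name.
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # Placeholder image.
      resources:
        limits:
          nvidia.com/gpu: 1     # Requests one time-sliced share of a GPU.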

Enabling monitoring

You can enable Prometheus as needed and set Storage capacity for monitoring through the UI. You can also configure other parameters via YAML.

The minimum storage capacity is 40 GiB; you can enter any integer greater than or equal to 40. When configuring, make sure that the target storage used by the CSI addon for the VM-based workload cluster has sufficient storage resources.

Parameter description

You can configure the grafana and prometheus parameters as described below.

  • Grafana

    • resources.limits.cpu: CPU limit. Default is 500m.
    • resources.limits.memory: Memory limit. Default is 300Mi.
    • resources.requests.cpu: CPU request. Default is 10m.
    • resources.requests.memory: Memory request. Default is 100Mi.
    • persistent.enable: Whether to enable persistent storage. Default is true.
    • persistent.storageClassName: Name of the storage class used for persistent storage.
    • persistent.accessModes: Access modes of the persistent volume. Default is ReadWriteOnce.
    • persistent.size: Size of the persistent volume. Default is 1Gi.
    • config, section [auth.anonymous], enabled: Whether to enable anonymous login. Default is true.
    • config, section [security], admin_password: Password for the admin account of the Grafana Web UI. Refer to the Grafana official documentation for the default value.
  • Prometheus

    • resources.limits.memory: Memory limit. Default is 4Gi.
    • resources.requests.cpu: CPU request. Default is 200m.
    • resources.requests.memory: Memory request. Default is 1Gi.
    • scrapeInterval: Interval at which metric data is scraped from targets. Default is 30s.
    • scrapeTimeout: Maximum timeout for a single scrape. Default is 10s.
    • evaluationInterval: Interval at which rules and configuration files are evaluated. Default is 30s.
    • retention: Retention time for stored data. Default is 30d.
    • storage.volumeClaimTemplate.apiVersion: API version of the persistent volume claim template.
    • storage.volumeClaimTemplate.kind: Kind of the persistent volume claim template.
    • storage.volumeClaimTemplate.spec.accessModes: Access modes of the persistent volume.
    • storage.volumeClaimTemplate.spec.resources.requests.storage: Storage space requested for the persistent volume. Default is 40Gi.

YAML example

The following is a YAML example. You can modify relevant parameters as needed.

grafana:
  persistent:
    size: 1Gi
  config: |
    [security]
    admin_password = ***  # Specify the password for the admin account of Grafana Web UI.
    [auth.anonymous]
    enabled = true
    [date_formats]
    use_browser_locale = true
    default_timezone = browser  # Use the browser timezone; alternatively, set a fixed zone such as Asia/Singapore.
    [dashboards]
    # Path to the default home dashboard. If this value is empty, then Grafana uses StaticRootPath + "dashboards/home.json"
    default_home_dashboard_path = /grafana-dashboard-definitions/0/k8s-resources-cluster/k8s-resources-cluster.json
prometheus:
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        # storageClassName: "Specify the name"
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 40Gi
  resources: # Adjust resources configuration to handle larger data volumes.
    requests:
      memory: 2Gi
      cpu: 500m
    limits:
      # Ensure nodes have sufficient memory, or deployment will fail.
      memory: 6Gi

Enabling logging

You can enable the EFK (Elasticsearch, FluentBit, and Kibana) addons as needed. Set the log Retention time and Storage capacity through the UI; other parameters can be configured via YAML.

  • Retention time: An integer between 1 and 30. The system automatically cleans up logs older than the specified retention time at 00:30 every day.
  • Storage capacity: The minimum storage capacity is 40 GiB; you can enter any integer greater than or equal to 40. When configuring, make sure that the target storage used by the CSI addon for the VM-based workload cluster has sufficient storage resources.

Note:

  • To view log information after enabling the logging addon, you also need to enable the monitoring addon.
  • The Kibana addon is not enabled by default. If needed, contact after-sales engineers to enable Kibana via the command line.

Parameter description

You can configure the elasticsearch, elasticcurator, and fluent-bit parameters as described below.

  • elasticsearch

    • resources.limits.cpu: CPU limit. Default is 2000m.
    • resources.limits.memory: Memory limit. Default is 3584Mi.
    • resources.requests.cpu: CPU request. Default is 100m.
    • resources.requests.memory: Memory request. Default is 1638Mi.
    • volumeClaimTemplate.accessModes: Access modes of the persistent volume.
    • volumeClaimTemplate.resources.requests.storage: Storage space requested for the persistent volume. Default is 40Gi.
    • service.type: Service type; options include ClusterIP, NodePort, and LoadBalancer. Default is ClusterIP.
  • elasticcurator

    • logRetentionDays: Number of days to retain logs. Default is 30.
    • schedule: Cron schedule for the log cleanup task. Default is '30 0 * * *'.
  • fluent-bit

    • resources.limits.cpu: CPU limit. Default is 500m.
    • resources.limits.memory: Memory limit. Default is 300Mi.
    • resources.requests.cpu: CPU request. Default is 10m.
    • resources.requests.memory: Memory request. Default is 50Mi.

YAML example

The following is a YAML example. You can modify relevant parameters as needed.

elasticsearch:
  volumeClaimTemplate:
    accessModes: ["ReadWriteOnce"]
    # Optional: specify a storage class name.
    # storageClassName: ""
    resources:
      requests:
        storage: 40Gi
  resources: # Adjust resources configuration to cope with higher data volumes.
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      # Make sure that the node has sufficient resources to avoid deployment failures.
      cpu: 2000m
      memory: 4Gi
elasticcurator:
  logRetentionDays: 30
fluentbit:
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 20m
      memory: 80Mi

Enabling the external load balancer

An external load balancer is used to provide an externally accessible IP address for LoadBalancer type Kubernetes Service objects and to forward external traffic to the correct ports on cluster nodes.

AKE provides the MetalLB addon as an external load balancer. You can enable MetalLB addon as needed and configure one or more IP ranges through the UI or YAML.

Note:

  • The IP ranges should be in the same network segment as the management network used by Kubernetes cluster nodes and should not be in use. MetalLB will allocate IP addresses for load balancer services from within this range.
  • When adding multiple IP ranges, the IP addresses between multiple ranges should not overlap.
  • Because the avoidBuggyIPs parameter of the MetalLB addon defaults to true, IP addresses ending in .0 or .255 will not be allocated.

The IP range input format supports IP ranges or CIDR blocks:

  • IP range: Format <start IP> - <end IP>, for example, 192.168.2.1-192.168.3.255. When adding a single IP address, it is still necessary to use the format of an IP range, for example, 192.168.1.1-192.168.1.1.
  • CIDR block: Format <IP address>/<prefix length>, for example, 192.168.1.1/32.

YAML example

The following is a YAML example. You can modify the IP range as needed.

layer2IPAddressPools:
  - 10.0.0.5/24
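Once MetalLB is enabled, creating a Kubernetes Service of type LoadBalancer is enough for it to receive an external IP from the configured range. The following is a minimal sketch; the service name, selector, and ports are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: my-app           # Placeholder name.
spec:
  type: LoadBalancer     # MetalLB allocates an external IP from the configured IP range.
  selector:
    app: my-app          # Placeholder selector matching your workload's pods.
  ports:
    - port: 80           # Port exposed on the external IP.
      targetPort: 8080   # Placeholder container port.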

Enabling Ingress controller

Ingress is an API object that manages external access to services in a cluster, typically over HTTP. Ingress can provide load balancing, SSL termination, and name-based virtual hosting.

AKE provides the Contour addon as an Ingress controller. You can enable the Contour addon as needed and adjust its parameters via YAML.

Note:

The IP address of Ingress will be automatically allocated from the IP range of the external load balancer. Before using it, you need to deploy an external load balancer in advance. You can directly enable the built-in MetalLB addon or install other external load balancers as needed.

Parameter description

The parameters that can be configured are as follows:

  • envoy.resources.limits.cpu: CPU limit. Default is 500m.
  • envoy.resources.limits.memory: Memory limit. Default is 256Mi.
  • envoy.resources.requests.cpu: CPU request. Default is 10m.
  • envoy.resources.requests.memory: Memory request. Default is 64Mi.
  • envoy.service.type: Service type; options are NodePort and LoadBalancer.
  • contour.resources.limits.cpu: CPU limit. Default is 500m.
  • contour.resources.limits.memory: Memory limit. Default is 256Mi.
  • contour.resources.requests.cpu: CPU request. Default is 10m.
  • contour.resources.requests.memory: Memory request. Default is 64Mi.

YAML example

The following is a YAML example. You can modify relevant parameters as needed.

# Ensure nodes have sufficient resources to avoid deployment failure.
envoy:
  service:
    type: LoadBalancer 
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 64Mi
contour:
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 10m
      memory: 64Mi
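
After the Contour addon is running, HTTP routing is defined with standard Kubernetes Ingress resources. The following is a minimal sketch; the Ingress name, host, and backend service are placeholders:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress          # Placeholder name.
spec:
  rules:
    - host: app.example.com      # Placeholder host for name-based virtual hosting.
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app     # Placeholder backend service.
                port:
                  number: 80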