Cluster Autoscaler (CA)
Introduction
The Cluster Autoscaler (CA) automatically adjusts the number of nodes in a cluster to match business demand.
When creating a Pod, we can specify a resource request (Request) for CPU, memory, GPU, and other resources for each container. The Kubernetes scheduler uses these requests to decide which node the Pod should be placed on. If no node in the cluster has enough free capacity, the Pod is not scheduled and remains in the Pending state until a new node is added to the cluster or existing Pods are deleted to free up capacity.
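As a minimal illustration, the hypothetical Pod below requests 500m of CPU and 256Mi of memory; the name, image, and amounts are placeholders, not part of UK8S.

```yaml
# Minimal example Pod (hypothetical name and values): the scheduler places it
# only on a node with at least 500m CPU and 256Mi memory still unrequested.
apiVersion: v1
kind: Pod
metadata:
  name: request-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "500m"
          memory: "256Mi"
```

If no node can satisfy these requests, the Pod stays in the Pending state, which is exactly the signal CA reacts to.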
The CA component watches for Pods that cannot be scheduled and walks through the configured scaling groups to check whether a node created from a scaling group's template would satisfy the Pods' requirements. If adding such a node would allow the Pods to be scheduled, CA scales the cluster up.
CA also scales the cluster down. Scale-down is triggered when a node's resource request utilization falls below the scale-down threshold. The node is not removed immediately; CA waits for a period of time (10 minutes by default), which can be changed with the --scale-down-unneeded-time parameter.
Unlike HPA, CA is not a built-in component; it runs as a Deployment inside the Kubernetes cluster. UK8S already supports CA, and you can configure it from the UK8S management console.
Working Principle
The scale-up trigger for CA is the existence of Pods that cannot be created because the cluster lacks resources, including CPU, memory, and GPU. Taking GPU as an example: when a Pod requests the GPU resource nvidia.com/gpu (see the GPU node usage document) but stays Pending because the cluster has no GPU nodes, CA automatically adds nodes from a scaling group whose template is configured with a GPU model.
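For reference, such a request might look like the hypothetical Pod below. Extended resources like nvidia.com/gpu are specified under limits; the Pod name and image tag are illustrative, and the example assumes the device plugin setup from the GPU node usage document.

```yaml
# Hypothetical Pod requesting one GPU; it stays Pending until a GPU node exists,
# which is the condition that makes CA scale up the GPU scaling group.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
```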
The scale-down trigger for CA is that a node's resource request utilization stays below the scale-down threshold (for example 50%) for a period of time (10 minutes by default), and all Pods on the node can be rescheduled onto other nodes.
The condition that all Pods on the node can be rescheduled onto other nodes is worth emphasizing. Users who have configured CA often ask why a node whose resource requests are below the threshold is not scaled down. The reason is simple: if the node runs a standalone Pod (one not managed by any controller), that Pod cannot be rescheduled, so to keep the workload running the node will not be removed.
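If such a standalone Pod may safely be evicted, the upstream Cluster Autoscaler honors the safe-to-evict annotation sketched below. This is a hedged example based on upstream CA documentation; the Pod itself is hypothetical, and you should verify the behavior against the CA version that UK8S deploys.

```yaml
# Hypothetical standalone Pod explicitly marked as evictable, so it no longer
# blocks scale-down of the node it runs on (upstream CA annotation).
apiVersion: v1
kind: Pod
metadata:
  name: standalone-job
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
```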
Using Cluster Scaling in UK8S
1. Create Scaling Configuration
2. Fill in Configuration Parameters
Usually, the default values are sufficient.
3. Create Scaling Group
Important: this defines the configuration of the nodes that are created when cluster scale-up is triggered. The scaling range mainly serves to prevent unlimited expansion caused by, for example, a DDoS attack.
4. Turn on Cluster Scaling
After the scaling group is created, it must be enabled. Once you click the enable operation, a Cluster-Autoscaler Deployment is created in your UK8S cluster. If you delete this Deployment manually, cluster scaling will stop working; to recreate it, first disable cluster scaling on the cluster scaling page and then enable it again.
CA Parameter Description
CA itself has many command-line parameters that adjust its scaling behavior. They can be changed by editing the args field of the CA Deployment; an illustrative snippet follows the parameter table below.
Below are some CA parameters and descriptions:
Parameter | Type | Default Value | Explanation
---|---|---|---
scale-down-delay-after-add | Duration | 10m | How long after a scale-up before scale-down evaluation resumes.
scale-down-delay-after-delete | Duration | Same as scan-interval | How long after a node deletion before scale-down evaluation resumes.
scale-down-unneeded-time | Duration | 10m | How long a node must be marked as unneeded before it is scaled down.
node-deletion-delay-timeout | Duration | 2m | How long CA waits for a node deletion to complete before timing out.
scan-interval | Duration | 10s | Interval at which the cluster is re-evaluated for scale-up or scale-down.
max-nodes-total | int | 0 | Maximum total number of nodes in the cluster (0 means no limit).
cores-total | String | 0:320000 | CPU core range of the cluster, in the form min:max.
memory-total | String | 0:6400000 | Memory range of the cluster, in the form min:max.
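To show where these flags live, the fragment below sketches the args of the cluster-autoscaler container in its Deployment's Pod template. The image and the values are placeholders; the real Deployment, including its cloud-provider flags, is created and managed by UK8S, so treat this as a hedged sketch rather than the actual UK8S manifest.

```yaml
# Sketch of the cluster-autoscaler container's args (placeholder values;
# the real Deployment is created and managed by UK8S).
spec:
  containers:
    - name: cluster-autoscaler
      image: cluster-autoscaler:placeholder   # actual image is set by UK8S
      args:
        - --scan-interval=10s
        - --scale-down-unneeded-time=10m
        - --scale-down-delay-after-add=10m
        - --max-nodes-total=100
        - --cores-total=0:320000
        - --memory-total=0:6400000
```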