Node Common Fault Handling

Nodes, as entities carrying workloads, are very important objects in Kubernetes. In actual operation, nodes may encounter various problems. This article briefly describes various abnormal states of nodes and troubleshooting ideas.

1. Node Status Description

Ready: True indicates the node is healthy and ready to accept Pods; False indicates it is unhealthy; Unknown indicates the node controller has lost contact with the node.
DiskPressure: True indicates the node is running low on disk capacity; False otherwise.
MemoryPressure: True indicates node memory usage is too high; False otherwise.
PIDPressure: True indicates too many processes are running on the node; False otherwise.
NetworkUnavailable: True indicates the node's network is not correctly configured; False otherwise.
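
To print these conditions for a single node, a jsonpath query such as the following can help (a sketch; substitute your node's name for ${NODE_NAME}):

kubectl get node ${NODE_NAME} -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'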

2. Common Node Commands

  1. Check node status
kubectl get nodes
  2. View node events
kubectl describe node ${NODE_NAME}

When these two commands are not conclusive, we can log in to the node and use standard Linux commands to inspect its state directly.

  3. Check node connectivity

    3.1 Network check: ping the node from the cluster's Master node to verify network connectivity;

    3.2 Health check: log in to the UCloud Global console and confirm on the cloud host page that the node is in the Running state; also review its CPU and memory usage to determine whether the node is under high load.
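
Once logged in to the node, a few standard utilities cover the basics (a sketch; the tail length on dmesg is illustrative):

uptime               # load averages
free -m              # memory usage
df -h                # disk capacity per filesystem
dmesg | tail -n 50   # recent kernel messages (OOM kills, disk errors)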

3. K8S Component Fault Check

A UK8S cluster has 3 Master nodes by default; the K8S core components are deployed on all 3 Master nodes and exposed through a load balancer. If you find a component abnormal, log in to the corresponding Master node (if it cannot be located, log in to the Master nodes one by one) and use the following commands to check whether the component is healthy on that node, find the error cause, and restart the abnormal component:

systemctl status ${PLUGIN_NAME}    # check the component's service status
journalctl -u ${PLUGIN_NAME}       # read the component's logs
systemctl restart ${PLUGIN_NAME}   # restart the component

UK8S Core Components and Their Names:

Kubelet: kubelet
API Server: kube-apiserver
Controller Manager: kube-controller-manager
Etcd: etcd
Scheduler: kube-scheduler
KubeProxy: kube-proxy

For instance, to check the API Server's status, execute systemctl status kube-apiserver.
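
To check all core components in one pass on a Master node, a small loop like this works (a sketch using the service names listed above):

for c in kubelet kube-apiserver kube-controller-manager etcd kube-scheduler kube-proxy; do
  echo "${c}: $(systemctl is-active ${c})"
done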

4. UK8S Home Page Keeps Refreshing?

  1. Check whether the ULB4 instance serving the apiserver (uk8s-xxxxxx-master-ulb4) has been deleted.
  2. Check whether any of the three Master hosts of the UK8S cluster have been deleted or shut down.
  3. Log in to the three Master nodes of UK8S and check whether the etcd and kube-apiserver services are healthy; if abnormal, try restarting them (a log-inspection sketch follows this list).
  • 3.1 systemctl status etcd / systemctl restart etcd. If restarting etcd on a single node fails, try restarting etcd on all three nodes at the same time.
  • 3.2 systemctl status kube-apiserver / systemctl restart kube-apiserver
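
When a restart fails, the service logs usually show why (for example, quorum loss, exhausted disk space, or certificate problems). A minimal sketch for pulling recent logs on a Master node; the time window is illustrative:

journalctl -u etcd --since "30 minutes ago" --no-pager | tail -n 50
journalctl -u kube-apiserver --since "30 minutes ago" --no-pager | tail -n 50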

5. What to Do When UK8S Node is NotReady

  1. Run kubectl describe node ${NODE_NAME} to check why the node is NotReady; you can also view the node details directly on the console page.
  2. If you can log in to the node, view the kubelet logs with journalctl -u kubelet and check whether kubelet is running normally with systemctl status kubelet.
  3. If the node can no longer be logged in to and you want to recover quickly, you can power off and restart the corresponding host from the console.
  4. View host monitoring, or log in to the host and run the sar command (see the sketch after this list). If CPU and disk usage suddenly spike while memory usage is also high, the cause is usually a memory OOM: with memory nearly exhausted, the disk cache shrinks, forcing frequent disk reads and writes and driving CPU usage up in a vicious cycle.
  5. In the case of a memory OOM, you need to examine the memory consumption of your own processes. Kubernetes recommends keeping resource requests and limits close to each other; a large gap makes this kind of node crash more likely.
  6. If the cause of NotReady is still unclear, please contact support as described in UK8S Manual Support.
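
A minimal sar sketch for step 4 (assumes the sysstat package is installed; the one-second interval and five samples are illustrative):

sar -u 1 5    # CPU usage
sar -r 1 5    # memory usage
sar -d 1 5    # disk activity

For step 5, requests and limits can be brought close together on an existing workload; the deployment name my-app and the resource values below are hypothetical:

kubectl set resources deployment my-app --requests=cpu=500m,memory=512Mi --limits=cpu=1,memory=1Gi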