Node Common Fault Handling

Nodes, as entities carrying workloads, are very important objects in Kubernetes. In actual operation, nodes may encounter various problems. This article briefly describes various abnormal states of nodes and troubleshooting ideas.

1. Node Status Description

Ready: True indicates the node is healthy and ready to accept Pods; False indicates it is unhealthy; Unknown indicates the node controller has lost contact with the node.
DiskPressure: True indicates the node is running low on disk capacity; False otherwise.
MemoryPressure: True indicates node memory usage is too high; False otherwise.
PIDPressure: True indicates too many processes are running on the node; False otherwise.
NetworkUnavailable: True indicates the node's network is not correctly configured; False otherwise.
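
To print these conditions for a single node, a jsonpath query such as the following can help (a sketch; substitute your node's name for ${NODE_NAME}):

kubectl get node ${NODE_NAME} -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'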

2. Common Node Commands

  1. Check node status
kubectl get nodes
  2. View node events
kubectl describe node ${NODE_NAME}

When these two commands are not conclusive, we can log in to the node and use standard Linux commands to inspect its state directly.

  3. Check node connectivity

    3.1 Network check: ping the node from the cluster's Master node to verify network connectivity;

    3.2 Health check: log in to the UCloud Global console and confirm on the cloud host page that the node is in the Running state; also review its CPU and memory usage to determine whether the node is under high load.
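
Once logged in to the node, a few standard utilities cover the basics (a sketch; the tail length on dmesg is illustrative):

uptime               # load averages
free -m              # memory usage
df -h                # disk capacity per filesystem
dmesg | tail -n 50   # recent kernel messages (OOM kills, disk errors)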

3. K8S Component Fault Check

A UK8S cluster has 3 Master nodes by default; the K8S core components are deployed on all 3 Master nodes and exposed through a load balancer. If you find a component abnormal, log in to the corresponding Master node (if it cannot be located, log in to the Master nodes one by one) and use the following commands to check whether the component is healthy on that node, find the error cause, and restart the abnormal component:

systemctl status ${PLUGIN_NAME}    # check the component's service status
journalctl -u ${PLUGIN_NAME}       # read the component's logs
systemctl restart ${PLUGIN_NAME}   # restart the component

UK8S Core Components and Their Names:

Kubelet: kubelet
API Server: kube-apiserver
Controller Manager: kube-controller-manager
Etcd: etcd
Scheduler: kube-scheduler
KubeProxy: kube-proxy

For instance, to check the API Server's status, execute systemctl status kube-apiserver.
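
To check all core components in one pass on a Master node, a small loop like this works (a sketch using the service names listed above):

for c in kubelet kube-apiserver kube-controller-manager etcd kube-scheduler kube-proxy; do
  echo "${c}: $(systemctl is-active ${c})"
done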

4. UK8S Home Page Keeps Refreshing?

  1. Check whether the ULB4 instance serving the apiserver (uk8s-xxxxxx-master-ulb4) has been deleted.
  2. Check whether any of the three Master hosts of the UK8S cluster have been deleted or shut down.
  3. Log in to the three Master nodes of UK8S and check whether the etcd and kube-apiserver services are healthy; if abnormal, try restarting them (a log-inspection sketch follows this list).
  • 3.1 systemctl status etcd / systemctl restart etcd. If restarting etcd on a single node fails, try restarting etcd on all three nodes at the same time.
  • 3.2 systemctl status kube-apiserver / systemctl restart kube-apiserver
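
When a restart fails, the service logs usually show why (for example, quorum loss, exhausted disk space, or certificate problems). A minimal sketch for pulling recent logs on a Master node; the time window is illustrative:

journalctl -u etcd --since "30 minutes ago" --no-pager | tail -n 50
journalctl -u kube-apiserver --since "30 minutes ago" --no-pager | tail -n 50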

5. What to Do When UK8S Node is NotReady

  1. Run kubectl describe node ${NODE_NAME} to check why the node is NotReady; you can also view the node details directly on the console page.
  2. If you can log in to the node, view the kubelet logs with journalctl -u kubelet and check whether kubelet is running normally with systemctl status kubelet.
  3. If the node can no longer be logged in to and you want to recover quickly, you can power off and restart the corresponding host from the console.
  4. View host monitoring, or log in to the host and run the sar command (see the sketch after this list). If CPU and disk usage suddenly spike while memory usage is also high, the cause is usually a memory OOM: with memory nearly exhausted, the disk cache shrinks, forcing frequent disk reads and writes and driving CPU usage up in a vicious cycle.
  5. In the case of a memory OOM, you need to examine the memory consumption of your own processes. Kubernetes recommends keeping resource requests and limits close to each other; a large gap makes this kind of node crash more likely.
  6. If the cause of NotReady is still unclear, please contact support as described in UK8S Manual Support.
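
A minimal sar sketch for step 4 (assumes the sysstat package is installed; the one-second interval and five samples are illustrative):

sar -u 1 5    # CPU usage
sar -r 1 5    # memory usage
sar -d 1 5    # disk activity

For step 5, requests and limits can be brought close together on an existing workload; the deployment name my-app and the resource values below are hypothetical:

kubectl set resources deployment my-app --requests=cpu=500m,memory=512Mi --limits=cpu=1,memory=1Gi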