Common Troubleshooting for Pod

When deploying applications in Kubernetes, we often encounter abnormal situations such as Pods staying in the Pending state for a long time or restarting repeatedly. Below we introduce the common abnormal states and how to troubleshoot them.

1. Common Errors

| Status | Explanation | Troubleshooting |
| --- | --- | --- |
| Error | An error occurred during Pod startup. | Usually caused by incorrect container startup commands or parameters; contact the image maintainer |
| NodeLost | The node where the Pod is located is lost. | Check the status of the node where the Pod is located |
| Unknown | The node where the Pod is located is lost, or another unknown exception occurred. | Check the status of the node where the Pod is located |
| Pending | The Pod is waiting to be scheduled. | Usually caused by insufficient resources; view Pod events with the `kubectl describe` command |
| Terminating | The Pod is being destroyed. | Can be forcibly deleted by adding the `--force` parameter |
| CrashLoopBackOff | The container exited, and the kubelet is restarting it. | Usually caused by incorrect container startup commands or parameters |
| ErrImageNeverPull | The image pull policy forbids pulling images. | Image pull failed; verify that `imagePullSecrets` is correct |
| ImagePullBackOff | Retrying the image pull. | Check network connectivity between the image repository and the cluster |
| RegistryUnavailable | Unable to connect to the image repository. | Contact the repository administrator |
| ErrImagePull | Failed to pull the image. | Contact the repository administrator, or verify that the image name is correct |
| RunContainerError | Failed to start the container. | Check the container parameter configuration for exceptions |
| PostStartHookError | The postStart hook command failed. | The postStart command is incorrect |
| NetworkPluginNotReady | The network plugin is not fully started. | CNI plugin exception; check the CNI status |

2. Common Commands

When we find that the Pod is in the above state, we can use the following commands to quickly locate the problem:

  1. Get Pod status
kubectl -n ${NAMESPACE} get pod -o wide
  2. View the Pod's YAML configuration
kubectl -n ${NAMESPACE} get pod ${POD_NAME} -o yaml
  3. View Pod events
kubectl -n ${NAMESPACE} describe pod ${POD_NAME}
  4. View Pod logs
kubectl -n ${NAMESPACE} logs ${POD_NAME} -c ${CONTAINER_NAME}
  5. Log in to the Pod
kubectl -n ${NAMESPACE} exec -it ${POD_NAME} -- /bin/bash
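The commands above can be bundled into a small helper script. This is only an illustrative sketch (the `pod_debug` function name is ours, not part of kubectl); the flags used are standard kubectl options:

```shell
#!/usr/bin/env bash
# Illustrative helper: run the usual diagnostics for one Pod in sequence.
pod_debug() {
  local ns="$1" pod="$2"
  kubectl -n "$ns" get pod "$pod" -o wide                  # status and node placement
  kubectl -n "$ns" describe pod "$pod"                     # events: scheduling, image pulls, probes
  kubectl -n "$ns" logs "$pod" --all-containers --tail=50  # recent logs from every container
}
```

For example, `pod_debug default my-nginx` prints the status, events, and recent logs for that Pod in one pass.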

3. Does UK8S limit the number of Pods deployed on a Node? How can it be modified?

To ensure the stable operation of Pods in a production environment, UK8S limits the number of Pods on each Node to 110. To change this, log in to the Node, modify `maxPods: 110` in /etc/kubernetes/kubelet.conf, and then execute `systemctl restart kubelet` to restart the kubelet.
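The edit can be scripted on the node. A minimal sketch, assuming the kubelet config path given above (the `bump_max_pods` helper name and the `CONF` override are ours, for illustration only):

```shell
# Raise the per-node Pod limit in the kubelet config file.
# CONF defaults to the UK8S path from this article; override it for testing.
bump_max_pods() {
  local conf="${CONF:-/etc/kubernetes/kubelet.conf}"
  sed -i "s/^maxPods:.*/maxPods: $1/" "$conf"
}
# On the node: bump_max_pods 200 && systemctl restart kubelet
```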

4. Why did my container exit as soon as it started?

  1. View the container logs to find the cause of the abnormal restart.
  2. Check whether the Pod's startup command is set correctly; it can be specified when building the image, or in the Pod configuration.
  3. The startup command must stay in the foreground; otherwise Kubernetes considers the Pod finished and restarts it.
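As an illustration of point 3, an nginx container must be started with `daemon off;` so the main process stays in the foreground (a minimal spec fragment; the image and names are examples):

```yaml
containers:
- name: nginx
  image: nginx
  # "daemon off;" keeps nginx in the foreground; without it the container's
  # main process exits immediately and the kubelet restarts the Pod.
  command: ["nginx", "-g", "daemon off;"]
```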

5. How to adjust Docker’s log level

  1. Edit the /etc/docker/daemon.json file and add the configuration `"debug": true`.
  2. Reload the Docker configuration with `systemctl reload docker` and view the logs.
  3. When you no longer need detailed logs, remove the `debug` entry and reload Docker again.
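With only the debug key set, /etc/docker/daemon.json would look like this (keep any other keys your file already contains):

```json
{
  "debug": true
}
```

After editing, run `systemctl reload docker`, then follow the daemon logs with `journalctl -u docker -f`.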

6. Why is the node abnormal, but the Pod is still in the Running state?

  1. This is caused by Kubernetes' node status protection, which tends to occur when there are few nodes or many abnormal nodes.
  2. See the documentation at https://kubernetes.io/zh/docs/concepts/architecture/nodes/#reliability

7. What if the node is down and the Pod is stuck in Terminating?

  1. After a node has been down for a certain period (usually 5 minutes), Kubernetes tries to evict its Pods, which enter the Terminating state.
  2. Because the kubelet cannot perform the series of operations needed to delete a Pod at this point, the Pod stays stuck in Terminating.
  3. DaemonSet Pods are scheduled on every node by default, so you do not need to consider them when a node is down; Kubernetes does not evict this type of Pod by default.
  4. For Deployment and ReplicaSet Pods, while a Pod is stuck in Terminating the controller automatically starts an equivalent number of replacement Pods.
  5. For StatefulSet Pods, because Pod names under a StatefulSet are fixed, a new Pod is not started until the previous one is completely deleted.
  6. For Pods using a UDisk PVC, the PVC cannot be detached, which causes the newly started Pod to fail to run; please follow the PVC-related content in this article (#how to check the actual mount situation of the udisk corresponding to the pvc) to verify the relevant relationships.
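Once the node is confirmed unrecoverable, a stuck Pod can be removed by skipping graceful termination. A minimal helper sketch (the `force_delete_pod` name is ours; the flags are standard kubectl):

```shell
# Force-delete a Pod stuck in Terminating. Use with care: the kubelet gets
# no chance to clean up, so only do this once the node is confirmed gone.
force_delete_pod() {
  kubectl -n "$1" delete pod "$2" --grace-period=0 --force
}
```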

8. What to do if the Pod exits abnormally?

  1. Run `kubectl -n ${NAMESPACE} describe pod ${POD_NAME}` to view the events and the status of each container: did the Pod exit on its own, was it killed by the OOM killer, or was it evicted?
  2. If the Pod exited on its own, run `kubectl -n ${NAMESPACE} logs ${POD_NAME} -p` to view the previous container's exit logs and investigate the cause.
  3. If it was OOM-killed, adjust the Pod's request and limit settings based on business needs (the two should not differ too much), or check whether there is a memory leak.
  4. If the Pod was evicted, the node is under too much pressure; check which Pods are using too many resources and adjust their request and limit settings.
  5. For exits not caused by the Pod itself, execute `dmesg` to view system logs and `journalctl -u kubelet` to view kubelet-related logs.
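To see at a glance whether a container was OOM-killed, its last terminated state can be pulled with a jsonpath query. The `last_state` helper name is illustrative; the jsonpath fields are standard Pod status fields:

```shell
# Print each container's last termination reason and exit code,
# e.g. an OOM-killed container shows reason OOMKilled with exit code 137.
last_state() {
  kubectl -n "$1" get pod "$2" -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{" exitCode="}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```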

9. Why is the network broken for containers started directly with Docker on a UK8S node?

  1. UK8S uses UCloud Global's own CNI plugin; containers started directly with Docker do not use this plugin, so they have no network connectivity.
  2. For long-running tasks, it is not recommended to start containers directly with Docker on a UK8S node; use Pods instead.
  3. For a temporary test, you can add the `--network host` parameter (e.g. `docker run --network host nginx`) to start the container in host-network mode.

10. Timezone Issues of Pod

Containers running in a Kubernetes cluster use Coordinated Universal Time (UTC) by default, not the local time of the host. If you need the container time to be consistent with the host time, you can mount the host's timezone file into the container with a hostPath volume.

Most Linux distributions configure the timezone through the /etc/localtime file; we can get the timezone information with the following command:

# ls -l /etc/localtime
lrwxrwxrwx. 1 root root 32 Oct 15  2015 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai

From the above output, we know that the timezone of the host is Asia/Shanghai. The Pod YAML example below shows how to set the container's timezone to Asia/Shanghai so that it is consistent with the host.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: "IfNotPresent"
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    ports:
    - containerPort: 80
    volumeMounts:
    - name: timezone-config
      mountPath: /etc/localtime
  volumes:
  - name: timezone-config
    hostPath:
      path: /usr/share/zoneinfo/Asia/Shanghai

If the container was created earlier, you only need to add the volumeMounts and volumes parameters to the YAML file and then update it with the `kubectl apply` command.