UK8S Core Component Failure Recovery
1. APIServer, Controller Manager, Scheduler Component Failure Recovery
APIServer, Controller Manager, and Scheduler are the core management components of Kubernetes. A UK8S cluster has three Master nodes by default, and all three components are installed on each of them. The components are exposed through a load balancer, so the cluster stays highly available when one Master node fails.
When a component fails, log in to each of the three Master nodes in turn and check the component status with systemctl status ${PLUGIN_NAME} (for example, systemctl status kube-apiserver). If a component is unavailable, recover it with the following steps:
# Set the intranet IP of a healthy Master node as an environment variable; the files below are copied from it
export IP=10.23.17.200
# Copy binary installation packages of APIServer, Controller Manager, and Scheduler from the healthy node
## UK8S 1.16 and earlier: the K8S components are bundled in the single hyperkube binary
scp root@$IP:/usr/local/bin/hyperkube /usr/local/bin/hyperkube
## UK8S 1.17 and later: each K8S component is a separate binary
scp root@$IP:/usr/local/bin/{kube-apiserver,kube-controller-manager,kube-scheduler} /usr/local/bin/
# Copy the systemd service files of APIServer, Controller Manager, and Scheduler
scp root@$IP:/usr/lib/systemd/system/{kube-apiserver.service,kube-controller-manager.service,kube-scheduler.service} /usr/lib/systemd/system/
# Copy the configuration files of APIServer, Controller Manager, and Scheduler
scp root@$IP:/etc/kubernetes/{apiserver,controller-manager,kube-scheduler.conf} /etc/kubernetes/
# Copy the kubectl binary
scp root@$IP:/usr/local/bin/kubectl /usr/local/bin/kubectl
# Copy the kubeconfig
scp -r root@$IP:~/.kube ~/
# Edit the APIServer configuration
vim /etc/kubernetes/apiserver # set advertise-address to the IP of the node being recovered
# Reload systemd so it picks up the copied unit files, then enable and start the services
systemctl daemon-reload
systemctl enable --now kube-apiserver kube-controller-manager kube-scheduler
# Configure the intranet and extranet IPs of the APIServer load balancer on the loopback interface (ifcfg-lo:external is only needed when the APIServer extranet access feature is enabled)
scp root@$IP:/etc/sysconfig/network-scripts/ifcfg-lo:internal /etc/sysconfig/network-scripts/ifcfg-lo:internal
scp root@$IP:/etc/sysconfig/network-scripts/ifcfg-lo:external /etc/sysconfig/network-scripts/ifcfg-lo:external
systemctl restart network
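The version-dependent binary copy and a post-recovery check can be scripted. The sketch below is a minimal example under stated assumptions: master_binaries, copy_master_binaries and verify_master are hypothetical helper names, not part of UK8S; the Kubernetes version is passed by hand (for example 1.16.3 or v1.18.20); and kubectl uses the kubeconfig copied in the steps above. Note that kubectl get --raw='/healthz' goes to whatever server that kubeconfig points at, which may be the load balancer rather than the local APIServer, so the per-node systemctl checks are the more direct signal.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Master recovery steps above (not shipped with UK8S).

# Print which binaries to copy for a version such as "1.16.3" or "v1.18.20":
# 1.16 and earlier bundle everything in hyperkube, 1.17 and later use separate binaries.
master_binaries() {
  local ver="${1#v}"          # strip an optional leading "v"
  local major="${ver%%.*}"    # e.g. "1"
  local rest="${ver#*.}"      # e.g. "16.3"
  local minor="${rest%%.*}"   # e.g. "16"
  if [ "$major" -eq 1 ] && [ "$minor" -le 16 ]; then
    echo "hyperkube"
  else
    echo "kube-apiserver kube-controller-manager kube-scheduler"
  fi
}

# copy_master_binaries <healthy-master-ip> <version>
copy_master_binaries() {
  local ip="$1" ver="$2" bin
  # Unquoted on purpose: the helper prints a space-separated list of names.
  for bin in $(master_binaries "$ver"); do
    scp "root@${ip}:/usr/local/bin/${bin}" /usr/local/bin/
  done
}

# Check the three services on this node, query APIServer health, and list the
# addresses registered in the default/kubernetes endpoints. Returns 1 on any failure.
verify_master() {
  local svc rc=0
  for svc in kube-apiserver kube-controller-manager kube-scheduler; do
    if systemctl is-active --quiet "$svc"; then
      echo "$svc: active"
    else
      echo "$svc: NOT active (inspect with: journalctl -u $svc)"
      rc=1
    fi
  done
  kubectl get --raw='/healthz' || rc=1
  echo
  kubectl get endpoints kubernetes -n default \
    -o jsonpath='{.subsets[*].addresses[*].ip}' || rc=1
  echo
  return "$rc"
}
```

For example, run copy_master_binaries "$IP" 1.18.20 in place of the two version-specific scp lines, and verify_master after the services are enabled. On clusters using the default lease endpoint reconciler, the recovered node's IP should appear in the endpoints list once advertise-address is set correctly.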
2. Kubelet, Kube-proxy Failure Recovery
Kubelet and Kube-proxy run on every Master and Node, and are responsible for node registration and traffic forwarding, respectively.
Note: In UK8S clusters created before June 12, 2020, Kubelet is not installed on Master nodes by default, so those Master nodes do not appear in the output of kubectl get node.
# Set the intranet IP of a healthy node as an environment variable; the files below are copied from it
export IP=10.23.17.200
# Copy binary installation packages of Kubelet, Kube-proxy from the healthy node
## UK8S 1.16 and earlier: the K8S components are bundled in the hyperkube binary. Skip this if hyperkube was already copied in the previous section
scp root@$IP:/usr/local/bin/hyperkube /usr/local/bin/hyperkube
## UK8S 1.17 and later: each K8S component is a separate binary
scp root@$IP:/usr/local/bin/{kubelet,kube-proxy} /usr/local/bin/
# Prepare directories
mkdir -p /opt/cni/net.d
mkdir -p /opt/cni/bin
mkdir -p /var/lib/kubelet
# Copy the configuration files, service files, and CNI plugins
scp root@$IP:/etc/kubernetes/{kubelet,kubelet.conf,kube-proxy.conf,ucloud} /etc/kubernetes/
scp root@$IP:/usr/lib/systemd/system/{kubelet.service,kube-proxy.service} /usr/lib/systemd/system/
scp root@$IP:/etc/kubernetes/set-conn-reuse-mode.sh /etc/kubernetes/
scp root@$IP:/etc/rsyslog.conf /etc/
scp root@$IP:/opt/cni/bin/{cnivpc,loopback,host-local} /opt/cni/bin/
scp root@$IP:/opt/cni/net.d/10-cnivpc.conf /opt/cni/net.d/
# Edit the Kubelet configuration:
# Set --node-ip and --hostname-override to the IP of the node being recovered
# In --node-labels, set topology.kubernetes.io/zone and failure-domain.beta.kubernetes.io/zone to the availability zone of the node being recovered (e.g. cn-bj2-02)
# In --node-labels, set UHostID and node.uk8s.ucloud-global.com/resource_id to the resource ID of the node being recovered (e.g. uhost-xxxxxxxx)
vim /etc/kubernetes/kubelet
# Disable swap (by default Kubelet refuses to start while swap is on)
swapoff -a
# Reload systemd so it picks up the copied unit files, then enable and start the services
systemctl daemon-reload
systemctl enable --now kubelet kube-proxy
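The Kubelet edits and the final checks can also be done without an editor. This is a sketch under stated assumptions: replace_token, swap_is_off and node_ready are hypothetical helper names. replace_token relies on GNU sed and assumes the values being swapped (IPs, zones such as cn-bj2-02, resource IDs such as uhost-xxxxxxxx) contain only letters, digits, dots and hyphens, and that the healthy node's values appear in /etc/kubernetes/kubelet only in the flags listed above. node_ready assumes the node is registered under its IP, which follows from --hostname-override being set to that IP. Review the file with grep afterwards rather than trusting the substitution blindly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the Kubelet/Kube-proxy recovery steps above.

# Replace every whole-word occurrence of OLD with NEW in FILE (GNU sed, in place).
# Only "." is escaped, so OLD and NEW must contain just letters, digits, dots and hyphens.
replace_token() {
  local file="$1" old="$2" new="$3" pat
  pat=$(printf '%s\n' "$old" | sed 's/\./\\./g')   # 10.23.17.200 -> 10\.23\.17\.200
  sed -i "s|\b${pat}\b|${new}|g" "$file"
}

# Succeed when no swap device is active: /proc/swaps then holds only its header line.
swap_is_off() {
  local swaps="${1:-/proc/swaps}"
  [ "$(wc -l < "$swaps")" -le 1 ]
}

# Print the status of the node's Ready condition ("True" when the node is healthy).
node_ready() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}
```

For example, replace_token /etc/kubernetes/kubelet "$IP" 10.23.17.201 (a placeholder address for the node being recovered) before enabling the services, then swap_is_off && node_ready 10.23.17.201 once Kubelet is running.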
3. Container Engine Recovery
3.1 Docker Container Engine
# Set the intranet IP of a healthy node running Docker as an environment variable; the files below are copied from it
export IP=10.23.17.200
# Prepare directories
mkdir -p /data/docker
rm -rf /var/lib/docker
ln -s /data/docker /var/lib/docker
# Download and install packages
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/docker-ce-19.03.14-3.el7.x86_64.rpm
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.2.el7.x86_64.rpm
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/docker-ce-cli-19.03.14-3.el7.x86_64.rpm
yum install *.rpm -y
# Copy configuration and service files
scp root@$IP:/usr/lib/systemd/system/docker.service /usr/lib/systemd/system/
scp root@$IP:/etc/docker/daemon.json /etc/docker/
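# Reload systemd so the copied docker.service replaces the packaged one, then enable and start the service
systemctl daemon-reload
systemctl enable --now docker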
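Because /var/lib/docker is replaced by a symlink, it is worth confirming that Docker's data really lands under /data before workloads start. A minimal sketch; links_to and check_docker are hypothetical helper names, and readlink -f is assumed to come from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when LINK is a symlink that resolves to TARGET.
links_to() {
  local link="$1" target="$2"
  [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}

# Check the Docker data directory prepared above, then the service itself,
# and finally print the root directory Docker reports.
check_docker() {
  links_to /var/lib/docker /data/docker \
    || { echo "/var/lib/docker does not point to /data/docker"; return 1; }
  systemctl is-active --quiet docker || { echo "docker is not active"; return 1; }
  docker info --format '{{.DockerRootDir}}'
}
```

The same links_to check applies to any of the data-directory symlinks in this section.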
3.2 Containerd Container Engine
# Set the intranet IP of a healthy node running containerd as an environment variable; the files below are copied from it
export IP=10.23.17.200
# Prepare directories
mkdir -p /etc/containerd
mkdir -p /data/containerd
mkdir -p /data/log/pods
ln -s /data/containerd /var/lib/containerd
ln -s /data/log/pods /var/log/pods
# Download and install packages
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.3-3.2.el7.x86_64.rpm
yum install -y containerd.io-1.4.3-3.2.el7.x86_64.rpm
# Copy the configuration files, the service file, and crictl
scp root@$IP:/etc/containerd/{config.toml,containerd.toml} /etc/containerd/
scp root@$IP:/usr/lib/systemd/system/containerd.service /usr/lib/systemd/system/
scp root@$IP:/usr/local/bin/crictl /usr/local/bin/
scp root@$IP:/etc/crictl.yaml /etc/
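# Reload systemd to pick up the copied containerd.service, then enable and start the service (systemctl start alone would not start it at boot)
systemctl daemon-reload
systemctl enable --now containerd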
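crictl finds containerd through the runtime-endpoint key in the copied /etc/crictl.yaml, so a quick check is to read that endpoint, confirm the socket exists, and ask crictl for the runtime version. A sketch, assuming the endpoint is written on one line as an unquoted or double-quoted unix:// URL; crictl_endpoint and check_containerd are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the socket path from runtime-endpoint in a crictl config, e.g.
# "runtime-endpoint: unix:///run/containerd/containerd.sock" -> /run/containerd/containerd.sock
crictl_endpoint() {
  local conf="${1:-/etc/crictl.yaml}"
  awk '/^runtime-endpoint:/ {
         sub(/^runtime-endpoint:[ \t]*/, "")   # drop the key
         gsub(/"/, "")                         # drop optional quotes
         sub(/^unix:\/\//, "")                 # drop the URL scheme
         print; exit
       }' "$conf"
}

# Check the socket, warn if containerd will not start at boot, then query the runtime.
check_containerd() {
  local sock
  sock=$(crictl_endpoint)
  [ -S "$sock" ] || { echo "socket '$sock' not found"; return 1; }
  systemctl is-enabled --quiet containerd || echo "warning: containerd is not enabled at boot"
  crictl version
}
```

If check_containerd reports a missing socket, compare the path with the one configured in the copied /etc/containerd/config.toml.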