Multiple Pods Share a GPU

This solution deploys the GPU-Share plugin. Once deployment is complete, a single GPU on a cluster node can be scheduled to multiple Pods. Currently, only command-line installation is supported; the UK8S team plans to add this feature to the cluster plugins so that it can be installed with one click.

Install and Use the GPU Sharing Plugin

⚠️ Please check the Kubernetes version before installing. Kubernetes >= 1.17.4 is required.
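
A quick way to confirm this is to look at the kubelet version reported for each node; the VERSION column must show v1.17.4 or later:

kubectl get nodes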

1. Label the nodes that require GPU sharing

kubectl label node <nodeip> nodeShareGPU=true

2. Delete the original NVIDIA device plugin in the cluster using kubectl

kubectl delete ds -n kube-system nvidia-device-plugin-daemonset

3. Use kubectl to install the GPU-Share plugin

kubectl apply -f https://docs.ucloud-global.com/uk8s/yaml/gpu-share/1.1.0.yaml
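
After the three steps above, you can verify the installation. The exact names of the GPU-Share components depend on the manifest, so the grep pattern below is only an assumption; adjust it to match the objects the YAML actually creates.

# Confirm the node label was applied
kubectl get nodes -l nodeShareGPU=true
# Confirm the GPU-Share plugin Pods are running
kubectl get pods -n kube-system | grep gpushare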

Test GPU Sharing

Test Conditions:

  1. The cluster has only one single-card GPU cloud host.
  2. The plugin has been installed following the three steps above.
  3. The plugin Pod is in the Running state.

Next, we deploy test-gpushare1 and test-gpushare2.

# Run test-gpushare1
kubectl apply -f https://docs.ucloud-global.com/uk8s/yaml/gpu-share/test-gpushare1.yaml
# Run test-gpushare2
kubectl apply -f https://docs.ucloud-global.com/uk8s/yaml/gpu-share/test-gpushare2.yaml
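
To wait until both Deployments are ready before checking how they were scheduled, kubectl's rollout status can be used (the Deployment names match the manifests):

kubectl rollout status deployment/test-gpushare1
kubectl rollout status deployment/test-gpushare2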

Take test-gpushare1 as an example.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-gpushare1
  labels:
    app: test-gpushare1
spec:
  selector:
    matchLabels:
      app: test-gpushare1
  template:
    metadata:
      labels:
        app: test-gpushare1
    spec:
      schedulerName: gpushare-scheduler
      containers:
      - name: test-gpushare1
        image: uhub.ucloud-global.com/ucloud/gpu-player:share
        command:
          - python3
          - /app/main.py
        resources:
          limits:
            # GPU memory, in GiB
            ucloud.cn/gpu-mem: 1

In the limits, ucloud.cn/gpu-mem: 1 is set; test-gpushare2 uses the same setting. Even though the cluster has only a single-card GPU node, we can observe that the one GPU serves both Pods at the same time.

kubectl get pod | grep test-gpushare
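
If a workload needs more GPU memory, raise the ucloud.cn/gpu-mem limit; the value is counted in GiB. The fragment below is only a sketch (the 2 GiB figure is an illustration, not taken from the test manifests):

        resources:
          limits:
            # request 2 GiB of GPU memory instead of 1
            ucloud.cn/gpu-mem: 2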

Monitor GPU Usage

You can monitor the resource usage of the GPU node, or log in to the GPU node and run nvidia-smi to check.
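
For example, assuming you can log in to the GPU node, nvidia-smi shows the memory used by each process; when both test Pods are running and actively using the card, their python3 processes should appear against the same GPU:

# On the GPU node
nvidia-smi
# Or refresh the view periodically
watch -n 2 nvidia-smi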

Remove the GPU Sharing Plugin

Please execute the following commands on the master node:

kubectl delete -f https://docs.ucloud-global.com/uk8s/yaml/gpu-share/1.1.0.yaml
kubectl apply -f /etc/kubernetes/yaml/nvidia-device-plugin.yaml
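
To confirm the default plugin is back, check that the original DaemonSet (the one deleted in step 2) is running again. Optionally, remove the sharing label from the nodes as well:

kubectl get ds -n kube-system nvidia-device-plugin-daemonset
# Optional: remove the label added in step 1
kubectl label node <nodeip> nodeShareGPU-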