Redhat 6.6 Environment Configuration

1. Check GPU Device Recognition

  # yum install pciutils
  # sudo lspci | grep NVIDIA
  Output such as “3D controller: NVIDIA Corporation Device 1df6 (rev a1)” means the card is recognized as a V100S.
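
The same check can be scripted against numeric PCI IDs, which do not depend on how lspci names the device. The sketch below is only an illustration: the function name is_v100s is hypothetical, 10de is NVIDIA's PCI vendor ID, and 1df6 is the device ID quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -n` output on stdin and succeeds if a
# device with vendor ID 10de (NVIDIA) and device ID 1df6 is listed.
is_v100s() {
    grep -qi '10de:1df6'
}

# Example on a real host (not executed here):
#   lspci -n | is_v100s && echo "V100S detected"
```

The function reads from standard input, so it can also be run against saved lspci output from another machine.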

If “yum install pciutils” returns the message “This system is not registered with an entitlement server. You can use subscription-manager to register.”, register the system with your Redhat account first:
subscription-manager register

Enter the username and password of the Redhat account when prompted.
Then confirm that the system is registered and the subscription is enabled:
subscription-manager list --consumed
Finally, update the system:
yum update

2. Download GPU Driver

wget https://cn.download.nvidia.com/tesla/460.106.00/NVIDIA-Linux-x86_64-460.106.00.run

Other driver versions can be downloaded from the official NVIDIA download page according to business needs: https://www.nvidia.cn/Download/index.aspx?lang=cn
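
If several hosts need the same driver, the version can be kept in one place. The sketch below assumes that other versions follow the same URL pattern as the link above, which is not guaranteed; the function name driver_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: builds a download URL by substituting a version into
# the pattern of the single URL given above. Whether other versions follow
# this pattern is an assumption; check the official download page.
driver_url() {
    ver="$1"
    printf 'https://cn.download.nvidia.com/tesla/%s/NVIDIA-Linux-x86_64-%s.run\n' "$ver" "$ver"
}

# Example (not executed here):
#   wget "$(driver_url 460.106.00)"
```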

3. Disable nouveau

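The nouveau driver that ships with some Linux systems conflicts with the NVIDIA driver, so it must be disabled first. Run “lsmod | grep nouveau”; if the command returns any output, nouveau is loaded and needs to be disabled. To disable it, add the line “blacklist nouveau” to the end of /etc/modprobe.d/blacklist.conf, as the last two lines of the following output show. Reboot afterwards and rerun the lsmod check to confirm that the module is no longer loaded.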

# lsmod | grep nouveau
nouveau              1514531  0 
ttm                    89568  1 nouveau
drm_kms_helper        127731  1 nouveau
drm                   355270  3 nouveau,ttm,drm_kms_helper
i2c_algo_bit            5903  1 nouveau
i2c_core               29164  5 i2c_piix4,nouveau,drm_kms_helper,drm,i2c_algo_bit
mxm_wmi                 1967  1 nouveau
video                  21686  1 nouveau
wmi                     6287  2 nouveau,mxm_wmi
# tail -1 /etc/modprobe.d/blacklist.conf 
  blacklist nouveau
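
Adding the blacklist entry can be made safe to repeat. The following is a minimal sketch, not a definitive procedure: the function name blacklist_nouveau is hypothetical, and the file path defaults to the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: appends "blacklist nouveau" to the given file
# (default: /etc/modprobe.d/blacklist.conf) unless such a line already exists.
blacklist_nouveau() {
    conf="${1:-/etc/modprobe.d/blacklist.conf}"
    # Match the entry regardless of leading or trailing whitespace.
    if ! grep -q '^[[:space:]]*blacklist[[:space:]]\{1,\}nouveau[[:space:]]*$' "$conf" 2>/dev/null; then
        echo 'blacklist nouveau' >> "$conf"
    fi
}

# Example as root (not executed here):
#   blacklist_nouveau
#   tail -1 /etc/modprobe.d/blacklist.conf
```

Running it a second time leaves the file unchanged. On some systems the initramfs may also need to be regenerated before nouveau stops loading at boot, so check with “lsmod | grep nouveau” after rebooting.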

4. Install the driver

# sudo sh <driver_installer>.run --kernel-source-path=/usr/src/kernels/<kernel_version>

Here, “driver_installer” is the name of the driver installer file and “kernel_version” is the kernel version of the running system. To check the running kernel version, use the following command:

uname -r
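
The two steps above can be combined so that the kernel version is not typed by hand. This is a sketch under assumptions: the function name kernel_source_path is hypothetical, and the directory only exists if the kernel development package matching the running kernel is installed.

```shell
#!/bin/sh
# Hypothetical helper: prints the kernel source path for the given kernel
# version, or for the running kernel (uname -r) if no argument is passed.
kernel_source_path() {
    printf '/usr/src/kernels/%s\n' "${1:-$(uname -r)}"
}

# Example as root, using the installer downloaded in step 2 (not executed here):
#   sh NVIDIA-Linux-x86_64-460.106.00.run \
#       --kernel-source-path="$(kernel_source_path)"
```

If the printed directory does not exist, the installer cannot build the kernel module, so it is worth checking with “ls” before starting the installation.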

5. Verify that the GPU card is working properly

Run nvidia-smi to verify. If the output shows the card model, as in the example below, the driver is installed correctly and the GPU is ready for use.

# nvidia-smi
Fri Apr  7 15:02:14 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100S-PCI...  Off  | 00000000:00:03.0 Off |                    0 |
| N/A   26C    P0    35W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                              
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
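
For unattended checks, the table above is awkward to parse; nvidia-smi can print selected fields as CSV instead. The sketch below assumes the driver version is the value to verify; the function name driver_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "name, driver_version" CSV lines on stdin and
# succeeds if at least one line reports the expected driver version ($1).
driver_matches() {
    awk -F', ' -v want="$1" '$2 == want { found = 1 } END { exit !found }'
}

# Example on a host with the driver installed (not executed here):
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
#       | driver_matches 460.106.00 && echo "driver OK"
```

A non-zero exit status means either that no GPU was listed or that the installed driver version differs from the expected one.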