
[WSL2 Notes 2] Pitfall notes on setting up a deep learning development environment: Ubuntu + CUDA + cuDNN + PyTorch + TensorFlow + ONNX

August 3, 2024


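1. Anaconda installation and environment setup (system-level: manages all environments)

Anaconda installer archive (official site)
https://repo.anaconda.com/archive/

1.1 Create a download directory

cd ~
mkdir download
cd download

Download the Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh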

1.2 Install Anaconda

bash Anaconda3-2023.03-Linux-x86_64.sh

Create a Python virtual environment
conda create -n <name> python=<version>

Activate the environment
conda activate <name>

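1.3 A mistake: an unnecessary extra step

Do not do the following. It is recorded here only because it broke the setup.

Set the Anaconda path

$ vim ~/.bashrc

Add the install path

 # anaconda3
export PATH="/home/xxxx/anaconda3/bin:$PATH"
source activate

Or append the same lines from the shell:

echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
echo 'source activate' >> ~/.bashrc

Reload the configuration
source ~/.bashrc
The result of this mistake is that every virtual environment runs with the base environment's Python version. Individual environments can no longer use different Python versions, which defeats the purpose of virtual environments.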
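
One quick way to notice this failure is to check whether the python found on PATH actually lives inside the active environment. The sketch below is a hypothetical helper (the name python_in_env is not from the original notes). It only compares path strings and assumes the usual conda layout, where an environment's interpreter sits under <prefix>/bin/.

```shell
# Succeed if the interpreter path lies inside the given environment prefix.
# $1: environment prefix (for example the value of CONDA_PREFIX)
# $2: full path of the interpreter (for example the output of `command -v python`)
python_in_env() {
    env_prefix="${1%/}"          # drop a trailing slash, if any
    python_path="$2"
    case "$python_path" in
        "$env_prefix"/bin/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After conda activate <name>, a check such as python_in_env "$CONDA_PREFIX" "$(command -v python)" || echo "python is not from the active environment" would flag the situation described above. CONDA_PREFIX is set by conda when an environment is activated.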

1.4 Disk cleanup

Clean caches and unused dependency packages regularly to free disk space.

  • Before cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
[sudo] password for gpu: 
174G    /home/gpu/anaconda3/pkgs/
  • After cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
84G     /home/gpu/anaconda3/pkgs/

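1.4.1 Check disk space

df -hl

1.4.2 apt-get cleanup

  • Clear the download cache
    sudo apt-get clean
  • Remove dependency packages that are no longer needed
    sudo apt-get autoremove
  • Remove cached package files that can no longer be downloaded
    sudo apt-get autoclean

1.4.3 Anaconda cleanup

  • Show the disk usage of the conda installation
    sudo du -sh ~/anaconda3/*
  • Remove index caches and unused cached packages; existing environments are not affected
    conda clean -a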
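
To see at a glance which directories are worth cleaning, the du output can be filtered by size. This is a minimal sketch, not part of the original notes: large_dirs is a made-up helper name, and it expects the tab-separated output of du -sk (sizes in KiB) on standard input.

```shell
# Print entries from `du -sk` output whose size is at least MIN_GIB GiB.
# $1: threshold in GiB; stdin: lines of "<size in KiB><TAB><path>"
large_dirs() {
    min_gib="$1"
    awk -v min="$min_gib" -F'\t' '
        $1 >= min * 1024 * 1024 {
            printf "%.1f GiB\t%s\n", $1 / 1024 / 1024, $2
        }'
}
```

For example, sudo du -sk ~/anaconda3/* | large_dirs 10 would list only the entries of 10 GiB or more, such as the pkgs directory shown above.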

2. NVIDIA driver (system-level: shared by all environments)

2.1 Official site

https://www.nvidia.com/download/index.aspx?lang=en-us

2.2 Install the Windows 10 version of the NVIDIA driver

2.3 Check the NVIDIA driver and CUDA version

nvidia-smi

Do not install any Linux display driver inside WSL.

https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2
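
The header line of the nvidia-smi output reports the driver version and the highest CUDA version that driver supports, which helps when choosing a CUDA Toolkit release in the next section. Below is a minimal sketch for pulling that value out of the output. The function name is made up for illustration, and it assumes the header contains the text "CUDA Version: X.Y".

```shell
# Print the CUDA version shown in the header of nvidia-smi output (read from stdin).
smi_cuda_version() {
    sed -n 's/.*CUDA Version:[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

Typical use would be nvidia-smi | smi_cuda_version.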

2.4 Driver loss on an Ubuntu production machine: Failed to initialize NVML: Driver/library version mismatch

2.4.1 nvidia-smi

Production environment: V100 x4
OS: Ubuntu 22.04
In the early hours of the morning, watch was still showing normal GPU usage:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   47C    P0             184W / 300W |   6945MiB / 16384MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           Off | 00000000:00:09.0 Off |                    0 |
| N/A   45C    P0             249W / 300W |   7863MiB / 16384MiB |     91%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           Off | 00000000:00:0A.0 Off |                    0 |
| N/A   45C    P0             194W / 300W |   7983MiB / 16384MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           Off | 00000000:00:0B.0 Off |                    0 |
| N/A   35C    P0              41W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1548534      C   python                                     6942MiB |
|    1   N/A  N/A   1548535      C   python                                     7860MiB |
|    2   N/A  N/A   1548536      C   python                                     7980MiB |
+---------------------------------------------------------------------------------------+

By noon it looked like this:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.104

None of nvtop, nvitop, or gpustat worked either.

2.4.2 Investigation

  • Check the hardware
$ lspci | grep -i nvidia
00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:09.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0a.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0b.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
  • Check the kernel module version
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.86.10  Wed Jul 26 23:20:03 UTC 2023
GCC version:  gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1) 
  • Check the installed driver packages
$ dpkg -l | grep nvidia
ii  gpustat                               0.6.0-1                                     all          pretty nvidia device monitor
iu  libnvidia-cfg1-535:amd64              535.104.05-0ubuntu0.22.04.4                 amd64        nvidia binary opengl/glx configuration library
ii  libnvidia-common-535                  535.86.10-0ubuntu1                          all          shared files used by the nvidia libraries
iu  libnvidia-compute-535:amd64           535.104.05-0ubuntu0.22.04.4                 amd64        nvidia libcompute package
iu  libnvidia-decode-535:amd64            535.104.05-0ubuntu0.22.04.4                 amd64        nvidia video decoding runtime libraries
iu  libnvidia-encode-535:amd64            535.104.05-0ubuntu0.22.04.4                 amd64        nvenc video encoding runtime library
iu  libnvidia-extra-535:amd64             535.104.05-0ubuntu0.22.04.4                 amd64        extra libraries for the nvidia driver
iu  libnvidia-fbc1-535:amd64              535.104.05-0ubuntu0.22.04.4                 amd64        nvidia opengl-based framebuffer capture runtime library
ii  libnvidia-gl-535:amd64                535.86.10-0ubuntu1                          amd64        nvidia opengl/glx/egl/gles glvnd libraries and vulkan icd
iu  nvidia-compute-utils-535              535.104.05-0ubuntu0.22.04.4                 amd64        nvidia compute utilities
iu  nvidia-dkms-535                       535.104.05-0ubuntu0.22.04.4                 amd64        nvidia dkms package
iu  nvidia-driver-535                     535.104.05-0ubuntu0.22.04.4                 amd64        nvidia driver metapackage
iu  nvidia-firmware-535-535.104.05        535.104.05-0ubuntu0.22.04.4                 amd64        firmware files used by the kernel module
ii  nvidia-kernel-common-535              535.86.10-0ubuntu1                          amd64        shared files used with the kernel module
iu  nvidia-kernel-source-535              535.104.05-0ubuntu0.22.04.4                 amd64        nvidia kernel source package
ii  nvidia-modprobe                       535.86.10-0ubuntu1                          amd64        load the nvidia kernel driver and create device files
ii  nvidia-prime                          0.8.17.1                                    all          tools to enable nvidia's prime
ii  nvidia-settings                       535.86.10-0ubuntu1                          amd64        tool for configuring the nvidia graphics driver
iu  nvidia-utils-535                      535.104.05-0ubuntu0.22.04.4                 amd64        nvidia driver support binaries
ii  screen-resolution-extra               0.18.2                                      all          extension for the nvidia-settings control panel
iu  xserver-xorg-video-nvidia-535         535.104.05-0ubuntu0.22.04.4                 amd64        nvidia binary xorg driver
  • Check the package upgrade log
$ grep nvidia /var/log/dpkg.log
2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 upgrade libnvidia-gl-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 upgrade nvidia-dkms-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 upgrade nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-configured nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 install nvidia-firmware-535-535.104.05:amd64 <none> 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-installed nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status unpacked nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 upgrade libnvidia-decode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade libnvidia-compute-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-extra-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-encode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4

2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
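So the driver packages had been silently upgraded from 535.86.10 to 535.104.05. The NVIDIA kernel module still loaded (535.86.10) no longer matched the newly installed user-space driver libraries (535.104.05).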
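
The two checks above can be condensed into small filters. This is a rough sketch under stated assumptions: both function names are invented here, the first expects the text of /proc/driver/nvidia/version, and the second expects lines in the dpkg log format shown above (date, time, action, package, old version, new version).

```shell
# Print the version of the loaded NVIDIA kernel module,
# given the text of /proc/driver/nvidia/version on stdin.
kernel_module_version() {
    awk 'tolower($0) ~ /kernel module/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

# Print "package old-version -> new-version" for every NVIDIA package
# upgrade recorded in dpkg log text on stdin.
nvidia_upgrades() {
    awk '$3 == "upgrade" && $4 ~ /nvidia/ { print $4, $5, "->", $6 }'
}
```

Possible use: kernel_module_version < /proc/driver/nvidia/version and nvidia_upgrades < /var/log/dpkg.log. When the module version printed by the first differs from the new package versions printed by the second, the loaded module is stale; a reboot typically brings the two back in line.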

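2.4.3 Hold NVIDIA updates so the production machine does not suddenly lose its driver

sudo apt-mark hold nvidia-driver-<version>

$ sudo apt-mark hold  nvidia-driver-535
nvidia-driver-535 set on hold.

2.4.4 Disable all automatic package updates

To keep the software and environment stable in production, turn off automatic package updates.
sudo dpkg-reconfigure unattended-upgrades

$ sudo dpkg-reconfigure unattended-upgrades
Replacing config file /etc/apt/apt.conf.d/20auto-upgrades with new version

Choose No to decline automatically downloading and installing stable updates.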
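
To confirm that the setting took effect, the rewritten file /etc/apt/apt.conf.d/20auto-upgrades can be inspected. The helper below is only a sketch with an invented name. It assumes the usual format of that file, in which a line such as APT::Periodic::Unattended-Upgrade "1"; enables automatic upgrades and "0" disables them.

```shell
# Succeed (exit 0) if the apt periodic configuration text on stdin
# enables unattended upgrades, fail (exit 1) otherwise.
unattended_upgrades_enabled() {
    awk -F'"' '
        /^[ \t]*APT::Periodic::Unattended-Upgrade[ \t]/ { value = $2 }
        END { exit (value == "1" ? 0 : 1) }'
}
```

For example: unattended_upgrades_enabled < /etc/apt/apt.conf.d/20auto-upgrades && echo "automatic upgrades are still enabled".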

3. CUDA Toolkit (system-level: shared by all environments)

3.1 CUDA Toolkit official site

https://developer.nvidia.com/cuda-downloads?target_os=linux&target_arch=x86_64&distribution=wsl-ubuntu&target_version=2.0&target_type=deb_local

Previous releases
https://developer.nvidia.com/cuda-toolkit-archive

CUDA on WSL User Guide
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2

3.2 Basic installation

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

3.3 GPG key error

W: GPG error: file:/var/cuda-repo-wsl-ubuntu-12-1-local  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CDD5140FF7B46061
E: The repository 'file:/var/cuda-repo-wsl-ubuntu-12-1-local  InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Remove the old GPG key
sudo apt-key del 7fa2af80
Install the repository's GPG key
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-F7B46061-keyring.gpg /usr/share/keyrings/
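
In this case the last eight hex digits of the key ID in the NO_PUBKEY message match the ID embedded in the keyring file name of the local repository, which makes the right file easy to find. A small sketch of that lookup follows; the function name is hypothetical, and whether the naming pattern holds for other repository packages is an assumption.

```shell
# For each "NO_PUBKEY <id>" found in apt output on stdin,
# print the last 8 hex digits of <id> in upper case.
missing_key_short_ids() {
    awk '{
        for (i = 1; i < NF; i++)
            if (toupper($i) == "NO_PUBKEY") {
                id = toupper($(i + 1))
                print substr(id, length(id) - 7)
            }
    }'
}
```

Something like sudo apt-get update 2>&1 | missing_key_short_ids prints the short ID, and ls /var/cuda-repo-wsl-ubuntu-12-1-local/ | grep -i <short id> then shows the matching keyring file to copy.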

3.4 Check the CUDA installation

nvcc -V

3.5 Command 'nvcc' not found

Edit the shell configuration
vim ~/.bashrc
Add the CUDA directories to the environment

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda

Or append the same lines from the shell:

echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export CUDA_HOME="/usr/local/cuda"' >> ~/.bashrc

Reload the configuration
source ~/.bashrc
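
Because ~/.bashrc is re-read every time it is sourced, plain appends like the ones above add a duplicate entry on each reload. One possible way around that is a small helper that appends a directory only when it is missing. This is a sketch, and the name append_once is not from the original notes.

```shell
# Print the colon-separated list $1 with directory $2 appended,
# unless $2 is already one of its entries.
append_once() {
    list="$1"
    dir="$2"
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;              # already present
        *)          printf '%s\n' "${list:+$list:}$dir" ;; # append (no leading colon if empty)
    esac
}
```

In ~/.bashrc this could be used as export PATH="$(append_once "$PATH" /usr/local/cuda/bin)" and likewise for LD_LIBRARY_PATH with /usr/local/cuda/lib64.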

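3.6 How the official CUDA installation relates to, and differs from, the cudatoolkit package inside a virtual environment

3.6.1 Different installation methods

3.6.2 Setting up development environments with different CUDA versions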

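4. cuDNN, a GPU-accelerated library of primitives for deep neural networks (system-level: shared by all environments)

4.1 Official site

https://developer.nvidia.com/rdp/cudnn-archive
Downloading requires a registered account and login.

4.2 Transfer the cuDNN package to WSL over SSH

Setting up the SSH service in WSL2 is covered separately.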

4.3 Install zlib1g

sudo apt-get install zlib1g

(base) fb@vp01:~/download$ conda activate modelscope
(modelscope) fb@vp01:~/download$ sudo apt-get install zlib1g
[sudo] password for fb:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zlib1g is already the newest version (1:1.2.11.dfsg-2ubuntu9.2).
zlib1g set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 53 not upgraded.

4.4 Install cuDNN

4.4.1 Enable the local repository

sudo dpkg -i cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb

[sudo] password for fb:
Selecting previously unselected package cudnn-local-repo-ubuntu2204-8.8.1.3.
(Reading database ... 40179 files and directories currently installed.)
Preparing to unpack cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb ...
Unpacking cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...
Setting up cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...

The public cudnn-local-repo-ubuntu2204-8.8.1.3 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/

4.4.2 Import the CUDA GPG key

sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/
Note: take the key import command from the last line of the previous step's output.

4.4.3 Refresh the repository metadata

sudo apt-get update

4.4.4 Install the runtime library

sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.1

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcudnn8 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Version '8.8.1.3-1+cuda12.1' for 'libcudnn8' was not found

List the local repository to see which package versions it actually contains:
ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/

(modelscope) fb@vp01:~/download$ ll /var
total 68
drwxr-xr-x 15 root root    4096 Apr  7 00:43 ./
drwxr-xr-x 19 root root    4096 Apr  6 22:50 ../
drwxr-xr-x  2 root root    4096 Apr 18  2022 backups/
drwxr-xr-x 11 root root    4096 Apr  6 23:06 cache/
drwxrwxrwt  2 root root    4096 Feb 11 05:36 crash/
drwxr-xr-x  2 root root   12288 Apr  5 11:52 cuda-repo-wsl-ubuntu-12-1-local/
drwxr-xr-x  2 root root    4096 Apr  7 00:43 cudnn-local-repo-ubuntu2204-8.8.1.3/
drwxr-xr-x 28 root root    4096 Feb 11 05:36 lib/
drwxrwsr-x  2 root staff   4096 Apr 18  2022 local/
lrwxrwxrwx  1 root root       9 Feb 11 05:35 lock -> /run/lock/
drwxrwxr-x  7 root syslog  4096 Apr  5 11:08 log/
drwxrwsr-x  2 root backup  4096 Feb 11 05:35 mail/
drwxr-xr-x  2 root root    4096 Feb 11 05:35 opt/
lrwxrwxrwx  1 root root       4 Feb 11 05:35 run -> /run/
drwxr-xr-x  7 root root    4096 Feb 11 05:36 snap/
drwxr-xr-x  4 root root    4096 Feb 11 05:35 spool/
drwxrwxrwt  2 root root    4096 Apr  5 22:48 tmp/
(modelscope) fb@vp01:~/download$ ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/
total 872792
drwxr-xr-x  2 root root      4096 Apr  7 00:43 ./
drwxr-xr-x 15 root root      4096 Apr  7 00:43 ../
-rw-r--r--  1 root root      1662 Mar  2 04:21 DB35EEEE.pub
-rw-r--r--  1 root root      1575 Mar  2 04:21 InRelease
-rw-r--r--  1 root root      1930 Mar  2 04:21 Local.md5
-rw-r--r--  1 root root       836 Mar  2 04:21 Local.md5.gpg
-rw-r--r--  1 root root      2114 Mar  2 04:21 Packages
-rw-r--r--  1 root root       947 Mar  2 04:21 Packages.gz
-rw-r--r--  1 root root       690 Mar  2 04:21 Release
-rw-r--r--  1 root root       836 Mar  2 04:21 Release.gpg
-rw-r--r--  1 root root      1141 Mar  2 04:21 cudnn-local-DB35EEEE-keyring.gpg
-rw-r--r--  1 root root 440032208 Mar  2 04:21 libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r--  1 root root   1664314 Mar  2 04:21 libcudnn8-samples_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r--  1 root root 451984894 Mar  2 04:21 libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb

The packages in the local repository are built for cuda12.0, not cuda12.1. Using the version string that matches the file names resolves the 'libcudnn8' was not found error:
sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.0

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  libcudnn8
0 upgraded, 1 newly installed, 0 to remove and 53 not upgraded.
Need to get 0 B/452 MB of archives.
After this operation, 1152 MB of additional disk space will be used.
Get:1 file:/var/cudnn-local-repo-ubuntu2204-8.8.1.3  libcudnn8 8.8.1.3-1+cuda12.0 [452 MB]
Selecting previously unselected package libcudnn8.
(Reading database ... 40195 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb ...
Unpacking libcudnn8 (8.8.1.3-1+cuda12.0) ...
Setting up libcudnn8 (8.8.1.3-1+cuda12.0) ...
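
The version string that apt-get expects can be read straight off the .deb file name, which follows the pattern name_version_architecture.deb. A minimal sketch, with a function name made up for this example:

```shell
# Print the version part of a Debian package file name
# of the form <name>_<version>_<arch>.deb (a leading directory is ignored).
deb_version() {
    base="${1##*/}"       # strip the directory
    base="${base%.deb}"   # strip the extension
    rest="${base#*_}"     # drop the package name
    printf '%s\n' "${rest%_*}"   # drop the architecture
}
```

For instance, deb_version /var/cudnn-local-repo-ubuntu2204-8.8.1.3/libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb gives the string to put after libcudnn8= in the install command. After apt-get update, apt-cache madison libcudnn8 is another way to list the versions apt can see.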

4.4.5 Install the developer library

sudo apt-get install libcudnn8-dev=8.8.1.3-1+cuda12.0

4.4.6 Install the code samples and the cuDNN library documentation

sudo apt-get install libcudnn8-samples=8.8.1.3-1+cuda12.0

4.5 Verify cuDNN

cp -r /usr/src/cudnn_samples_v8/ $HOME
cd  $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN

4.5.1 test.c:1:10: fatal error: FreeImage.h: No such file or directory

rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12010
Linking agains cublasLt = true
CUDA VERSION: 12010
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 50 53 60 61 62 70 72 75 80 86 87 90
test.c:1:10: fatal error: FreeImage.h: No such file or directory
    1 | #include "FreeImage.h"
      |          ^~~~~~~~~~~~~
compilation terminated.

Install the FreeImage library and its headers:
sudo apt-get install libfreeimage3 libfreeimage-dev

4.5.2 nvcc fatal : Unsupported gpu architecture 'compute_35' (compute capability not supported)

rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12030
Linking agains cublasLt = true
CUDA VERSION: 12030
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86 87
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include  -ccbin g++ -m64    -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o fp16_dev.o -c fp16_dev.cu
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make: *** [Makefile:241: fp16_dev.o] Error 1

Edit the Makefile and remove 35 from the SMS list:
sudo vi Makefile
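
The edit amounts to dropping one entry from the space-separated SMS list printed in the build log. As a rough illustration (drop_sm is an invented helper, and it works on the list text only, not on the Makefile itself):

```shell
# Print the space-separated list $1 with every entry equal to $2 removed.
drop_sm() {
    list="$1"
    drop="$2"
    out=""
    for sm in $list; do
        # keep every entry except the one to drop
        [ "$sm" = "$drop" ] || out="${out:+$out }$sm"
    done
    printf '%s\n' "$out"
}
```

For example, drop_sm "35 50 53 60 61 62 70 72 75 80 86 87" 35 yields the list without 35. GNU make lets variables given on the command line override those set in the Makefile, so make SMS="$(drop_sm "35 50 53 60 61 62 70 72 75 80 86 87" 35)" may avoid editing the file at all; this has not been checked against this particular Makefile.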
