[Memo] Fixing "nvcc fatal : Unsupported gpu architecture 'compute_35'" when compiling FastDeploy

Background

I use Paddle's FastDeploy, whose installation requires compiling the C++ SDK. The build failed with the error in the title. I later found a fix on GitHub.

Environment

GPU: RTX 3060 Ti
Ubuntu 22.04
CUDA 12.1.1
TensorRT 8.6.1.6
OpenCV 4.7
FastDeploy develop, commit id = cd0ee79c91d4ed1103abdc65ff12ccadd23d0827

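Steps to reproduce

Install CUDA 12.1.1 (download steps and links are on the official site).
Install OpenCV: git clone it from GitHub and build it manually. There are plenty of guides on CSDN, so I won't list them here.
Install TensorRT. Per the Paddle website's requirements, CUDA Toolkit 12.0 is paired with cuDNN v8.9.1, and PaddleTensorRT inference additionally requires TensorRT 8.6.1.6. The official site links to a tar package; extract it and set the paths. Downloading requires logging in with an NVIDIA Developer account, which is free to register.
Install FastDeploy. I followed this tutorial; several of the cmake options below need to be changed by hand.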

git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy
mkdir build && cd build
# TRT_DIRECTORY: change this to the directory where you extracted the TensorRT tar package.
# OPENCV_DIRECTORY: no change needed if you built OpenCV from source and ran make install.
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DENABLE_OPENVINO_BACKEND=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DWITH_GPU=ON \
         -DTRT_DIRECTORY=/Paddle/TensorRT-8.4.1.5 \
         -DCUDA_DIRECTORY=/usr/local/cuda \
         -DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk \
         -DENABLE_VISION=ON \
         -DOPENCV_DIRECTORY=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
         -DENABLE_TEXT=ON
make -j12
make install

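Note 1: the cmake options need adjusting, as described above.
Note 2: I believe this one is a FastDeploy problem. See the output below and note the run of nvcc fatal messages. compute_35 refers to the old sm_35 compute architecture; my card is sm_86, so this error should not appear at all. (CUDA 12 removed Kepler targets such as sm_35 from nvcc, while FastDeploy's default architecture list still includes 35.)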

make -j16
[  3%] Built target extern_onnxruntime
[  6%] Built target extern_paddle_inference
[  8%] Built target extern_fast_tokenizer
[ 10%] Built target extern_paddle2onnx
[ 21%] Built target yaml-cpp
[ 21%] Built target yaml-cpp-parse
[ 22%] Built target yaml-cpp-read
[ 23%] Built target yaml-cpp-sandbox
Consolidate compiler generated dependencies of target fastdeploy
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o
[ 23%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o
[ 24%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o
[ 25%] Building CUDA object CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/preprocessor.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/resnet.cc.o
[ 25%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/yolov5cls.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/model.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/postprocessor.cc.o
[ 26%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/preprocessor.cc.o
[ 27%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_postprocessor.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec.cc.o
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/postprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
nvcc fatal   : Unsupported gpu architecture 'compute_35'
[ 28%] Building CXX object CMakeFiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_preprocessor.cc.o
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:496: CMakeFiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:510: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:706: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:734: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:720: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o] Error 1
make[2]: *** [CMakeFiles/fastdeploy.dir/build.make:692: CMakeFiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:310: CMakeFiles/fastdeploy.dir/all] Error 2
make: *** [Makefile:156: all] Error 2
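
Before touching any build files, it can help to confirm that the installed nvcc really has no compute_35 target. The sketch below is one way to check, not part of the original fix. It assumes an nvcc recent enough to support the `--list-gpu-arch` option, and `arch_supported` is a hypothetical helper name introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the architecture named in $1 appears
# as a whole line on stdin (one architecture name per line).
arch_supported() {
  grep -qx -- "$1"
}

# Only query nvcc when it is actually installed.
if command -v nvcc >/dev/null 2>&1; then
  for arch in compute_35 compute_86; do
    if nvcc --list-gpu-arch | arch_supported "$arch"; then
      echo "$arch: supported by this nvcc"
    else
      echo "$arch: NOT supported by this nvcc"
    fi
  done
fi
```

If compute_35 is reported as not supported, any build script that passes it to nvcc will fail with the same fatal error, whatever GPU is installed.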

Solution

The fix is simple: edit the fastdeploy/cmake/cuda.cmake file.

if(NOT WITH_GPU)
  return()
endif()

# This is to eliminate the CMP0104 warnings from cmake 3.18+.
# Instead of setting CUDA_ARCHITECTURES, we will set CMAKE_CUDA_FLAGS.
set(CMAKE_CUDA_ARCHITECTURES OFF)

if(BUILD_ON_JETSON)
  set(fd_known_gpu_archs "53 62 72")
  set(fd_known_gpu_archs10 "53 62 72")
else()
  message("Using New Release Strategy - All Arches Packge")
#  set(fd_known_gpu_archs "35 50 52 60 61 70 75 80 86")  # original
#  set(fd_known_gpu_archs10 "35 50 52 60 61 70 75")      # original

  set(fd_known_gpu_archs "50 52 60 61 70 75 80 86")  # modified
  set(fd_known_gpu_archs10 "50 52 60 61 70 75")      # modified

  set(fd_known_gpu_archs11 "50 60 61 70 75 80")
endif()

######################################################################################
# A function for automatic detection of GPUs installed  (if autodetection is enabled)
# Usage:
#   detect_installed_gpus(out_variable)

Near the top of the file there are two lists, fd_known_gpu_archs and fd_known_gpu_archs10. Remove 35 from both and make succeeds.
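
The same edit can be scripted instead of done by hand. This is a minimal sketch, assuming the two lists still begin with "35 " exactly as in the original lines quoted above and that the script is run from the root of the FastDeploy checkout. `drop_arch35` is a hypothetical helper name, not something FastDeploy provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the leading "35 " from the fd_known_gpu_archs
# and fd_known_gpu_archs10 lists in the file given as $1.
# sed keeps a backup of the unmodified file next to it with a .bak suffix.
drop_arch35() {
  sed -i.bak -E 's/(set\(fd_known_gpu_archs(10)? ")35 /\1/' "$1"
}

# Assumed location: the current directory is the root of the FastDeploy repo.
if [ -f cmake/cuda.cmake ]; then
  drop_arch35 cmake/cuda.cmake
  # Show the resulting lines so the change can be checked by eye.
  grep -n 'fd_known_gpu_archs' cmake/cuda.cmake
fi
```

The pattern only matches lists whose first entry is 35, so fd_known_gpu_archs11 and the Jetson lists are left untouched.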

[100%] Linking CUDA device code CMakeFiles/fastdeploy.dir/cmake_device_link.o
[100%] Linking CXX shared library libfastdeploy.so
[100%] Built target fastdeploy
[100%] Built target patchelf_paddle_inference
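
As an optional sanity check after the build, the architectures actually embedded in the library can be listed. The sketch below assumes that `cuobjdump` from the CUDA toolkit is on the PATH, that it is run from the build directory containing libfastdeploy.so, and that `cuobjdump --list-elf` prints one entry per embedded cubin with the architecture in the file name (for example `.sm_86.cubin`). `list_embedded_archs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `cuobjdump --list-elf` output on stdin and print
# each distinct sm_XX architecture name once, sorted.
list_embedded_archs() {
  grep -oE 'sm_[0-9]+' | sort -u
}

# Only run when both the tool and the freshly built library are present.
if command -v cuobjdump >/dev/null 2>&1 && [ -f libfastdeploy.so ]; then
  cuobjdump --list-elf libfastdeploy.so | list_embedded_archs
fi
```

For the setup in this post, sm_86 should be in the list and sm_35 should not.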

August 1, 2024 • C/C++
