
[Memo] Fixing "nvcc fatal : Unsupported gpu architecture 'compute_35'" when building FastDeploy

August 1, 2024, C/C++
Fixing a Paddle C++ SDK build error

Background

Installing Paddle's FastDeploy requires compiling the C++ SDK, and that build failed with the error in the title. I later found a fix on GitHub.

Environment

  • GPU: RTX 3060 Ti
  • Ubuntu 22.04
  • CUDA 12.1.1
  • TensorRT 8.6.1.6
  • OpenCV 4.7
  • FastDeploy develop branch, commit id = cd0ee79c91d4ed1103abdc65ff12ccadd23d0827

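Steps to reproduce

  1. Install CUDA 12.1.1 (follow the download steps and links on the official site).
  2. Install OpenCV: git clone it from GitHub and build it by hand. CSDN has plenty of guides, so I won't link any here.
  3. Install TensorRT. According to the Paddle website, CUDA toolkit 12.0 pairs with cuDNN v8.9.1, and Paddle-TensorRT inference additionally needs TensorRT 8.6.1.6. (The official site links to a tar package; unpack it and set the paths. The download requires logging in with an NVIDIA Developer account, which is free to register.)
  4. Install FastDeploy. I followed this tutorial; a few of the cmake options below need to be changed by hand.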
git clone https://github.com/paddlepaddle/fastdeploy.git
cd fastdeploy
mkdir build && cd build
# TRT_DIRECTORY: change this to wherever you unpacked the TensorRT tar package.
# OPENCV_DIRECTORY: no change needed if you built OpenCV from source and ran make install.
# (Comments cannot follow a trailing backslash, so they are kept up here.)
cmake .. -DENABLE_ORT_BACKEND=ON \
         -DENABLE_PADDLE_BACKEND=ON \
         -DENABLE_OPENVINO_BACKEND=ON \
         -DENABLE_TRT_BACKEND=ON \
         -DWITH_GPU=ON \
         -DTRT_DIRECTORY=/paddle/tensorrt-8.4.1.5 \
         -DCUDA_DIRECTORY=/usr/local/cuda \
         -DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk \
         -DENABLE_VISION=ON \
         -DOPENCV_DIRECTORY=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
         -DENABLE_TEXT=ON
make -j12
make install
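
Because the CUDA, TensorRT and OpenCV paths differ from machine to machine, it can help to confirm that they exist before running cmake. The following is only a minimal sketch: check_dirs is a hypothetical helper, not part of FastDeploy, and the paths in the example call are simply the ones from the cmake command above.

```shell
# Hypothetical helper (not part of FastDeploy): print a status line for each
# path and return non-zero if any of them is not an existing directory.
check_dirs() {
  missing=0
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "ok       $dir"
    else
      echo "missing  $dir"
      missing=1
    fi
  done
  return "$missing"
}

# Example call with the paths from the cmake command; adjust them to your machine:
# check_dirs /usr/local/cuda /paddle/tensorrt-8.4.1.5 /usr/lib/x86_64-linux-gnu/cmake/opencv4
```

A "missing" line here usually means the matching -D option still points at the tutorial's path rather than your own.
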
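  • Note 1: the cmake options need adjusting, as described in the comments on the cmake command above.
  • Note 2: I personally consider this a FastDeploy problem. The failure is shown below; watch for the run of nvcc fatal lines. compute_35 is the old sm_35 compute architecture, while my card is sm_86, so at first sight this error should not occur. The card is not the cause, though: CUDA 12 removed the sm_35 target from nvcc, and the FastDeploy build still requests it. (The log comes from a Chinese-locale system, where make prints 错误 for "Error" and 正在等待未完成的任务 for "Waiting for unfinished jobs".)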
make -j16
[  3%] built target extern_onnxruntime
[  6%] built target extern_paddle_inference
[  8%] built target extern_fast_tokenizer
[ 10%] built target extern_paddle2onnx
[ 21%] built target yaml-cpp
[ 21%] built target yaml-cpp-parse
[ 22%] built target yaml-cpp-read
[ 23%] built target yaml-cpp-sandbox
consolidate compiler generated dependencies of target fastdeploy
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o
[ 24%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o
[ 24%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o
[ 25%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/preprocessor.cc.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/resnet.cc.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/yolov5cls.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/model.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/postprocessor.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/preprocessor.cc.o
[ 27%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_postprocessor.cc.o
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec.cc.o
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/postprocessor.cc.o
nvcc fatal   : unsupported gpu architecture 'compute_35'
nvcc fatal   : unsupported gpu architecture 'compute_35'
nvcc fatal   : unsupported gpu architecture 'compute_35'
nvcc fatal   : unsupported gpu architecture 'compute_35'
nvcc fatal   : unsupported gpu architecture 'compute_35'
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_preprocessor.cc.o
nvcc fatal   : unsupported gpu architecture 'compute_35'
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:496:cmakefiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o] 错误 1
make[2]: *** 正在等待未完成的任务....
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:510:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o] 错误 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:706:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o] 错误 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:734:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o] 错误 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:720:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o] 错误 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:692:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o] 错误 1
make[1]: *** [cmakefiles/makefile2:310:cmakefiles/fastdeploy.dir/all] 错误 2
make: *** [makefile:156:all] 错误 2
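
To confirm that the toolkit, rather than the GPU, is rejecting compute_35, you can ask nvcc which virtual architectures it accepts. This is a rough sketch under two assumptions: the installed nvcc provides the --list-gpu-arch option (recent toolkits do), and arch_listed is a hypothetical helper name introduced only for this example.

```shell
# Hypothetical helper: read an architecture list on stdin (one name per line,
# as printed by `nvcc --list-gpu-arch`) and succeed if "$1" is in the list.
arch_listed() {
  grep -qx -- "$1"
}

# Only query nvcc if it is on PATH; otherwise this block prints nothing.
if command -v nvcc >/dev/null 2>&1; then
  for arch in compute_35 compute_86; do
    if nvcc --list-gpu-arch | arch_listed "$arch"; then
      echo "$arch: accepted by this nvcc"
    else
      echo "$arch: not accepted by this nvcc"
    fi
  done
fi
```

If compute_35 is reported as not accepted while compute_86 is, the architecture list in the build scripts is what needs changing.
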

Solution

The fix is simple: edit fastdeploy/cmake/cuda.cmake.

if(NOT WITH_GPU)
  return()
endif()

# This is to eliminate the CMP0104 warnings from cmake 3.18+.
# Instead of setting CUDA_ARCHITECTURES, we will set CMAKE_CUDA_FLAGS.
set(CMAKE_CUDA_ARCHITECTURES OFF)

if(BUILD_ON_JETSON)
  set(fd_known_gpu_archs "53 62 72")
  set(fd_known_gpu_archs10 "53 62 72")
else()
  message("using new release strategy - all arches packge")
#  set(fd_known_gpu_archs "35 50 52 60 61 70 75 80 86")  # original
#  set(fd_known_gpu_archs10 "35 50 52 60 61 70 75")      # original

  set(fd_known_gpu_archs "50 52 60 61 70 75 80 86")  # modified
  set(fd_known_gpu_archs10 "50 52 60 61 70 75")      # modified

  set(fd_known_gpu_archs11 "50 60 61 70 75 80")
endif()

######################################################################################
# A function for automatic detection of GPUs installed  (if autodetection is enabled)
# Usage:
#   detect_installed_gpus(out_variable)

Both fd_known_gpu_archs and fd_known_gpu_archs10 are set near the top of the file. After removing 35 from both, make completes successfully.

[100%] linking cuda device code cmakefiles/fastdeploy.dir/cmake_device_link.o
[100%] linking cxx shared library libfastdeploy.so
[100%] built target fastdeploy
[100%] built target patchelf_paddle_inference
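
The manual edit to cuda.cmake can also be scripted, which is handy when the checkout is recreated often (in CI, for example). This is a sketch only: drop_arch35 is a hypothetical helper, and it assumes the two set() lines still look like the originals quoted earlier, with 35 as the first entry of each list.

```shell
# Hypothetical helper: remove the leading "35 " from the fd_known_gpu_archs and
# fd_known_gpu_archs10 lists in the given file, keeping a .bak backup copy.
# Commented-out lines and fd_known_gpu_archs11 are left untouched.
drop_arch35() {
  sed -i.bak -E \
    '/^[[:space:]]*set\(fd_known_gpu_archs(10)?[[:space:]]/ s/"35 /"/' "$1"
}

# Run from the root of the FastDeploy checkout:
# drop_arch35 cmake/cuda.cmake
```

The backup written next to the file (cuda.cmake.bak) makes the change easy to revert, and a later git diff shows exactly what was altered.
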