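Background
I was using Paddle's FastDeploy. Installing it requires compiling the C++ SDK, and the build failed with the error in the title (nvcc fatal: unsupported gpu architecture 'compute_35'). I later found the fix on GitHub.
Environment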
- GPU: RTX 3060 Ti
- Ubuntu 22.04
- CUDA 12.1.1
- TensorRT 8.6.1.6
- OpenCV 4.7
- FastDeploy develop branch, commit id = cd0ee79c91d4ed1103abdc65ff12ccadd23d0827
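Steps to reproduce
- Install CUDA 12.1.1 (follow the download steps and link on the official site).
- Install OpenCV: git clone it from GitHub and build it manually. There are plenty of guides on CSDN, so I won't repeat them here.
- Install TensorRT. According to the Paddle website, CUDA Toolkit 12.0 goes with cuDNN v8.9.1, and PaddleTensorRT inference additionally requires TensorRT 8.6.1.6. (The official link provides a tar package; extract it and set the paths. Downloading requires logging in with an NVIDIA Developer account, which is free to register.)
- Install FastDeploy. I followed this tutorial, but several of the cmake options below need to be changed by hand.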
git clone https://github.com/paddlepaddle/fastdeploy.git
cd fastdeploy
mkdir build && cd build
# Set TRT_DIRECTORY to wherever you extracted the TensorRT tar package.
# OPENCV_DIRECTORY needs no change if you built OpenCV from source and ran make install.
cmake .. -DENABLE_ORT_BACKEND=ON \
-DENABLE_PADDLE_BACKEND=ON \
-DENABLE_OPENVINO_BACKEND=ON \
-DENABLE_TRT_BACKEND=ON \
-DWITH_GPU=ON \
-DTRT_DIRECTORY=/paddle/tensorrt-8.4.1.5 \
-DCUDA_DIRECTORY=/usr/local/cuda \
-DCMAKE_INSTALL_PREFIX=${PWD}/compiled_fastdeploy_sdk \
-DENABLE_VISION=ON \
-DOPENCV_DIRECTORY=/usr/lib/x86_64-linux-gnu/cmake/opencv4 \
-DENABLE_TEXT=ON
make -j12
make install
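Since the TensorRT path has to be adjusted by hand, it can help to sanity-check it before running cmake. This is only a minimal sketch: check_trt_dir is a hypothetical helper name, and it assumes the usual layout of the TensorRT tar package (an include directory with NvInfer.h and a lib directory with libnvinfer.so).

```shell
# Return 0 if the directory looks like an extracted TensorRT tar package.
# Assumption: the package contains include/NvInfer.h and lib/libnvinfer.so.
check_trt_dir() {
    [ -f "$1/include/NvInfer.h" ] && [ -e "$1/lib/libnvinfer.so" ]
}

# Usage (the path is hypothetical; substitute your own):
#   check_trt_dir "$HOME/TensorRT-8.6.1.6" || echo "TRT_DIRECTORY looks wrong" >&2
```

If the check fails, the value passed to -DTRT_DIRECTORY most likely points at the wrong folder.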
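- Note 1: the cmake options need adjusting; see above.
- Note 2: I believe this one is a FastDeploy problem. The output is below; note the pile of nvcc fatal errors. compute_35 is the old sm_35 compute architecture, which CUDA 12 no longer supports. My GPU is sm_86, so the build should not be targeting compute_35 at all.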
make -j16
[ 3%] built target extern_onnxruntime
[ 6%] built target extern_paddle_inference
[ 8%] built target extern_fast_tokenizer
[ 10%] built target extern_paddle2onnx
[ 21%] built target yaml-cpp
[ 21%] built target yaml-cpp-parse
[ 22%] built target yaml-cpp-read
[ 23%] built target yaml-cpp-sandbox
consolidate compiler generated dependencies of target fastdeploy
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o
[ 23%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o
[ 24%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o
[ 24%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o
[ 25%] building cuda object cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/preprocessor.cc.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/resnet.cc.o
[ 25%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/yolov5cls.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/model.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/postprocessor.cc.o
[ 26%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppcls/preprocessor.cc.o
[ 27%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_postprocessor.cc.o
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec.cc.o
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/contrib/yolov5cls/postprocessor.cc.o
nvcc fatal : unsupported gpu architecture 'compute_35'
nvcc fatal : unsupported gpu architecture 'compute_35'
nvcc fatal : unsupported gpu architecture 'compute_35'
nvcc fatal : unsupported gpu architecture 'compute_35'
nvcc fatal : unsupported gpu architecture 'compute_35'
[ 28%] building cxx object cmakefiles/fastdeploy.dir/fastdeploy/vision/classification/ppshitu/ppshituv2_rec_preprocessor.cc.o
nvcc fatal : unsupported gpu architecture 'compute_35'
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:496:cmakefiles/fastdeploy.dir/fastdeploy/function/cuda_cast.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:510:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/common/cuda/adaptive_pool2d_kernel.cu.o] Error 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:706:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/grid_sample_3d.cu.o] Error 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:734:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/voxelize_op.cu.o] Error 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:720:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/iou3d_nms_kernel.cu.o] Error 1
make[2]: *** [cmakefiles/fastdeploy.dir/build.make:692:cmakefiles/fastdeploy.dir/fastdeploy/runtime/backends/paddle/ops/centerpoint_postprocess_op.cu.o] Error 1
make[1]: *** [cmakefiles/makefile2:310:cmakefiles/fastdeploy.dir/all] Error 2
make: *** [makefile:156:all] Error 2
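To confirm that the installed toolkit itself rejects compute_35, you can look the architecture up in the list that nvcc reports. This is a rough sketch rather than part of the original fix: arch_supported is a hypothetical helper, and the usage lines assume a toolkit whose nvcc has --list-gpu-arch and a driver whose nvidia-smi can query compute_cap.

```shell
# Read a list of virtual architectures (one per line, e.g. "compute_86")
# from stdin and return 0 if compute_<N> is among them.
arch_supported() {
    grep -qx "compute_$1"
}

# Usage (assumes nvcc and nvidia-smi are on PATH):
#   nvcc --list-gpu-arch | arch_supported 35 || echo "compute_35 is not supported"
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader   # compute capability of the installed GPU
```

If compute_35 is missing from the list, any build script that hard-codes it will fail regardless of which GPU is installed.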
Solution
The fix is simple: edit the file fastdeploy/cmake/cuda.cmake.
if(NOT WITH_GPU)
return()
endif()
# This is to eliminate the CMP0104 warnings from cmake 3.18+.
# Instead of setting CUDA_ARCHITECTURES, we will set CMAKE_CUDA_FLAGS.
set(CMAKE_CUDA_ARCHITECTURES OFF)
if(BUILD_ON_JETSON)
set(fd_known_gpu_archs "53 62 72")
set(fd_known_gpu_archs10 "53 62 72")
else()
message("using new release strategy - all arches packge")
# set(fd_known_gpu_archs "35 50 52 60 61 70 75 80 86") # original
# set(fd_known_gpu_archs10 "35 50 52 60 61 70 75") # original
set(fd_known_gpu_archs "50 52 60 61 70 75 80 86") # modified
set(fd_known_gpu_archs10 "50 52 60 61 70 75") # modified
set(fd_known_gpu_archs11 "50 60 61 70 75 80")
endif()
######################################################################################
# a function for automatic detection of gpus installed (if autodetection is enabled)
# usage:
# detect_installed_gpus(out_variable)
Near the top of the file, the fd_known_gpu_archs and fd_known_gpu_archs10 lines both list 35. After removing 35 from those two lines, make completes successfully.
[100%] linking cuda device code cmakefiles/fastdeploy.dir/cmake_device_link.o
[100%] linking cxx shared library libfastdeploy.so
[100%] built target fastdeploy
[100%] built target patchelf_paddle_inference
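The same edit can be scripted if you need to repeat it, for example after re-cloning the repository. This is a minimal sketch, assuming 35 is still the first entry of each list as in the commit above; strip_arch35 is a hypothetical helper name, not something FastDeploy provides.

```shell
# Remove the leading "35" entry from every fd_known_gpu_archs* list in a file.
# Assumption: 35 is the first entry of the quoted list, as in the commit above.
strip_arch35() {
    sed -E '/set\(fd_known_gpu_archs[0-9]* /s/"35 /"/' "$1" > "$1.tmp" \
        && mv "$1.tmp" "$1"
}

# Usage, from the repository root:
#   strip_arch35 cmake/cuda.cmake
```

Lists that do not start with 35, such as the Jetson ones, are left unchanged. Re-run cmake afterwards so the new architecture list is picked up.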