Linux 36.3 + JetPack v6.0@jetson-inference: Object Detection
1. Background
From an application standpoint, object detection is the second major building block of computer vision. The previous recognition examples output class probabilities describing the entire input image. Here the focus shifts to object detection: finding where various objects appear in a frame by extracting their bounding boxes. Unlike image classification, an object detection network can detect many different objects in each frame.
2. detectNet
The detectNet object accepts an image as input and outputs a list of detected bounding-box coordinates together with their classes and confidence values. detectNet can be used from both Python and C++. See below for the various pre-trained detection models available for download. The default model is a 91-class SSD-Mobilenet-v2 network trained on the MS COCO dataset, which achieves real-time inference performance on Jetson with TensorRT.
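To make that flow concrete, below is a minimal Python sketch of the idea (assuming the jetson-inference Python bindings are installed; the image path is one of the samples shipped with the repo):

from jetson_inference import detectNet
from jetson_utils import loadImage

# load the default 91-class SSD-Mobilenet-v2 model (auto-downloaded or placed under data/networks/)
net = detectNet("ssd-mobilenet-v2", threshold=0.5)

# run detection on a sample image (path is just an example)
img = loadImage("images/peds_0.jpg")
detections = net.Detect(img)

# each detection carries a class ID, a confidence value, and bounding-box coordinates
for det in detections:
    print(net.GetClassDesc(det.ClassID), det.Confidence, det.Left, det.Top, det.Right, det.Bottom)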
2.1 Command-Line Options
$ detectnet --help
usage: detectnet [--help] [--network=network] [--threshold=threshold] ...
input [output]
locate objects in a video/image stream using an object detection dnn.
see below for additional arguments that may not be shown above.
positional arguments:
input resource uri of input stream (see videosource below)
output resource uri of output stream (see videooutput below)
detectnet arguments:
--network=network pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* peoplenet
* peoplenet-pruned
* dashcamnet
* trafficcamnet
* facedetect
--model=model path to custom model to load (caffemodel, uff, or onnx)
--prototxt=prototxt path to custom prototxt to load (for .caffemodel only)
--labels=labels path to text file containing the labels for each class
--input-blob=input name of the input layer (default is 'data')
--output-cvg=coverage name of the coverage/confidence output layer (default is 'coverage')
--output-bbox=boxes name of the bounding output layer (default is 'bboxes')
--mean-pixel=pixel mean pixel value to subtract from input (default is 0.0)
--confidence=conf minimum confidence threshold for detection (default is 0.5)
--clustering=cluster minimum overlapping area threshold for clustering (default is 0.75)
--alpha=alpha overlay alpha blending value, range 0-255 (default: 120)
--overlay=overlay detection overlay flags (e.g. --overlay=box,labels,conf)
valid combinations are: 'box', 'lines', 'labels', 'conf', 'none'
--profile enable layer profiling in tensorrt
objecttracker arguments:
--tracking flag to enable default tracker (iou)
--tracker=tracker enable tracking with 'iou' or 'klt'
--tracker-min-frames=n the number of re-identified frames for a track to be considered valid (default: 3)
--tracker-drop-frames=n number of consecutive lost frames before a track is dropped (default: 15)
--tracker-overlap=n how much iou overlap is required for a bounding box to be matched (default: 0.5)
videosource arguments:
input resource uri of the input stream, for example:
* /dev/video0 (v4l2 camera #0)
* csi://0 (mipi csi camera #0)
* rtp://@:1234 (rtp stream)
* rtsp://user:pass@ip:1234 (rtsp stream)
* webrtc://@:1234/my_stream (webrtc stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=width explicitly request a width of the stream (optional)
--input-height=height explicitly request a height of the stream (optional)
--input-rate=rate explicitly request a framerate of the stream (optional)
--input-save=file path to video file for saving the input stream to disk
--input-codec=codec rtp requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-decoder=type the decoder engine to use, one of these:
* cpu
* omx (aarch64/jetpack4 only)
* v4l2 (aarch64/jetpack5 only)
--input-flip=flip flip method to apply to input:
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=loop for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videooutput arguments:
output resource uri of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (rtp stream)
* rtsp://@:8554/my_stream (rtsp stream)
* webrtc://@:1234/my_stream (webrtc stream)
* display://0 (opengl window)
--output-codec=codec desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--output-encoder=type the encoder engine to use, one of these:
* cpu
* omx (aarch64/jetpack4 only)
* v4l2 (aarch64/jetpack5 only)
--output-save=file path to a video file for saving the compressed stream
to disk, in addition to the primary output above
--bitrate=bitrate desired target vbr bitrate for compressed streams,
in bits per second. the default is 4000000 (4 mbps)
--headless don't create a default opengl gui window
logging arguments:
--log-file=file output destination file (default is stdout)
--log-level=level message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
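The objectTracker arguments listed above can also be switched on from code rather than the command line. A hedged sketch, assuming the Python bindings expose SetTrackingEnabled/SetTrackingParams as described in the upstream tracking documentation:

from jetson_inference import detectNet

# load the detector, then enable the default IOU tracker
# (SetTrackingEnabled/SetTrackingParams are assumed from the upstream tracking docs)
net = detectNet("ssd-mobilenet-v2", threshold=0.5)
net.SetTrackingEnabled(True)
net.SetTrackingParams(minFrames=3, dropFrames=15, overlapThreshold=0.5)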
Note: for basic operations on images and video, see "Linux 36.3 + JetPack v6.0@jetson-inference: Video Operations".
2.2 Downloading Models
There are two ways:
- when the detectNet object is created, initialization downloads the model automatically
- manually place the model files in the data/networks/ directory
Inside mainland China, the Great Firewall is a real obstacle for beginners like us who are just getting off the ground. Readers with the means can configure their network access accordingly.
Fortunately, NVIDIA kindly provides a workaround: all of the models are also staged at a location reachable from mainland China: GitHub - model-mirror-190618.
--network=network pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* peoplenet
* peoplenet-pruned
* dashcamnet
* trafficcamnet
* facedetect
--model=model path to custom model to load (caffemodel, uff, or onnx)
According to the model information above, the command supports:
- ssd-mobilenet-v1
- ssd-mobilenet-v2 (default)
- ssd-inception-v2
- peoplenet
- peoplenet-pruned
- dashcamnet
- trafficcamnet
- facedetect
- custom models (provided as a common model file: caffemodel, UFF, or ONNX; see the sketch after this list)
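For the custom-model case, the detectNet constructor can be pointed at the model and label files directly. A minimal sketch, with placeholder paths and the layer names used by the commented-out example in detectnet.py:

from jetson_inference import detectNet

# load a custom ONNX detection model; paths and layer names are placeholders
net = detectNet(model="model/ssd-mobilenet.onnx", labels="model/labels.txt",
                input_blob="input_0", output_cvg="scores", output_bbox="boxes",
                threshold=0.5)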
As an example, download the SSD-Mobilenet-v2 (default) model:
$ mkdir model-mirror-190618
$ cd model-mirror-190618
$ wget https://github.com/dusty-nv/jetson-inference/releases/download/model-mirror-190618/ssd-mobilenet-v2.tar.gz
$ tar -zxvf ssd-mobilenet-v2.tar.gz -C ../data/networks
$ cd ..
Note: when extracting this model, make sure the extracted files end up under the SSD-Mobilenet-v2 directory.
2.3 Usage Examples
$ cd build/aarch64/bin/
2.3.1 Single Image
# C++
$ ./detectnet --network=ssd-mobilenet-v2 images/peds_0.jpg images/test/output_detectnet_cpp.jpg
# Python
$ ./detectnet.py --network=ssd-mobilenet-v2 images/peds_0.jpg images/test/output_detectnet_python.jpg
This time the C++ and Python runs produce identical confidence values, unlike the imagenet example, where they differed slightly.
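The same single-image run can also be reproduced from Python; a minimal sketch (the input/output paths mirror the command above and are only examples):

from jetson_inference import detectNet
from jetson_utils import loadImage, saveImage

net = detectNet("ssd-mobilenet-v2", threshold=0.5)

img = loadImage("images/peds_0.jpg")                        # input image from the example above
detections = net.Detect(img)                                # draws the box/labels/conf overlay in place
saveImage("images/test/output_detectnet_sketch.jpg", img)   # write out the overlaid image

print("detected {:d} objects".format(len(detections)))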
2.3.2 Multiple Images
# C++
$ ./detectnet "images/peds_*.jpg" images/test/peds_output_detectnet_cpp_%i.jpg
# Python
$ ./detectnet.py "images/peds_*.jpg" images/test/peds_output_detectnet_python_%i.jpg
Note: the multi-image results are not shown here; interested readers can download the code and run it locally.
2.3.3 Video
# download test video
$ wget https://nvidia.box.com/shared/static/veuuimq6pwvd62p9fresqhrrmfqz0e2f.mp4 -O pedestrians.mp4
# C++
$ ./detectnet ../../../pedestrians.mp4 images/test/pedestrians_ssd_detectnet_cpp.mp4
# Python
$ ./detectnet.py ../../../pedestrians.mp4 images/test/pedestrians_ssd_detectnet_python.mp4
(video: pedestrian detection output)
3. Code
3.1 Python
import statements
├── import sys
├── import argparse
├── from jetson_inference import detectNet
└── from jetson_utils import videoSource, videoOutput, Log
command-line argument parsing
├── create ArgumentParser
│ ├── description: "Locate objects in a live camera stream using an object detection DNN."
│ ├── formatter_class: argparse.RawTextHelpFormatter
│ └── epilog: detectNet.Usage() + videoSource.Usage() + videoOutput.Usage() + Log.Usage()
├── add arguments
│ ├── input: "URI of the input stream"
│ ├── output: "URI of the output stream"
│ ├── --network: "pre-trained model to load (default: 'ssd-mobilenet-v2')"
│ ├── --overlay: "detection overlay flags (default: 'box,labels,conf')"
│ └── --threshold: "minimum detection threshold to use (default: 0.5)"
└── parse arguments
    ├── args = parser.parse_known_args()[0]
    └── exception handling
        ├── print("")
        ├── parser.print_help()
        └── sys.exit(0)
create video sources and outputs
├── input = videoSource(args.input, argv=sys.argv)
└── output = videoOutput(args.output, argv=sys.argv)
load object detection network
└── net = detectNet(args.network, sys.argv, args.threshold)
    # note: hard-coded paths to load a model (commented out in the source)
    ├── net = detectNet(model="model/ssd-mobilenet.onnx", labels="model/labels.txt",
    ├──                 input_blob="input_0", output_cvg="scores", output_bbox="boxes",
    └──                 threshold=args.threshold)
process frames until EOS or the user exits
└── while True:
    ├── capture the next image
    │ └── img = input.Capture()
    │     └── if img is None:  # timeout
    │         └── continue
    ├── detect objects in the image
    │ └── detections = net.Detect(img, overlay=args.overlay)
    ├── print the detections
    │ ├── print("detected {:d} objects in image".format(len(detections)))
    │ └── for detection in detections:
    │     └── print(detection)
    ├── render the image
    │ └── output.Render(img)
    ├── update the title bar
    │ └── output.SetStatus("{:s} | Network {:.0f} FPS".format(args.network, net.GetNetworkFPS()))
    ├── print performance info
    │ └── net.PrintProfilerTimes()
    └── exit on input/output EOS
        └── if not input.IsStreaming() or not output.IsStreaming():
            └── break
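Putting the outline together, a condensed runnable version of the same loop might look like this (a sketch that hard-codes the defaults instead of parsing arguments; the stream URIs are examples):

import sys
from jetson_inference import detectNet
from jetson_utils import videoSource, videoOutput

# example stream URIs; any videoSource/videoOutput URI from the help above works here
input  = videoSource("../../../pedestrians.mp4", argv=sys.argv)
output = videoOutput("images/test/pedestrians_sketch.mp4", argv=sys.argv)

net = detectNet("ssd-mobilenet-v2", sys.argv, 0.5)

while True:
    img = input.Capture()
    if img is None:          # timeout waiting for a frame
        continue

    detections = net.Detect(img, overlay="box,labels,conf")
    print("detected {:d} objects in image".format(len(detections)))

    output.Render(img)
    output.SetStatus("ssd-mobilenet-v2 | Network {:.0f} FPS".format(net.GetNetworkFPS()))

    if not input.IsStreaming() or not output.IsStreaming():
        break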
3.2 C++
#include statements
├── "videoSource.h"
├── "videoOutput.h"
├── "detectNet.h"
├── "objectTracker.h"
└── <signal.h>
global variables
└── bool signal_recieved = false;
function definitions
├── void sig_handler(int signo)
│ └── if (signo == SIGINT)
│     ├── LogVerbose("received SIGINT\n");
│     └── signal_recieved = true;
└── int usage()
    ├── printf("usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] ...\n");
    ├── printf("                 input [output]\n\n");
    ├── printf("Locate objects in a video/image stream using an object detection DNN.\n");
    ├── printf("See below for additional arguments that may not be shown above.\n\n");
    ├── printf("positional arguments:\n");
    ├── printf("    input           resource URI of input stream  (see videoSource below)\n");
    ├── printf("    output          resource URI of output stream (see videoOutput below)\n\n");
    ├── printf("%s", detectNet::Usage());
    ├── printf("%s", objectTracker::Usage());
    ├── printf("%s", videoSource::Usage());
    ├── printf("%s", videoOutput::Usage());
    └── printf("%s", Log::Usage());
main function
├── parse command line
│ ├── commandLine cmdLine(argc, argv);
│ └── if (cmdLine.GetFlag("help"))
│     └── return usage();
├── attach signal handler
│ └── if (signal(SIGINT, sig_handler) == SIG_ERR)
│     └── LogError("can't catch SIGINT\n");
├── create input stream
│ ├── videoSource* input = videoSource::Create(cmdLine, ARG_POSITION(0));
│ └── if (!input)
│     ├── LogError("detectnet: failed to create input stream\n");
│     └── return 1;
├── create output stream
│ ├── videoOutput* output = videoOutput::Create(cmdLine, ARG_POSITION(1));
│ └── if (!output)
│     ├── LogError("detectnet: failed to create output stream\n");
│     └── return 1;
├── create detection network
│ ├── detectNet* net = detectNet::Create(cmdLine);
│ ├── if (!net)
│ │   ├── LogError("detectnet: failed to load detectNet model\n");
│ │   └── return 1;
│ └── const uint32_t overlayFlags = detectNet::OverlayFlagsFromStr(cmdLine.GetString("overlay", "box,labels,conf"));
├── processing loop
│ └── while (!signal_recieved)
│     ├── capture next image
│     │ ├── uchar3* image = NULL;
│     │ ├── int status = 0;
│     │ └── if (!input->Capture(&image, &status))
│     │     ├── if (status == videoSource::TIMEOUT)
│     │     │   └── continue;
│     │     └── break;  // EOS
│     ├── detect objects in the frame
│     │ ├── detectNet::Detection* detections = NULL;
│     │ ├── const int numDetections = net->Detect(image, input->GetWidth(), input->GetHeight(), &detections, overlayFlags);
│     │ └── if (numDetections > 0)
│     │     ├── LogVerbose("%i objects detected\n", numDetections);
│     │     └── for (int n=0; n < numDetections; n++)
│     │         ├── LogVerbose("\ndetected obj %i class #%u (%s) confidence=%f\n", n, detections[n].ClassID, net->GetClassDesc(detections[n].ClassID), detections[n].Confidence);
│     │         ├── LogVerbose("bounding box %i (%.2f, %.2f) (%.2f, %.2f) w=%.2f h=%.2f\n", n, detections[n].Left, detections[n].Top, detections[n].Right, detections[n].Bottom, detections[n].Width(), detections[n].Height());
│     │         └── if (detections[n].TrackID >= 0)
│     │             └── LogVerbose("tracking ID %i status=%i frames=%i lost=%i\n", detections[n].TrackID, detections[n].TrackStatus, detections[n].TrackFrames, detections[n].TrackLost);
│     ├── render outputs
│     │ └── if (output != NULL)
│     │     ├── output->Render(image, input->GetWidth(), input->GetHeight());
│     │     ├── char str[256];
│     │     ├── sprintf(str, "TensorRT %i.%i.%i | %s | Network %.0f FPS", NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH, precisionTypeToStr(net->GetPrecision()), net->GetNetworkFPS());
│     │     ├── output->SetStatus(str);
│     │     └── if (!output->IsStreaming())
│     │         └── break;
│     └── print out timing info
│         └── net->PrintProfilerTimes();
├── destroy resources
│ ├── LogVerbose("detectnet: shutting down...\n");
│ ├── SAFE_DELETE(input);
│ ├── SAFE_DELETE(output);
│ ├── SAFE_DELETE(net);
│ └── LogVerbose("detectnet: shutdown complete.\n");
└── return 0;