Abstract
Objective: To address the difficulty of matching pedestrian targets across multi-view images during collaborative reconnaissance by heterogeneous aerial-ground equipment (unmanned aerial vehicles and unmanned ground vehicles) in complex battlefield environments, a cross-view aerial-ground target matching algorithm based on a post-fusion strategy is proposed to cope with large viewpoint differences (>60°) and drastic scale variations. Methods: First, a lightweight dual-branch YOLOv10 model was used for efficient pedestrian detection in the aerial and ground view images. Second, a multi-scale feature extraction network (the residual network ResNet-18 combined with an adaptive spatial pyramid) was fused with geometric localization information to build a joint spatial-appearance representation of each target. Finally, the Hungarian algorithm was applied to a weighted cost function combining appearance features and geometric constraints to obtain the optimal cross-view target matching. Results: Experiments on the cross-view multi-human tracking dataset CVMHT showed that the proposed method achieved an average precision and recall of 81.4% and 79.0%, respectively. Compared with the baseline that does not fuse appearance features (76.3% and 78.8%), this is an improvement of 5.1 and 0.2 percentage points, respectively, and the method significantly outperformed the conventional person re-identification method ByteTrack V2 (33.8% and 36.1% after fine-tuning). Conclusion: Through the post-fusion strategy, the proposed algorithm effectively combines detection, geometric and appearance features, removes the dependence of pre-fusion methods on a fixed viewpoint layout, and provides a flexible and robust target matching solution for collaborative reconnaissance by heterogeneous aerial-ground equipment.
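The final matching step can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine appearance distance, the clipped Euclidean ground-plane distance, the weight `alpha`, the normalisation constant `dist_scale` and all function names are assumptions made for illustration, as is the premise that both views have already been projected into a shared ground-plane coordinate frame. The assignment itself uses a compact textbook form of the Hungarian (Kuhn-Munkres) algorithm.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def build_cost(app_a, pos_a, app_g, pos_g, alpha=0.5, dist_scale=10.0):
    """Weighted cost: alpha * appearance + (1 - alpha) * geometry.

    alpha and dist_scale are hypothetical parameters, not values from the paper.
    """
    cost = []
    for fa, pa in zip(app_a, pos_a):
        row = []
        for fg, pg in zip(app_g, pos_g):
            d_app = cosine_distance(fa, fg)
            # Geometric term: ground-plane distance, scaled and clipped to [0, 1].
            d_geo = min(math.dist(pa, pg) / dist_scale, 1.0)
            row.append(alpha * d_app + (1.0 - alpha) * d_geo)
        cost.append(row)
    return cost


def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.

    Returns a sorted list of (row, col) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "expects no more rows than columns"
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently assigned to column j (0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip assignments along the augmenting path back to the virtual column 0.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


# Toy data (made up): three targets seen from the air and from the ground.
app_air = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pos_air = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
app_gnd = [[0.1, 1.0], [1.0, 0.9], [1.0, 0.1]]
pos_gnd = [(5.2, 0.1), (0.1, 4.8), (0.3, -0.2)]

cost = build_cost(app_air, pos_air, app_gnd, pos_gnd)
matches = hungarian(cost)
print(matches)  # list of (aerial index, ground index) pairs
```

Setting `alpha=0.0` in this sketch reduces the cost to the geometric term alone, which loosely mirrors the baseline without appearance features that the abstract compares against. A practical system would also reject pairs whose cost exceeds a threshold, so that targets visible in only one view remain unmatched.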
Key words
cross-view aerial-ground target matching /
post-fusion strategy /
YOLOv10 /
multi-scale feature fusion /
geometric localization /
Hungarian algorithm
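Funding
Stable Support Project of the National Defense Science and Technology Key Laboratory (JCKY2024209C001); Fundamental Research Funds for the Central Universities (2025300207)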