Post-fusion Strategy for Aerial-ground Cross-view Pedestrian Target Matching

GAO Jun; YANG Han; LIU Yong; HE Xiuwei; TAN Li; YIN Yankun; SHEN Xiaolei; YANG Feifei; PENG Chenglei

doi:10.7643/issn.1672-9242.2025.07.003

Equipment Environmental Engineering, 2025, Vol. 22, Issue 7: 16-23.

Special Topic—Application and Collaborative Evaluation Technology of Light Weapons in Complex Environments

GAO Jun¹, YANG Han¹, LIU Yong², HE Xiuwei², TAN Li², YIN Yankun², SHEN Xiaolei², YANG Feifei², PENG Chenglei¹*

Abstract

To address cross-view pedestrian target matching in multi-perspective images for collaborative reconnaissance by aerial-ground heterogeneous equipment (drone and vehicle) in complex battlefield environments, this work proposes a cross-view aerial-ground target matching algorithm based on a post-fusion strategy, designed to cope with large viewpoint differences (>60°) and drastic scale variations. First, a lightweight dual-branch YOLOv10 model was employed for efficient pedestrian detection in aerial-view and ground-view images. Second, a multi-scale feature extraction network (ResNet-18 integrated with an adaptive spatial pyramid) was combined with geometric localization information to construct a joint spatial-appearance representation of each target. Finally, the Hungarian algorithm was applied to a weighted cost function combining appearance features and geometric constraints to obtain the optimal cross-view target assignment. Experiments on the CVMHT dataset showed that the proposed method achieved average precision and recall of 81.4% and 79.0%, respectively, outperforming the baseline without fused appearance features (76.3% and 78.8%) and substantially surpassing traditional person re-identification and tracking methods such as ByteTrackV2 (33.8% and 36.1% after fine-tuning). By integrating detection, geometric, and appearance features through post-fusion, the algorithm overcomes the dependence of pre-fusion methods on fixed-view layouts, providing a flexible and robust target matching solution for collaborative reconnaissance by aerial-ground heterogeneous equipment.
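
The final matching step (a weighted cost combining appearance and geometric terms, solved with the Hungarian algorithm) can be illustrated with a minimal, dependency-free Python sketch. It assumes that each detection already carries an appearance embedding (for example from the ResNet-18 branch) and a ground-plane position in a shared coordinate frame. The cosine appearance distance, the clipped distance normalization, the weights `w_app` and `w_geo`, the scale `geo_scale`, the gating threshold `max_cost`, and all function names are hypothetical choices for illustration, not the authors' published formulation. The precision/recall helper uses a common pairwise definition that may differ from the paper's evaluation protocol.

```python
import math
from typing import List, Sequence, Tuple

INF = float("inf")
Pair = Tuple[int, int]


def hungarian(cost: List[List[float]]) -> List[Pair]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Shortest-augmenting-path form of the Hungarian algorithm with row and
    column potentials, O(n^2 * m). Returns one (row, col) pair per row.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "transpose the matrix so that rows <= columns"
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j] = row currently assigned to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm + 1e-12)


def match_cross_view(aerial_feats, ground_feats, aerial_pos, ground_pos,
                     w_app=0.5, w_geo=0.5, geo_scale=5.0, max_cost=0.6) -> List[Pair]:
    """Hypothetical post-fusion matcher: returns (aerial_idx, ground_idx) pairs."""
    if not aerial_feats or not ground_feats:
        return []
    cost = [[w_app * cosine_distance(fa, fg)
             + w_geo * min(math.dist(pa, pg) / geo_scale, 1.0)  # clipped to [0, 1]
             for fg, pg in zip(ground_feats, ground_pos)]
            for fa, pa in zip(aerial_feats, aerial_pos)]
    if len(cost) <= len(cost[0]):
        pairs = hungarian(cost)
    else:  # more aerial than ground detections: solve on the transpose
        pairs = [(a, g) for g, a in hungarian([list(c) for c in zip(*cost)])]
    # Gating: the solver pairs every row, so reject pairs whose fused cost is too high.
    return sorted((a, g) for a, g in pairs if cost[a][g] <= max_cost)


def precision_recall(pred: List[Pair], gt: List[Pair]) -> Tuple[float, float]:
    """Pairwise precision and recall of predicted matches against ground truth."""
    pred_set, gt_set = set(pred), set(gt)
    tp = len(pred_set & gt_set)
    return (tp / len(pred_set) if pred_set else 0.0,
            tp / len(gt_set) if gt_set else 0.0)


if __name__ == "__main__":
    matches = match_cross_view(
        aerial_feats=[[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],
        ground_feats=[[0.1, 0.9], [0.9, 0.1]],
        aerial_pos=[(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)],
        ground_pos=[(10.5, 0.0), (0.5, 0.0)],
    )
    print(matches)  # [(0, 1), (1, 0)]
    print(precision_recall(matches, [(0, 1), (1, 0)]))  # (1.0, 1.0)
```

The gating threshold matters because an assignment solver always returns min(n, m) pairs; without it, a pedestrian visible in only one view would still be forced onto some target in the other view. A library solver such as SciPy's `linear_sum_assignment` could replace the hand-written `hungarian` function.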

Key words

cross-view aerial-ground target matching / post-fusion strategy / YOLOv10 / multi-scale feature fusion / geometric localization / Hungarian algorithm

Cite this article

GAO Jun, YANG Han, LIU Yong, HE Xiuwei, TAN Li, YIN Yankun, SHEN Xiaolei, YANG Feifei, PENG Chenglei. Post-fusion Strategy for Aerial-ground Cross-view Pedestrian Target Matching[J]. Equipment Environmental Engineering, 2025, 22(7): 16-23. https://doi.org/10.7643/issn.1672-9242.2025.07.003

Funding

Stability Support Project of the National Defense Key Laboratory of Science and Technology (JCKY2024209C001); the Fundamental Research Funds for the Central Universities (2025300207)