GCN-Enhanced Multi-Object Tracking for Video Surveillance: Adaptive Spatial-Temporal Graph Convolutional Modeling

Agnieszka Szymanski; Weronika Czarnecki

doi:10.64972/jaat.2024v2.130p2e:15-28

Authors

Agnieszka Szymanski, Maria Curie-Sklodowska University, Faculty of Mathematics, Physics and Computer Science, 20-031 Lublin, Poland
Weronika Czarnecki, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Gdansk University of Technology, 80-233 Gdansk, Poland

DOI:

https://doi.org/10.64972/jaat.2024v2.130p2e:15-28

Keywords:

Multi-Object Tracking, Graph Convolutional Network, Video Surveillance, Spatial-Temporal Modeling, Deep Learning

Abstract

Multi-object tracking remains challenging in intelligent video surveillance because objects move irregularly, are frequently occluded, and change in appearance under varying illumination. This paper presents a tracking framework that combines graph convolutional networks (GCNs) with adaptive spatial-temporal graph construction to address the limitations of appearance-based and sequential association approaches. Guided by a composite attention mechanism, the system dynamically constructs a graph in which each node represents a detection and each edge represents a spatial-temporal correlation. Hierarchical graph convolutional networks propagate contextual information across this graph, strengthening identity association and trajectory continuity. Experiments on the MOT17, MOT20, and DukeMTMC datasets show improvements over earlier approaches in both precision and identity preservation, with a MOTA of 75.2% on MOT17 and 69.4% on MOT20. Ablation analysis indicates that, under dense crowds and prolonged occlusion, adaptive edge design and an appropriate GCN depth are necessary to minimize identity switches and fragmentation. These findings suggest that graph-based reasoning enables context-aware, high-fidelity object tracking in real-world surveillance settings.
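
The general idea of a detection graph with graph-convolutional propagation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the composite attention mechanism or the hierarchical architecture, so the sketch substitutes a plain Gaussian distance kernel for the edge weights and a single standard GCN layer. The function names `build_adjacency` and `gcn_layer`, the parameters `max_gap` and `sigma`, and the toy detections are all assumptions made for illustration.

```python
import numpy as np


def build_adjacency(boxes, frames, max_gap=2, sigma=50.0):
    """Hypothetical edge rule (not from the paper): link detections whose
    frame indices differ by 1..max_gap, weighted by a Gaussian kernel on the
    distance between box centres. Boxes are (x1, y1, x2, y2)."""
    boxes = np.asarray(boxes, dtype=float)
    frames = np.asarray(frames)
    centres = np.stack(
        [(boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2],
        axis=1,
    )
    # Pairwise centre distances and pairwise frame gaps via broadcasting.
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    gap = np.abs(frames[:, None] - frames[None, :])
    # Detections in the same frame cannot be the same object, so no edge.
    mask = (gap >= 1) & (gap <= max_gap)
    return np.where(mask, np.exp(-(dist ** 2) / (2 * sigma ** 2)), 0.0)


def gcn_layer(adj, feats, weight):
    """One standard graph-convolution step:
    ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)


# Toy input: one detection in frame 0, two detections in frame 1.
boxes = [[10, 10, 50, 90], [14, 12, 54, 92], [300, 40, 340, 120]]
frames = [0, 1, 1]
adj = build_adjacency(boxes, frames)

# Random stand-ins for appearance features and layer weights.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))
weight = rng.normal(size=(4, 2))
out = gcn_layer(adj, feats, weight)
```

In this toy case the edge between the first detection and its nearby successor carries far more weight than the edge to the distant detection, so the propagated node embeddings mix mostly along the plausible trajectory. A learned attention score, as the abstract describes, would take the place of the fixed kernel.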