Hierarchical Multimodal Data Fusion for Robust Real-Time Perception in Dynamic Sensor-Driven Environments
DOI:
https://doi.org/10.64972/dea.2025.v4i1.190d6:68-83
Keywords:
Multimodal Fusion, Real-Time Perception, Deep Learning, Sensor Robustness, Autonomous Systems
Abstract
Increasingly sophisticated real-time applications in autonomous robotics and intelligent surveillance require accurate, dependable perception, which multimodal sensor fusion makes possible. This study investigates how to keep perception stable in the presence of environmental dynamics, sensor noise, and sensor failures. It proposes a multi-level fusion framework that combines several sensor modalities using temporal gating, deep residual correction, and adaptive attention weighting. The evaluation includes ablation studies and deployments in a range of settings, from a structured laboratory to extended real-world navigation circuits. Quantitatively, under significant sensor dropout or environmental change, the per-cycle inference latency is below 43 ms, the mean segmentation error is below 2.5 cm, and the top-1 identification accuracy is 94.5%. Ablation experiments indicate that every module of the architecture is necessary for strong resilience and spatial precision. The approach consistently outperforms baseline techniques in recognition accuracy and shows stable error behaviour in unfamiliar and rapidly changing situations. In short, the hierarchical fusion approach advances the state of the art in real-time multimodal perception and offers a scalable, fault-tolerant basis for practical applications in safety-critical settings.
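The abstract names adaptive attention weighting and temporal gating but gives no implementation details, so the following is only an illustrative sketch of how those two ideas might combine in one fusion cycle. Everything here is an assumption rather than the authors' method: the function names `attention_weights` and `fuse_step`, the per-modality reliability `scores`, the use of a softmax, and the fixed scalar `gate` (a learned gate would be more typical). Deep residual correction is omitted entirely.

```python
import math
from typing import Dict, List, Optional, Sequence


def attention_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over per-modality reliability scores (hypothetical inputs)."""
    if not scores:
        return {}
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def fuse_step(
    features: Dict[str, Optional[Sequence[float]]],
    scores: Dict[str, float],
    previous: Optional[List[float]] = None,
    gate: float = 0.7,
) -> Optional[List[float]]:
    """Fuse one cycle of per-modality feature vectors (illustrative only).

    A modality whose feature is None is treated as dropped out and excluded
    from the softmax, so the remaining weights renormalise automatically.
    """
    live = {name: vec for name, vec in features.items() if vec is not None}
    if not live:
        # Every sensor dropped: hold the previous estimate (may be None).
        return previous
    weights = attention_weights({name: scores[name] for name in live})
    dim = len(next(iter(live.values())))
    current = [
        sum(weights[name] * live[name][i] for name in live) for i in range(dim)
    ]
    if previous is None:
        return current
    # Temporal gate (assumed fixed scalar): blend new and previous estimates.
    return [gate * c + (1.0 - gate) * p for c, p in zip(current, previous)]
```

In this sketch a dropped sensor simply leaves the softmax, so the surviving modalities absorb its weight, and the gate limits how far a single noisy cycle can move the fused estimate. Whether the published framework behaves this way is not stated in the abstract.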