Vision Transformer-Based High-Resolution Satellite Road Extraction: Architecture and Performance Evaluation

Jerzy Baran; Konrad Pietrzak; Łukasz Gajda

doi:10.64972/jiic.2026v4.179p12s:151-163

Authors

Jerzy Baran, Faculty of Computer Science and Telecommunications, Tadeusz Kościuszko Cracow University of Technology, Kraków 31-155, Poland
Konrad Pietrzak, Faculty of Informatics, University of Białystok, Białystok 15-328, Poland
Łukasz Gajda, Faculty of Informatics, University of Białystok, Białystok 15-328, Poland

Keywords:

Vision Transformer, Road Extraction, Remote Sensing, Satellite Imagery

Abstract

Accurate extraction of road networks from high-resolution satellite imagery is essential for transportation management, urban planning, and the development of geographic information systems (GIS). To address the spatial fragmentation and continuity problems of road segmentation in remote sensing, this paper presents a novel Vision Transformer (ViT) framework. The model combines a dedicated feature-level fusion structure with a modified loss function to improve delineation precision and connectivity stability. The experiments used three well-known public datasets covering different regions and conditions, with more than 10,000 annotated samples spanning urban, rural, and coastal environments. Across all tests, the ViT-based approach achieved a mean F1-score above 0.82 and a mean Intersection over Union (IoU) above 0.71, a notable improvement over the convolutional and transformer baselines. The experiments indicate that the proposed method preserves road connectivity and reduces false positives in dense, complex urban areas. Its segmentation accuracy and computational efficiency make the model suitable for large-scale mapping pipelines. The results show that attention-driven multi-scale representations improve the accuracy and spatial consistency of automated road extraction. The improved generalizability and accuracy provide a sound basis for the future development of high-precision satellite image analysis systems.
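
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how F1-score and Intersection over Union are commonly computed for a binary road mask. It is an illustration only, not the authors' evaluation code: the function names `confusion_counts` and `iou_and_f1`, the toy masks, and the convention of scoring two empty masks as perfect are all assumptions made here.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    over two binary masks given as equally sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1      # road pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as road
            elif t and not p:
                fn += 1      # road pixel missed
    return tp, fp, fn


def iou_and_f1(pred, target):
    """Return (IoU, F1) for the road class of two binary masks."""
    tp, fp, fn = confusion_counts(pred, target)
    if tp + fp + fn == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1


# Hypothetical 3x4 masks: 1 marks a road pixel, 0 marks background.
pred = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
target = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

iou, f1 = iou_and_f1(pred, target)
print(f"IoU = {iou:.2f}, F1 = {f1:.2f}")  # IoU = 0.60, F1 = 0.75
```

For a single mask pair the two metrics are linked by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU. Means taken over many images need not satisfy this identity exactly.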