Application of Enhanced Gradient Boosting Algorithms in Large-Scale Time Series Anomaly Detection

Błażej Kornel Kania; Wioletta Gajewska

doi:10.64972/dea.2025.v4i1.19410d:127-141

Authors

Błażej Kornel Kania, Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wrocław, 50-371, Poland
Wioletta Gajewska, Faculty of Information and Communication Technology, Wrocław University of Science and Technology, Wrocław, 50-371, Poland

DOI:

https://doi.org/10.64972/dea.2025.v4i1.19410d:127-141

Keywords:

Anomaly Detection, Gradient Boosting, Distributed Computing, Feature Engineering, Edge Computing, Adaptive Algorithms

Abstract

This paper addresses anomaly detection in large-scale time series from complex financial, industrial, and cyber-physical environments. A distributed architecture based on an enhanced gradient boosting algorithm is proposed to achieve efficient, high-precision anomaly detection on large-scale, high-speed data streams. The framework comprises four components (data collection, distributed preprocessing, parallel model execution, and hierarchical aggregation) that together support automatic feature extraction and flexible resource allocation. For experimental validation, representative public and synthetic datasets were evaluated on a twelve-node heterogeneous computing cluster with CPU and GPU hardware. In these experiments, system throughput scaled linearly up to 1.62 million data points per second, with a median inference latency of 11.4 milliseconds, while CPU and GPU utilization both remained below 65%. The proposed method improves recall, precision, and F1-score over the baseline method by 2.5% to 8.4%. Ablation studies indicate that adaptive regularization and automated feature engineering keep the model stable and generalizable under concept drift and sudden noise. The system is also highly reliable: it supports failover, which reduces the risk of service interruptions caused by load fluctuations. These results indicate that the enhanced ensemble learning model is suitable for real-time, large-scale anomaly detection in modern data-driven applications.