Adaptive Job Shop Scheduling Based on Proximal Policy Optimization

Edward Mazur; Stefan Kołodziej

doi:10.64972/jaat.2025v3.183p11e:131-144

Authors

Edward Mazur, Faculty of Mechanical Engineering, Casimir Pulaski University of Radom, 26-600 Radom, Poland
Stefan Kołodziej, Faculty of Mechanical Engineering, Casimir Pulaski University of Radom, 26-600 Radom, Poland

DOI:

https://doi.org/10.64972/jaat.2025v3.183p11e:131-144

Keywords:

Reinforcement Learning, Proximal Policy Optimization, Job Shop Scheduling Problem, Intelligent Manufacturing

Abstract

Computer-based adaptive scheduling is currently used in production to meet Industry 4.0 objectives of flexibility, resilience, and high efficiency. This research addresses the dynamic job shop scheduling problem (JSSP) using a reinforcement learning framework based on Proximal Policy Optimization (PPO). Its objective is to create a comprehensive, self-sufficient intelligent schedule-optimisation system that can adapt to changes in a real working environment. The approach reformulates the JSSP as a Markov Decision Process (MDP) and employs neural policy networks for continuous optimisation under uncertainty. To mimic real-world industrial variations, a comprehensive experimental platform featuring digital twins, real-time event injection, and high-fidelity simulation was developed. The PPO scheduler is compared over 30 trials against three baselines: a genetic algorithm (GA), a deep Q-network (DQN), and the earliest due date (EDD) rule. According to the findings, machine utilisation exceeds 92%, the average makespan is reduced to 827 units relative to both GA and EDD, and scheduling response latency under complex conditions is shortened, averaging 0.54 seconds. Additionally, the proposed framework reduced the recovery period following disruptions to fewer than 20 time units and kept the lost-job ratio below 2%. These results indicate that combining PPO with dynamic allocation improves the system's overall stability and efficiency, so the framework can be applied to new manufacturing platforms.