Welcome to the PSAM 18 Abstract Status page.

Abstract BJ261 · Full Paper + Presentation

Generating Daily Load-Following Operation Scenarios by using Pre-trained Reinforcement Learning for Nuclear Power Plants

Authors

Primary: Junhyeong Bang, Korea Advanced Institute of Science and Technology · bjun201gosegulv@kaist.ac.kr
Co-author: Jonghyun Kim · jonghyun.kim@kaist.ac.kr

As renewable penetration increases, power supply variability is growing, requiring more flexible plant operation. This demand now extends to nuclear power plants (NPPs), which have traditionally operated at a constant power level. In this context, Daily Load-Following Operation (DLFO), in which a plant cycles between low and high power levels daily, has been discussed as an alternative to constant-power operation. However, since repeated power transitions introduce additional thermal and neutronic stresses on the reactor core, DLFO must be conducted within safety-constrained operating bounds.
To perform DLFO, a scenario matching the target power profile must be prepared in advance, and this process has so far relied on expert trial and error with system codes or software. A DLFO scenario is time-series data of operating variables that achieve the target power from a given initial core state while satisfying safety constraints. Traditionally, experts have designed such scenarios manually and verified them with detailed analysis codes, revising them until the criteria are met.
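As a rough illustration of what "time-series data of operating variables" could look like, the sketch below stores a scenario as a list of time steps and checks it against a target profile and one constraint. The field names (`rod_position`, `boron_ppm`) and the ramp-rate limit are hypothetical stand-ins; the abstract does not list the actual operating variables or safety criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScenarioStep:
    hour: float          # time since scenario start, in hours
    power_pct: float     # reactor power, percent of rated power
    rod_position: float  # hypothetical operating variable (arbitrary units)
    boron_ppm: float     # hypothetical operating variable


def max_tracking_error(steps: List[ScenarioStep], target_pct: List[float]) -> float:
    """Largest absolute gap between achieved and target power over the scenario."""
    return max(abs(s.power_pct - t) for s, t in zip(steps, target_pct))


def satisfies_ramp_limit(steps: List[ScenarioStep], max_ramp_pct_per_hour: float) -> bool:
    """Check an assumed ramp-rate constraint between consecutive time steps."""
    for prev, cur in zip(steps, steps[1:]):
        dt = cur.hour - prev.hour
        if dt <= 0:
            return False  # time stamps must be strictly increasing
        if abs(cur.power_pct - prev.power_pct) / dt > max_ramp_pct_per_hour:
            return False
    return True


# Made-up example: ramp from 100% to 70% over two hours, then hold.
scenario = [
    ScenarioStep(0.0, 100.0, 10.0, 800.0),
    ScenarioStep(1.0, 85.0, 8.0, 820.0),
    ScenarioStep(2.0, 70.0, 6.0, 840.0),
    ScenarioStep(3.0, 70.0, 6.0, 845.0),
]
target = [100.0, 85.0, 70.0, 70.0]
```

In this picture, the manual process amounts to editing `scenario` by hand and rerunning checks of this kind (with a detailed analysis code in place of the toy functions) until every criterion passes.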
However, this manual process is computationally expensive and makes it difficult to obtain scenarios covering diverse initial conditions and long operating periods. Reactor behavior is also nonlinear and history-dependent, so the required strategy changes with the current state and past operating history. This creates a need for AI-based automation that can learn the relation between operating conditions and control strategies. Supervised learning (SL) requires many input-output examples, but each scenario is still expensive to generate. Reinforcement learning (RL), in which an agent learns a policy through interaction and rewards, can reflect both target tracking and constraint satisfaction, but training from scratch can be inefficient and unstable.
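One way an RL reward might reflect both target tracking and constraint satisfaction is sketched below. The signed `constraint_margin` input and the `penalty_weight` value are assumptions made for illustration, not the reward used in the study.

```python
def step_reward(power_pct: float, target_pct: float,
                constraint_margin: float, penalty_weight: float = 10.0) -> float:
    """Toy per-step reward: penalize tracking error and constraint violation.

    constraint_margin is a hypothetical signed distance to a safety bound:
    positive means inside the bound, negative means the bound is violated.
    """
    tracking_cost = abs(power_pct - target_pct)
    violation = max(0.0, -constraint_margin)  # zero while inside the bound
    return -(tracking_cost + penalty_weight * violation)
```

With this shape, a step that misses the target by 2% while violating the bound by 0.5 units scores lower than the same step inside the bound, so the agent is pushed toward scenarios that track the profile without leaving the allowed region.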
To address these limitations, this study proposes a framework that combines pre-training with RL. First, a sequence model combining a long short-term memory (LSTM) network with a Transformer-based architecture is designed to capture past operating history. The LSTM encodes the historical sequence, while the Transformer processes the current state and generates operational scenarios. Second, this model is pre-trained via SL on a small set of practically obtainable baseline DLFO scenarios, allowing it to internalize fundamental operating patterns and physical relationships. Third, the pre-trained model is employed as the initial policy of the RL agent, enabling physically informed exploration during early training. The resulting framework generates DLFO operational scenarios that satisfy safety constraints while reflecting the history-dependent physical behavior of the reactor core.
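The pre-train-then-fine-tune idea can be sketched in a deliberately tiny form. Everything here is a stand-in chosen so the example runs with the standard library only: a one-parameter proportional policy replaces the LSTM/Transformer model, a single-line plant update replaces the reactor simulation, a random-search hill climb replaces the RL algorithm, and the demonstration pairs are made up.

```python
import random


def rollout_return(gain, target_pct, start_pct=100.0, max_step=5.0):
    """Episode return for a proportional policy: negative summed tracking error."""
    power = start_pct
    total = 0.0
    for target in target_pct:
        action = gain * (target - power)
        action = max(-max_step, min(max_step, action))  # toy ramp constraint
        power += action                                 # toy plant model
        total -= abs(target - power)
    return total


def pretrain_gain(demos):
    """Supervised step: least-squares fit of action = gain * error."""
    num = sum(err * act for err, act in demos)
    den = sum(err * err for err, _ in demos)
    return num / den


def fine_tune(gain, target_pct, iters=200, sigma=0.1, seed=0):
    """Stand-in for RL: perturb the gain and keep only changes that raise the return."""
    rng = random.Random(seed)
    best = rollout_return(gain, target_pct)
    for _ in range(iters):
        candidate = gain + rng.gauss(0.0, sigma)
        ret = rollout_return(candidate, target_pct)
        if ret > best:
            gain, best = candidate, ret
    return gain, best


# Made-up baseline demonstrations: (power error, operator action) pairs.
demos = [(-10.0, -4.0), (-5.0, -2.0), (5.0, 2.0), (10.0, 4.0)]
# Hypothetical hourly target profile: hold, drop to 80%, return to 100%.
target = [100.0] * 3 + [80.0] * 6 + [100.0] * 6

pretrained = pretrain_gain(demos)
tuned, tuned_return = fine_tune(pretrained, target)
```

The search starts from `pretrained` rather than from an arbitrary gain, which mirrors the role the abstract assigns to the pre-trained model: early exploration begins near behavior already seen in baseline scenarios.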
Status: The abstract has been accepted!
📄 Paper Status: Paper has been uploaded and is under review