PSAM 18 - Abstract Status

Abstract YJ273 · Full Paper + Presentation

Meta-Reinforcement Learning with Adaptive Reward Shaping for Multi-Objective Operational Control in Nuclear Power Plants

Authors

Primary: Yoojoon Seoung, UNIST · yjs0427s@unist.ac.kr

Co-author: Seung Jun Lee, UNIST · sjlee420@unist.ac.kr

This paper proposes a Meta-Reinforcement Learning (Meta-RL) framework with adaptive reward shaping for simultaneous multi-objective operational control in nuclear power plant simulators. Conventional RL-based control requires labor-intensive manual reward engineering, which is particularly challenging when multiple interdependent process variables must be regulated concurrently. To address this, we introduce a two-level optimization structure: an inner-loop Soft Actor-Critic (SAC) agent that learns a unified control policy under shaped rewards for multiple objectives, and an outer-loop Meta-Policy network that dynamically adjusts the parameters of task-specific 5-point piecewise potential functions based on episodic performance statistics. The shaped reward follows the Potential-Based Reward Shaping (PBRS) framework, which guarantees that the optimal policy is preserved under any admissible potential. The proposed framework is validated on two coupled control tasks implemented on the Compact Nuclear Simulator (CNS): (1) pressurizer pressure control via spray valve and heater manipulation, where the Meta-Policy minimizes deviation from, and overshoot beyond, the target pressure setpoint; and (2) pressurizer water level control via feedwater control valve adjustment, where the Meta-Policy penalizes both upward and downward deviation from the target level. The physical coupling between pressure and level, governed by reactor coolant system thermodynamics, makes independent single-objective tuning insufficient, motivating the proposed joint Meta-RL formulation. In both tasks, the Meta-Policy observes episode-level statistics, including mean squared error and directional deviation magnitude, and autonomously updates the reward landscape parameters via policy gradient, eliminating manual reward re-engineering across operationally distinct yet physically coupled objectives.
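The reward-shaping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the 5-point potential is a piecewise-linear function of the tracking error (measured value minus setpoint), and the names `PiecewisePotential`, `shaped_reward`, and `episode_stats`, the knot positions, and the potential values are all hypothetical. The shaping term uses the standard PBRS form, r' = r + γΦ(s') − Φ(s). The Meta-Policy that would adjust the knot values between episodes is not shown.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class PiecewisePotential:
    """Piecewise-linear potential Phi(error) defined by five (knot, value) pairs.

    Assumption: linear interpolation between knots, clamped outside the range.
    """
    knots: Sequence[float]   # strictly increasing error breakpoints
    values: Sequence[float]  # potential at each breakpoint

    def __call__(self, error: float) -> float:
        xs, ys = self.knots, self.values
        if error <= xs[0]:
            return ys[0]
        if error >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, error) - 1          # segment containing `error`
        t = (error - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])


def shaped_reward(base_reward: float, phi: PiecewisePotential,
                  error: float, next_error: float, gamma: float = 0.99) -> float:
    """PBRS: add gamma * Phi(s') - Phi(s) to the environment reward."""
    return base_reward + gamma * phi(next_error) - phi(error)


def episode_stats(errors: Sequence[float]) -> Dict[str, float]:
    """Episode-level statistics of the kind a meta-policy could observe."""
    n = len(errors)
    return {
        "mse": sum(e * e for e in errors) / n,
        "overshoot": sum(max(e, 0.0) for e in errors) / n,    # mean upward deviation
        "undershoot": sum(max(-e, 0.0) for e in errors) / n,  # mean downward deviation
    }


# Hypothetical asymmetric potential: overshoot (positive error) is penalized
# more steeply than undershoot. These numbers are illustrative only.
phi = PiecewisePotential(knots=(-2.0, -1.0, 0.0, 1.0, 2.0),
                         values=(-4.0, -1.0, 0.0, -2.0, -6.0))
bonus = shaped_reward(0.0, phi, error=1.0, next_error=0.5, gamma=1.0)
stats = episode_stats([1.0, -1.0, 2.0, 0.0])
```

With these illustrative values, moving from an error of 1.0 to 0.5 yields a positive shaping bonus, because the potential rises as the state approaches the setpoint. In a two-level scheme like the one described, an outer loop would read `stats` after each episode and propose new `values` for the next one.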

✅ Status: The abstract has been accepted!

✅ Paper Status: Accepted with comments
