Welcome to the PSAM 18 Abstract Status page.

Abstract SM297 · Full Paper + Presentation

A Large Language Model-based Method for Standardizing Heterogeneous Risk Registers

Authors

Primary: Stefano Marchetti (University of Maryland) · smarchet@umd.edu
Co-author: Somil Varshney (University of Maryland, College Park) · somilv@umd.edu
Co-author: Cristian Schaad (University of Maryland) · cschaad@umd.edu
Co-author: Adrian Maker (University of Maryland) · amaker@umd.edu
Co-author: Katrina M Groth (University of Maryland) · kgroth@umd.edu
Risk registers are documents that record identified risks, their potential causes and consequences, assigned owners, and planned mitigation actions; they are widely used in safety and project risk management. In nuclear safety, they support the systematic identification, tracking, prioritization, and communication of technical, operational, and regulatory risks throughout the lifecycle of facilities, systems, and projects. However, risk registers are typically stored in heterogeneous, human-readable free-text formats that are difficult to use for downstream analytics and machine learning. Traditional Natural Language Processing (NLP) approaches, such as rule-based pipelines, template matching, and supervised information extraction models, often exhibit limited robustness when applied to documents with variable structure, inconsistent terminology, and scarce labeled data, and they require extensive manual engineering or domain-specific annotation to generalize effectively.
To overcome these limitations, we present a Large Language Model (LLM)-based method for converting diverse risk registers into a unified machine-readable schema, developed as the winning solution to the OECD Nuclear Energy Agency (NEA) Coding Competition. The proposed approach combines deterministic parsing, used wherever information can be reliably extracted from structured inputs, with an LLM-based pipeline for higher-level semantic tasks. The core contribution is an agentic architecture in which LLM agents are assigned specific roles and tasks, including schema mapping for unfamiliar spreadsheets, enrichment of missing metadata, disambiguation of repeated or underspecified descriptions, and conservative typo correction. To guide the behavior of the agents and improve the consistency of results, we employ a few-shot learning approach based on representative input-output examples.
Overall, this hybrid approach achieves high accuracy by restricting LLM generative inference to tasks that require semantic reasoning, while preserving deterministic, rule-based consistency wherever structure can be directly recovered. The method was evaluated on heterogeneous risk registers, achieving approximately 90% accuracy in information extraction and standardization and showing strong generalization across formats and domains. More broadly, this work demonstrates how LLM agents can be effectively integrated with deterministic pipelines to build scalable and auditable document intelligence systems.
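The hybrid routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the schema fields, header aliases, few-shot examples, and function names below are all hypothetical, and the LLM is passed in as a plain callable so that no particular provider API is assumed. The sketch maps column headers with a deterministic lookup first and falls back to a few-shot prompt only for headers the rules cannot resolve, recording which path produced each mapping.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical unified schema: target field -> known header aliases (lowercase).
SCHEMA_ALIASES = {
    "risk_id": {"id", "risk id", "ref"},
    "description": {"description", "risk description", "risk"},
    "cause": {"cause", "causes", "root cause"},
    "consequence": {"consequence", "consequences", "impact"},
    "owner": {"owner", "risk owner", "responsible"},
    "mitigation": {"mitigation", "mitigation action", "planned action"},
}

# Hypothetical few-shot examples: (unfamiliar header, expected schema field).
FEW_SHOT = [
    ("Hazard narrative", "description"),
    ("Who is accountable", "owner"),
]


def deterministic_map(header: str) -> Optional[str]:
    """Return the schema field for a known header alias, else None."""
    key = " ".join(header.lower().split())  # normalize case and whitespace
    for field, aliases in SCHEMA_ALIASES.items():
        if key in aliases:
            return field
    return None


def build_prompt(header: str) -> str:
    """Assemble a few-shot prompt asking for exactly one schema field."""
    parts = [
        "Map the spreadsheet column header to one schema field.",
        "Allowed fields: " + ", ".join(SCHEMA_ALIASES),
    ]
    for source, target in FEW_SHOT:
        parts.append(f"Header: {source}\nField: {target}")
    parts.append(f"Header: {header}\nField:")
    return "\n\n".join(parts)


def map_headers(
    headers: Iterable[str], llm: Callable[[str], str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each header to (field, provenance); call the LLM only as a fallback."""
    mapping = {}
    for header in headers:
        field = deterministic_map(header)
        provenance = "rule"
        if field is None:
            answer = llm(build_prompt(header)).strip()
            # Accept only answers inside the schema; anything else stays unmapped.
            field = answer if answer in SCHEMA_ALIASES else None
            provenance = "llm"
        mapping[header] = (field, provenance)
    return mapping


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    last_header = prompt.rsplit("Header: ", 1)[1].split("\n", 1)[0]
    return "mitigation" if "control" in last_header.lower() else "unknown"


result = map_headers(["Risk ID", "  Impact ", "Existing controls", "Colour"], fake_llm)
```

Keeping the provenance tag alongside each mapping is one way to support the auditability the abstract mentions: a reviewer can see which fields came from rules and which from model inference, and out-of-schema model answers are rejected rather than silently accepted.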
Status: The abstract has been accepted!
Paper Status: Paper has been uploaded and is under review