Welcome to the PSAM 18 Abstract Status page.

Abstract RO282 · Full Paper + Presentation

Evaluation Types and Phases of Large Language Models and Generative Artificial Intelligence: Adapting GONUKE to Support Artificial Intelligence Uses for Nuclear Power

Authors

Primary: Ronald Laurids Boring, Idaho National Laboratory · ronald.boring@inl.gov

The Guideline for Operational Nuclear Usability and Knowledge Elicitation (GONUKE; Boring et al., 2015) was originally developed as an evaluation framework for nuclear power plant licensing. The purpose of GONUKE was to catalog different human factors evaluation types (e.g., verification, validation, and epistemiation) across the development life cycle of nuclear technologies, such as a new reactor control room. The framework allowed formative evaluation to be incorporated more fully than under the summative metrics typically covered in NUREG-0711, Human Factors Engineering Program Review Model, the definitive human factors process guide for the U.S. Nuclear Regulatory Commission. GONUKE has since been extended to encompass graded approaches to evaluation (Boring et al., 2021) and multi-stage validation (Boring et al., 2024).

The advent of large language models (LLMs) and generative artificial intelligence (GenAI) in nuclear power has brought with it the need to evaluate the performance and efficacy of these applications. LLMs and GenAI are finding uses from licensing to operations, and from digital engineering to procedure development and deployment. While there is a general push in industry to improve accuracy, trustworthiness, generalizability, and appropriateness of the products of LLMs and GenAI while minimizing biases and hallucinations, nuclear applications introduce safety consequences and the explicit need to minimize the risk of poor outputs.

In this context, GONUKE is now being reframed to consider the types and phases of evaluation needed for LLM and GenAI development. The GONUKE AI Framework considers verification, validation, and benchmarking as evaluation types, while spanning model development, scaled deployment, and maintenance as the life cycle. The metrics for evaluation are multi-tiered, ranging from model sufficiency to risk verified. The goal of the GONUKE AI Framework is to provide researchers and nuclear industry implementers of LLM and GenAI applications with a basis for selecting evaluations and for ensuring that sufficient evaluations are performed to minimize adverse outcomes for risk-important uses.
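
The pairing of evaluation types with life cycle phases described above can be pictured as a small grid. The following is only an illustrative sketch, not the framework's actual tooling: the names `EvaluationType`, `LifeCyclePhase`, `uncovered_cells`, and the sample `plan` are hypothetical, and the idea that every type applies in every phase is an assumption made here for illustration. The metric tiers are left out because the abstract names only their endpoints.

```python
from enum import Enum
from itertools import product


class EvaluationType(Enum):
    # The three evaluation types named in the abstract.
    VERIFICATION = "verification"
    VALIDATION = "validation"
    BENCHMARKING = "benchmarking"


class LifeCyclePhase(Enum):
    # The three life cycle phases named in the abstract.
    MODEL_DEVELOPMENT = "model development"
    SCALED_DEPLOYMENT = "scaled deployment"
    MAINTENANCE = "maintenance"


def uncovered_cells(planned):
    """Return the (type, phase) cells that have no planned evaluation.

    `planned` is a set of (EvaluationType, LifeCyclePhase) pairs.
    Assumption: every type is expected in every phase, which the
    abstract does not state.
    """
    all_cells = set(product(EvaluationType, LifeCyclePhase))
    missing = all_cells - set(planned)
    # Sort by name so the report order is stable and readable.
    return sorted(missing, key=lambda cell: (cell[0].value, cell[1].value))


# Hypothetical evaluation plan, for illustration only.
plan = {
    (EvaluationType.VERIFICATION, LifeCyclePhase.MODEL_DEVELOPMENT),
    (EvaluationType.BENCHMARKING, LifeCyclePhase.MODEL_DEVELOPMENT),
    (EvaluationType.VALIDATION, LifeCyclePhase.SCALED_DEPLOYMENT),
}

for eval_type, phase in uncovered_cells(plan):
    print(f"no {eval_type.value} planned for {phase.value}")
```

A check of this kind only reports which cells are empty; deciding whether a given cell needs an evaluation at all, and to what tier, would remain a judgment for the implementer.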
Status: The abstract has been accepted!