Turn-Taking and Coordination in Multi-Agent Human-AI Systems for Real-Time Applications
Authors
Primary: Michael Hildebrandt, Institute for Energy Technology · michael.hildebrandt@ife.no
Large language model (LLM)-based agents are increasingly considered for operational support roles in safety-critical systems. Most deployed systems to date, however, use a single agent or route requests to multiple sub-agents that report back to the invoking agent. Genuinely concurrent multi-agent architectures, in which several LLM agents operate simultaneously in shared communication and collaboration spaces alongside human operators, are substantially less common and raise qualitatively different problems. Such architectures allow for functional specialisation, parallel processing of concurrent information streams, and AI participation structured as team membership rather than tool use, which better fits the organisation of safety-critical operations such as those in nuclear power plants. It is this class of system, and the coordination problems specific to it, that this paper addresses.
LLM-based agents differ from both human operators and conventional software agents in ways that are directly relevant to coordination. Each agent operates on a context snapshot assembled at invocation time rather than through continuous perception. Inference latency ranges from seconds to minutes. The agent has no running process between turns and cannot detect events that occur during generation. In multi-agent configurations, these properties interact along three structural dimensions: a temporal dimension, in which agents act on snapshots that may already be stale and events occurring during generation accumulate unseen; an epistemic dimension, in which agents in the same communication space hold divergent world-models as a consequence of independent history windows and context compression; and an authority dimension, in which the absence of turn allocation at the protocol level allows multiple agents to produce conflicting responses to the same stimulus without coordination.
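The temporal dimension can be illustrated with a minimal sketch. It assumes a hypothetical `Room` whose events are numbered in arrival order and a frozen `Snapshot` taken at invocation time; none of these names come from the prototype described here. Comparing the snapshot's sequence number with the room's current head shows which events accumulated unseen during generation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Room:
    """Hypothetical shared space; events are kept in arrival order."""
    events: list[str] = field(default_factory=list)

    def post(self, text: str) -> int:
        self.events.append(text)
        return len(self.events)  # sequence number of the new event

    @property
    def head(self) -> int:
        return len(self.events)


@dataclass(frozen=True)
class Snapshot:
    """Context assembled at invocation time; it does not change during the turn."""
    seq: int
    messages: tuple[str, ...]


def take_snapshot(room: Room) -> Snapshot:
    return Snapshot(seq=room.head, messages=tuple(room.events))


def unseen_events(room: Room, snap: Snapshot) -> list[str]:
    """Events that arrived after the snapshot was taken."""
    return room.events[snap.seq:]


room = Room()
room.post("operator: pump A tripped")
snap = take_snapshot(room)                 # the agent's turn starts here
room.post("operator: pump A restarted")    # arrives while the agent is generating
missed = unseen_events(room, snap)         # what the agent's reply cannot reflect
```

A check of this kind only makes staleness visible once the turn has finished; it does not bound how stale a response may be.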
Message threading introduces a further layer of complexity. Threaded replies create causal discourse structure within a shared space and allow parallel focused sub-conversations, but an agent attending to a thread operates on a narrower epistemic slice than the full room provides. Agents on concurrent threads within the same room may hold substantially different situational pictures with no automatic reconciliation.
Tool use amplifies all three dimensions. Each tool call extends a turn's duration while the environment continues to evolve, introduces live data into an otherwise frozen context, and may execute side effects against shared state without transaction boundaries, permitting silent conflicting writes under concurrent agent operation.
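As an illustration of the silent-write problem, the following sketch applies a common technique, optimistic concurrency with a version check, to a shared artifact. It is not presented as the mechanism used in the prototype; the `Artifact` and `WriteConflict` names are hypothetical. Each writer states the version it read, and a write based on an outdated version is rejected rather than silently overwriting the other agent's change.

```python
class WriteConflict(Exception):
    """Raised when a write is based on an outdated version of the artifact."""


class Artifact:
    """Hypothetical shared artifact guarded by a version counter."""

    def __init__(self, content: str = "") -> None:
        self.content = content
        self.version = 0

    def read(self) -> tuple[int, str]:
        return self.version, self.content

    def write(self, expected_version: int, new_content: str) -> int:
        # Reject the write if another agent has written since this agent read.
        if expected_version != self.version:
            raise WriteConflict(
                f"expected version {expected_version}, found {self.version}"
            )
        self.content = new_content
        self.version += 1
        return self.version


tasks = Artifact("1. inspect pump A")

# Two agents read the same version during overlapping turns.
version_a, _ = tasks.read()
version_b, _ = tasks.read()

tasks.write(version_a, "1. inspect pump A\n2. notify shift lead")  # succeeds

try:
    tasks.write(version_b, "1. inspect pump A\n2. isolate pump A")
    conflict_detected = False
except WriteConflict:
    conflict_detected = True  # agent B must re-read and reconcile
```

The check turns a silent overwrite into an explicit error; how the rejected agent reconciles its change is left open.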
We describe a room-based architecture (comparable to Microsoft Teams or Discord, but populated by AI agents) in which agents occupy scoped communication spaces with configurable delivery modes governing response dispatch. Shared structured artifacts (including task lists and documents) externalise emerging agent consensus into persistent, observable state. Explicit flow-based turn-taking enforces sequential evaluation where ordering is required. Message threading is supported as a first-class construct. A room pause mechanism gives human operators a delivery-level gate over agent activity. These mechanisms were developed and evaluated in a prototype system built to explore the problems described.
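Two of these mechanisms, the room pause and flow-based turn-taking, can be sketched together as follows. This is a simplified illustration under assumed names (`PausableRoom`, `flow`, `pending`), not the prototype's interface: agents are plain functions evaluated in a fixed order, and a paused room holds messages instead of delivering them.

```python
from collections import deque
from typing import Callable

# Assumption: an agent maps the transcript so far to a single reply.
Agent = Callable[[list[str]], str]


class PausableRoom:
    """Hypothetical room with a delivery gate and a fixed agent order."""

    def __init__(self, flow: list[tuple[str, Agent]]) -> None:
        self.flow = list(flow)            # ordered (name, agent) pairs
        self.paused = False
        self.pending: deque[str] = deque()
        self.log: list[tuple[str, str]] = []

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False
        while self.pending and not self.paused:
            self.pending.popleft()
            self._run_flow()

    def post(self, sender: str, text: str) -> None:
        self.log.append((sender, text))
        if self.paused:
            self.pending.append(text)     # held at the delivery gate
        else:
            self._run_flow()

    def _run_flow(self) -> None:
        # Sequential evaluation: each agent sees the replies of those before it.
        for name, agent in self.flow:
            transcript = [text for _, text in self.log]
            self.log.append((name, agent(transcript)))


demo = PausableRoom([
    ("monitor", lambda t: f"monitor saw {len(t)} message(s)"),
    ("planner", lambda t: f"planner saw {len(t)} message(s)"),
])
demo.pause()
demo.post("operator", "pump A tripped")
held_log = list(demo.log)   # only the operator message; no agent has run
demo.resume()               # agents now run in flow order
```

Because the gate acts on delivery, pausing in this sketch prevents new turns from starting but would not interrupt a turn already in progress.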
The paper evaluates which challenge categories these mechanisms address and which they do not. Temporal response bounds, stale-context error detectability, artifact write conflict detection, and the handling of degraded or timed-out agents are identified as residual engineering requirements. We argue that the synchrony mismatch between continuous process dynamics and discrete agent turns is the most fundamental of these and needs improved design patterns so that AI agents can maintain adequate situation awareness.
Status: The abstract has been accepted.