A cross-domain reference architecture for AI in operationally critical contexts: four layers (with L2 split into classical ML and LLM validators), 17 trust metrics, including the Computational Parsimony Ratio, formalised here for the first time as a first-class metric, and three reference instances (clinical, industrial multi-domain, judicial).
TRACE organises agentic AI systems into a four-layer reference architecture with an explicit split of the learned tier into classical ML (L2a) and LLM validators (L2b) — a stateful orchestration policy (L3) sits over the L2 inventory, and human supervision (L4) carries measurable load.
The framework is grounded in established measurement science (GUM, VIM, ISO/IEC 17025) and treats trust as engineered and measured, not declared. Five acronymic principles (Trustworthy · Reasoned · Accountable · Context-bound · Escalated) are disciplined by an internal design constraint, Model Parsimony, quantified through the Computational Parsimony Ratio (CPR): the first complexity-performance trade-off to be formalised as a first-class metric in trustworthy AI.
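The source defines CPR only as a quantified complexity-performance trade-off; a minimal sketch of what such a ratio could look like is given below, assuming a simple "performance per unit of normalised compute cost" formula. The class and field names are illustrative, not part of the TRACE specification.

```python
from dataclasses import dataclass

@dataclass
class CandidateModel:
    name: str            # e.g. an L2a classical model or an L2b LLM validator
    performance: float   # validated task metric in [0, 1], e.g. F1
    compute_cost: float  # normalised cost per inference, > 0

def cpr(model: CandidateModel) -> float:
    """Hypothetical CPR: performance delivered per unit of compute cost."""
    return model.performance / model.compute_cost

# Under this sketch, a cheap classical model can out-score an LLM on CPR
# even when its raw performance is slightly lower:
classical = CandidateModel("gbm-classifier", performance=0.91, compute_cost=1.0)
llm = CandidateModel("llm-validator", performance=0.94, compute_cost=40.0)
assert cpr(classical) > cpr(llm)
```

Such a ratio makes the L2a / L2b selection an explicit, auditable design decision rather than a default to the largest available model.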
Three instantiations — clinical decision support (Instance A), an industrial multi-domain platform (Instance B), and a judicial decision-support extension (Instance C) — demonstrate domain neutrality. The architecture provides the structural base for layer-wise GUM-style uncertainty propagation toward formal certification.
Rows are architectural layers (with L2 split into classical ML and LLM validators); columns are reference instances. Each cell names the concrete artefact that fills the layer in that instance.
The TRACE acronym reflects five user-visible properties. Model Parsimony is a quantified internal design constraint that disciplines L2a / L2b selection.
Every prescriptive action carries a machine-readable evidence chain — data → inference → decision.
Human oversight is an architectural layer with measurable load and override rights — not a cosmetic safety net.
Authority is earned through accumulated stability data and explicit qualification — not granted by default at release.
Input context is explicitly specified, dated, and refreshed as part of the safety envelope.
Each quality property is specified, measured, calibrated, and monitored over time.
The type of learned component (classical ML, specialised neural network, LLM, hybrid) is chosen by task fit — not by LLM presumption.
Internal design constraint — quantified via CPR, not visible in the TRACE acronym.
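The evidence-chain property above (data → inference → decision, machine-readable) can be sketched as hash-linked decision records. The field names and the hash-linking scheme are assumptions for illustration, not the framework's actual record format.

```python
import hashlib
import json

def make_record(data_ref: str, inference: str, decision: str,
                prev_hash: str = "") -> dict:
    """Build one evidence record and link it to its predecessor by hash."""
    body = {"data": data_ref, "inference": inference,
            "decision": decision, "prev": prev_hash}
    # Hash is computed over the canonical JSON of the body, then attached.
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

# Hypothetical two-step chain: an L2 inference escalated to L4, then
# a human override recorded against it.
r1 = make_record("lab-panel/2024-05-01", "risk=high (L2a model)", "escalate to L4")
r2 = make_record("clinician-review", "override confirmed", "treat",
                 prev_hash=r1["hash"])
# Tampering with r1 invalidates the link stored in r2["prev"].
```

Linking records this way makes the chain tamper-evident, which is what lets an audit reconstruct who (or what) decided, on which data, at which layer.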
Seventeen measurable indicators: twelve per-layer, four cross-cutting, and one economy metric (CPR).
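The 17-metric inventory's structure (twelve per-layer, four cross-cutting, one economy metric) can be sketched as a small typed catalogue. Metric names here are placeholders, not the framework's actual indicators.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metric:
    name: str
    kind: str             # "per-layer" | "cross-cutting" | "economy"
    layer: Optional[str]  # "L1".."L4" for per-layer metrics, else None

# Twelve per-layer metrics (three per layer, assumed split),
# four cross-cutting, and CPR as the single economy metric.
inventory = (
    [Metric(f"layer-metric-{i}", "per-layer", f"L{1 + i % 4}") for i in range(12)]
    + [Metric(f"cross-metric-{i}", "cross-cutting", None) for i in range(4)]
    + [Metric("CPR", "economy", None)]
)

def by_layer(layer: str) -> list:
    """Select the per-layer metrics attached to one architectural layer."""
    return [m for m in inventory if m.layer == layer]

assert len(inventory) == 17
```

The even three-per-layer split is an assumption; the point is that each metric carries its layer and type as data, so selections by layer or type are trivial queries.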
Two foundational implementations (A clinical, B industrial multi-domain) motivated the formalisation. A third (C judicial) demonstrates portability into a domain with a fundamentally different governance context.
The industrial platform spans three operational sub-domains. The same four-layer architecture instantiates differently in each: the dominant layer shifts with the type of evidence. Model Parsimony is applied as a per-sub-domain design discipline, not a global LLM presumption.
Paper 1 (this site's companion) is the cross-domain framework synthesis. Paper 0 grounds it in the clinical foundational instance; Papers 2 and 3 are domain and metrological deep-dives.