traces.solutions
A metrological framework · Paper 1 · iSCSi 2026

TRACE: an engineering framework for trustworthy agentic AI.

A cross-domain reference architecture for AI in operationally critical contexts: four layers (with L2 split into classical ML and LLM validators), 17 trust metrics, including the first formalisation of the Computational Parsimony Ratio as a first-class metric, and three reference instances (clinical, industrial multi-domain, judicial).

4 Layers: L1 · L2a/L2b · L3 · L4
5+1 Principles: 5 acronymic + 1 design constraint
17 Metrics: 12 layer · 4 cross · 1 economy
3 Instances: A clinical · B industrial · C judicial
Traceability chain
SI · Primary standards: GUM · VIM · ISO/IEC 17025
L1 · Deterministic core: rules · invariants
L2a · Classical ML: GUM-verifiable calibration
L2b · LLM validators: semantic / coherence checks
L3 · Tiered orchestration: stateful escalation policy
L4 · Human supervision: calibrated oversight
Δ · Domain instance: A clinical · B industrial · C judicial
§0 · Abstract

TRACE organises agentic AI systems into a four-layer reference architecture: a deterministic core (L1), a learned tier explicitly split into classical ML (L2a) and LLM validators (L2b), a stateful orchestration policy (L3) that sits over the L2 inventory, and human supervision (L4) that carries measurable load.

The framework is grounded in established measurement science (GUM, VIM, ISO/IEC 17025) and treats trust as engineered and measured, not declared. Five acronymic principles (Trustworthy · Reasoned · Accountable · Context-bound · Escalated) are disciplined by an internal design constraint, Model Parsimony, which is quantified through the Computational Parsimony Ratio (CPR): the first metric to treat the complexity-performance trade-off as a first-class quantity in trustworthy AI.

Three instantiations — clinical decision support (Instance A), an industrial multi-domain platform (Instance B), and a judicial decision-support extension (Instance C) — demonstrate domain neutrality. The architecture provides the structural base for layer-wise GUM-style uncertainty propagation toward formal certification.
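
For orientation, the generic GUM law of propagation of uncertainty that a layer-wise budget of this kind would build on is given below for a measurand y = f(x_1, ..., x_N). How TRACE assigns layer outputs to the input quantities x_i is not specified on this page, so this is only the standard general form, not the framework's own uncertainty budget.

```latex
u_c^2(y) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i)
         + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N}
           \frac{\partial f}{\partial x_i} \, \frac{\partial f}{\partial x_j} \, u(x_i, x_j)
```

Here u(x_i) is the standard uncertainty of input x_i and u(x_i, x_j) is the estimated covariance between inputs; the second term vanishes when the inputs are uncorrelated.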

§1 · Reference architecture: five layer rows × three instances

Rows are architectural layers (with L2 split into classical ML and LLM validators); columns are reference instances. Each cell names the concrete artefact that fills the layer in that instance.

layer \ instance
Instance A · Clinical decision support
Instance B · Industrial multi-domain
Instance C · Judicial decision support

L4 · Human supervision
Accredited operator oversight · ISO/IEC 17025
Instance A: Physician as final reviewer; aligned with the TRIAD human–AI collaboration framework
Instance B: Driller, supervisor, HSE officer (technology / operations); lawyer, procurement officer (administrative); DNV-RP-0671 governance
Instance C: Judge or judicial assistant as final reviewer; empirical precedent: Kolkman et al. (Justitia, 2024)

L3 · Tiered orchestration
Hierarchical calibration chains
Instance A: Routine cases via calibrated L2a; borderline confidence → L2b validator; joint high-risk ∧ high-confidence or L2a/L2b inconsistency → mandatory clinician handoff
Instance B: Cost-tiered routing per sub-domain; sustained anomaly + high-risk → driller alert; SLA breach → operator escalation; high-stakes clauses → lawyer / compliance officer
Instance C: L2a/NER selects relevant precedents; L2b validator mandatory hallucination check; high-stakes claim with confident recommendation → juridical review

L2b · LLM validators
Subjective inspection step in metrological audit
Instance A: LLM validator over free-text clinical notes; coherence checks against the patient's anamnesis
Instance B: Semantic correlation of drilling logs with incident narratives; clause extraction and document diff in the administrative sub-domain
Instance C: LLM for case-material analysis, semantic precedent correlation, draft document preparation

L2a · Classical ML
Instrument characterization · GUM
Instance A: Risk scores, vital-sign time series, lab-data classifiers; structured-text NER over clinical records
Instance B: Anomaly detection, predictive-maintenance classifiers, residual-life models; CNN/LSTM over MWD/LWD signals; vendor risk scoring
Instance C: Document classification, precedent-relevance ranking, NER over case materials

L1 · Deterministic / physics core
Reference materials, traceable standards
Instance A: Rule-oriented clinical logic; FUTURE-AI-aligned traceability
Instance B: Physics-informed control: priority controller, drilling automation, parameter envelopes; safety interlocks; compliance scaffolding for procurement and contracts
Instance C: Procedural and material legal norms; CEPEJ Ethical Charter as governance reference
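
The L3 policy listed for Instance A can be read as a small stateful routing function. The sketch below is one possible reading, not the authors' implementation: the names `Case`, `Route`, and `route`, and both threshold values, are hypothetical. The page does not say how confidence below the routine level is handled beyond "borderline", so the sketch sends every such case to the L2b validator once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    L2A_ROUTINE = "accept calibrated L2a output"
    L2B_VALIDATE = "send to L2b validator"
    L4_HANDOFF = "mandatory clinician handoff"


@dataclass(frozen=True)
class Case:
    risk: float                        # L2a risk score in [0, 1]
    confidence: float                  # calibrated L2a confidence in [0, 1]
    l2b_agrees: Optional[bool] = None  # None until the L2b validator has run


# Hypothetical thresholds, chosen only for illustration.
HIGH_RISK = 0.8
ROUTINE_CONFIDENCE = 0.9


def route(case: Case) -> Route:
    # An L2a/L2b inconsistency always escalates to the human tier.
    if case.l2b_agrees is False:
        return Route.L4_HANDOFF
    # Joint high risk and high confidence also escalates.
    if case.risk >= HIGH_RISK and case.confidence >= ROUTINE_CONFIDENCE:
        return Route.L4_HANDOFF
    # Below-routine confidence goes to L2b once; a validated case is not re-sent.
    if case.confidence < ROUTINE_CONFIDENCE and case.l2b_agrees is None:
        return Route.L2B_VALIDATE
    return Route.L2A_ROUTINE
```

The `l2b_agrees` field carries the state between passes: a case is routed once, validated by L2b if needed, then routed again with the validator's verdict attached.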
§2 · Five principles + one design constraint

The TRACE acronym reflects five user-visible properties. Model Parsimony is a quantified internal design constraint that disciplines L2a / L2b selection.

01 · T · Trustworthy · Evidence traceability
Every prescriptive action carries a machine-readable evidence chain: data → inference → decision.
→ ETC · Evidence Trail Completeness

02 · R · Reasoned · Bounded human supervision
Human oversight is an architectural layer with measurable load and override rights, not a cosmetic safety net.
→ OvR · Override Rate

03 · A · Accountable · Staged autonomy
Authority is earned through accumulated stability data and explicit qualification, not granted by default at release.
→ ABC · Autonomy Boundary Compliance

04 · C · Context-bound · Bounded context
Input context is explicitly specified, dated, and refreshed as part of the safety envelope.
→ CFI · Context Freshness Index

05 · E · Escalated · Metrological accountability
Each quality property is specified, measured, calibrated, and monitored over time.
→ CE · Calibration Error

06 · Model parsimony · Internal design constraint, not in the TRACE acronym
The type of learned component (classical ML, specialised neural network, LLM, hybrid) is chosen by task fit, not by LLM presumption.
→ CPR · Computational Parsimony Ratio
CPR = 1: optimal
CPR ≪ 1: architectural overhead
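
As an illustration of how CPR behaves, the sketch below divides the cost of the cheapest candidate that meets the task requirements by the cost of the deployed model. The names `Candidate` and `cpr`, the cost unit, and the example models are all hypothetical. The page does not specify how resource cost is measured, so any consistent unit is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float               # resource cost per task, in any consistent unit
    meets_requirements: bool  # precision, calibration and reliability all satisfied


def cpr(candidates: list[Candidate], deployed: Candidate) -> float:
    """Cost of the cheapest adequate candidate divided by the deployed model's cost."""
    adequate = [c.cost for c in candidates if c.meets_requirements]
    if not adequate:
        raise ValueError("no candidate meets the task requirements")
    if deployed.cost <= 0:
        raise ValueError("deployed cost must be positive")
    return min(adequate) / deployed.cost


# Illustrative numbers only.
baseline = Candidate("linear baseline", 0.1, meets_requirements=False)
gbm = Candidate("gradient-boosted trees", 1.0, meets_requirements=True)
llm = Candidate("general-purpose LLM", 50.0, meets_requirements=True)

print(cpr([baseline, gbm, llm], deployed=gbm))  # cheapest adequate model is deployed
print(cpr([baseline, gbm, llm], deployed=llm))  # far below 1: architectural overhead
```

The cheaper baseline is excluded because it fails the requirements, so deploying the gradient-boosted model yields CPR = 1, while deploying the LLM for the same task yields a ratio of 1/50.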
§3 · Trust metrics

Seventeen measurable indicators: twelve per-layer, four cross-cutting, and one economy metric (CPR).

ABC · Autonomy Boundary Compliance · X-layer · Compliance
Share of actions taken within the system's defined autonomy boundaries.

CE · Calibration Error · X-layer · Calibration
Deviation of stated confidence from empirical accuracy.

CFI · Context Freshness Index · L2 · Freshness
Weighted freshness of data in the active context.

CPR · Computational Parsimony Ratio · Economy · Parsimony
Ratio of the resource cost of the most economical model that meets task requirements (precision, calibration, operational reliability) to the actual cost of the deployed model. CPR = 1: optimal; CPR ≪ 1: architectural overhead. The first formalisation of the complexity-performance trade-off as a first-class metric in trustworthy AI.

CRP · Context Relevance Precision · L2 · Precision
Share of context items actually relevant to the task.

EP · Escalation Precision · L3 · Precision
Share of escalations that actually required a higher tier.

ETC · Evidence Trail Completeness · X-layer · Traceability
Share of outputs with a complete evidence chain (data → decision).

FPA · False Positive Attenuation · L3 · Filtering
Suppression of false-positive escalations through policy-driven re-invocation of L2 components.

IPSR · Input Perturbation Stability Rate · L2 · Stability
Share of responses stable to input paraphrasing and perturbation.

OSI · Operational Stability Index · X-layer · Drift
Variation of key trust metrics over time.

OvR · Override Rate · L4 · Behavior
Share of AI outputs modified by human reviewers.

RBI · Review Burden Index · L4 · Load
Average reviewer time per case at the human tier.

RCI · Rule Consistency Index · L1 · Stability
Output stability of rules under system updates.

RCR · Rule Coverage Rate · L1 · Coverage
Share of scenarios covered by explicit rules of the deterministic core.

SNR · Signal-to-Noise Ratio · L4 · Filtering
Ratio of critical cases to total flow reaching the human tier.

TCC · Tier Cost Coefficient · L3 · Cost
Aggregate compute cost of the chosen escalation path.

UTC · Update Traceability Coefficient · L1 · Traceability
Share of rule changes with documented rationale and traceable provenance.
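
Several of the indicators above are simple shares over a decision log, so they can be sketched in a few lines. The example below computes ETC, OvR, EP, and CE from a list of hypothetical `Record` entries. The field names and the log format are assumptions, and CE is computed here as a binned expected calibration error, which is one common estimator and not necessarily the one the framework uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    confidence: float         # stated confidence in [0, 1]
    correct: bool             # outcome verified after the fact
    has_evidence_chain: bool  # complete data -> decision trail recorded
    escalated: bool           # L3 sent the case to a higher tier
    escalation_needed: bool   # the higher tier was actually required
    overridden: bool          # a human reviewer modified the output


def share(flags: list[bool]) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def calibration_error(log: list[Record], bins: int = 10) -> float:
    """Gap between mean confidence and accuracy per bin, weighted by bin size."""
    buckets: list[list[Record]] = [[] for _ in range(bins)]
    for r in log:
        buckets[min(int(r.confidence * bins), bins - 1)].append(r)
    error = 0.0
    for bucket in buckets:
        if bucket:
            mean_conf = sum(r.confidence for r in bucket) / len(bucket)
            accuracy = share([r.correct for r in bucket])
            error += len(bucket) / len(log) * abs(mean_conf - accuracy)
    return error


def trust_metrics(log: list[Record]) -> dict[str, float]:
    escalated = [r for r in log if r.escalated]
    return {
        "ETC": share([r.has_evidence_chain for r in log]),
        "OvR": share([r.overridden for r in log]),
        "EP": share([r.escalation_needed for r in escalated]),
        "CE": calibration_error(log),
    }


# Illustrative log of four decisions.
log = [
    Record(0.95, True, True, False, False, False),
    Record(0.95, False, True, True, True, True),
    Record(0.65, True, False, True, False, False),
    Record(0.65, True, True, False, False, False),
]
print(trust_metrics(log))
```

Note that EP is taken over escalated cases only, while ETC and OvR are taken over all outputs, following the definitions above.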
§4 · Reference instances

Two foundational implementations (A clinical, B industrial multi-domain) motivated the formalisation. A third (C judicial) demonstrates portability into a domain with a fundamentally different governance context.

Instance A · Clinical decision support · Foundational instance
Lead: Zabolotnii, Holynko, Antonenko
Status: Paper 0 · IMM journal, under review

Instance B · Industrial multi-domain · Foundational instance · oil & gas
Lead: Shcherban
Status: Patent pending · UA u 2025 04038 · U.S. Copyright Office deposit (Mar 2026)

Instance C · Judicial decision support · Partial extension
Lead: Zabolotnii (with the Supreme Court of Ukraine, funded by Expertise France)
Status: Modernisation of the "Legal Positions Database" portal

§4·B · Instance B in detail: sub-domain × layer

The industrial platform spans three operational sub-domains. The four-layer architecture instantiates differently in each: the dominant layer shifts with the type of evidence — illustrating the Model Parsimony principle.

layer \ sub-domain
Technology · Upstream: drilling, production, well operations
Operations · Maintenance decisions, equipment monitoring, KPI tracking
Administrative · Document flow, procurement, contract lifecycle, compliance

L1 · Deterministic core
Technology (ACTIVE): Physics-informed drilling control: priority controller, automation, parameter envelopes; PID/MPC; tolerance limits
Operations (ACTIVE): Maintenance procedures, regulatory intervals, safety interlocks
Administrative (DOMINANT): Compliance scaffold: procurement rules, blacklists, dual-signature requirements, contract-lifecycle milestones, audit trail

L2a · Classical ML
Technology (DOMINANT): XGBoost anomaly detection on MWD/LWD; CNN/LSTM for ROP/ESP forecasting; correlational clustering
Operations (DOMINANT): Predictive-maintenance classifiers, residual-life models, anomaly detection (survival analysis)
Administrative (PRESENT): Anomaly detection on procurement bids, vendor risk scoring, regression-based price benchmarking

L2b · LLM validators
Technology (PRESENT): Asynchronous semantic correlation of drilling logs with incident narratives
Operations (PRESENT): LLM extracting patterns from free-form incident reports and historical maintenance journals
Administrative (DOMINANT): Clause extraction, semantic document diff, non-standard term detection, classification by type

L3 · Tiered orchestration
Technology (ACTIVE): Tolerance-band routing; sustained anomaly ∧ high-risk → automated driller alert; L2b engaged for retrospective review
Operations (ACTIVE): Cost-tiered routing; SLA breach, joint risk ∧ confidence, or L2a/L2b inconsistency → operator escalation; mandatory full audit trail
Administrative (ACTIVE): Risk-tiered routing; standard contract → L1 automation; non-standard clause → L2a confirmation; high-stakes ∧ dual-signature → mandatory handoff

L4 · Human supervision
Technology (ACTIVE): Driller / supervisor with override rights; DNV-RP-0671 governance reference
Operations (ACTIVE): Operations supervisor as final reviewer
Administrative (ACTIVE): Lawyer / procurement officer / compliance officer as final reviewer

Intensity: DOMINANT = primary decision path · ACTIVE = regular contribution · PRESENT = narrow / supporting role

The same four-layer architecture instantiates differently in each sub-domain: the dominant layer shifts with the type of evidence. Model Parsimony is applied as a per-sub-domain design discipline, not as a global LLM presumption.
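
The intensity labels above can be held as a small lookup table, which makes the shift of the dominant layer easy to query. The sketch below transcribes the labels from the sub-domain × layer matrix. The names `INTENSITY` and `dominant_layers` are hypothetical, and nothing here implies that the platform stores its configuration this way.

```python
# Intensity of each layer per sub-domain, transcribed from the matrix above.
INTENSITY: dict[str, dict[str, str]] = {
    "Technology": {
        "L1": "ACTIVE", "L2a": "DOMINANT", "L2b": "PRESENT",
        "L3": "ACTIVE", "L4": "ACTIVE",
    },
    "Operations": {
        "L1": "ACTIVE", "L2a": "DOMINANT", "L2b": "PRESENT",
        "L3": "ACTIVE", "L4": "ACTIVE",
    },
    "Administrative": {
        "L1": "DOMINANT", "L2a": "PRESENT", "L2b": "DOMINANT",
        "L3": "ACTIVE", "L4": "ACTIVE",
    },
}


def dominant_layers(sub_domain: str) -> list[str]:
    """Layers marked DOMINANT for a sub-domain, in architectural order."""
    return [
        layer
        for layer, level in INTENSITY[sub_domain].items()
        if level == "DOMINANT"
    ]


for name in INTENSITY:
    print(name, dominant_layers(name))
```

Read this way, classical ML carries the primary decision path for signal-heavy evidence in Technology and Operations, while the document-heavy Administrative sub-domain pairs the deterministic core with LLM validators.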

§5 · Four-paper roadmap

Paper 1, to which this site is the companion, is the cross-domain framework synthesis. Paper 0 grounds it in the clinical foundational instance; Papers 2 and 3 are the domain and metrological deep-dives.

[paper 0] · Clinical foundational
From Black-Box Confidence to Measurable Trust in Clinical AI: A Framework for Evidence, Supervision, and Staged Autonomy
Zabolotnii, Holynko, Antonenko · IEEE Instrumentation & Measurement Magazine, Special Issue "A Measure of Trust in Healthcare"
Under review · Sep 2026

[paper 1] · Framework synthesis
TRACE: A Metrologically Grounded Engineering Framework for Trustworthy Agentic AI in Operationally Critical Domains
Zabolotnii, Shcherban · Procedia Computer Science · iSCSi 2026 (Azores, 20–22 May 2026)
Submission target · this site is the companion

[paper 2] · Industrial multi-domain deep-dive
Industrial Multi-Domain Agentic Platform for Upstream Oil & Gas: A TRACE Instance
Shcherban (lead), Zabolotnii · Scopus-indexed industrial-AI journal
Planned · Q3 2026

[paper 3] · Metrological deep-dive
Layer-wise GUM Propagation in TRACE: A Formal Uncertainty Budget for Agentic AI Systems
Zabolotnii (lead), Shcherban · IEEE Transactions on Instrumentation and Measurement
Planned (optional)