Research

Theoretical and analytical work

The accord rests on a body of theoretical work on corrigibility, action boundaries, and evaluation methodology. This page maps the vocabulary the accord uses and links to the underlying papers. Forthcoming work will be published with reproducible artifacts and listed here.

Framework

LWD-R · The four layers

LWD-R names the four layers along which a deployed AI system can be evaluated for openness. The principle behind the four layers is that anything which shapes the operative behavior of the deployed system falls within disclosure obligations, regardless of where in the system it sits. Open at one layer is necessary but not sufficient — open at all four is the standard the accord names.

  1. Logic

    Model architecture, inference code, tokenizer, and any deployment-time transformations applied between training and inference. For architectures with internal routing or gating, Logic includes the routing networks alongside the expert components. The tokenizer matters particularly for language coverage: a tokenizer that does not represent a language well constrains what any model built on it can learn about that language.

  2. Weights

    Trained parameters released under terms that allow use, redistribution, and modification. Where weights are deployed in transformed form — quantized, distilled, pruned, or merged with adapters — the transformations are documented sufficiently for independent reproduction. For mixture-of-experts and similar architectures, weights include the gating networks alongside the experts. Reproducibility is assessed against the complete deployed system, not against active parameters alone.

  3. Data

    Training corpus and the artifacts that shaped what the model learned: filtering and curation procedures, training schedules, and routing or gating training data for architectures that use them. Where training data is itself derivative (synthetic data generated by other models or expert outputs distilled from larger systems), provenance extends to the source. For models trained through reinforcement learning, the Data layer extends to reward models, verifiers, exploration policies, and procedural specifications that shaped the training signal.

  4. Representation

    The operative categorical schema the deployed system uses to interpret the world. Quantization, distillation, pruning, MoE routing, and the training procedures that shape representational geometry all modify R. R-layer disclosure must describe the deployed system's operative representation, not a nominal one no deployment uses. Where the deployed R derives from another model's R through synthetic training data, distillation, or weight inheritance, the provenance extends through that derivation chain.
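The all-four-layers rule can be illustrated with a minimal sketch. The accord does not define a data format for disclosures, so the `Disclosure` record, the `Layer` enum, and both function names below are hypothetical, chosen only to show that openness at one layer does not satisfy the standard.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The four LWD-R layers, in the order the accord lists them."""
    LOGIC = "L"
    WEIGHTS = "W"
    DATA = "D"
    REPRESENTATION = "R"


@dataclass(frozen=True)
class Disclosure:
    """Hypothetical record of which layers a deployed system discloses."""
    name: str
    open_layers: frozenset


def missing_layers(disclosure: Disclosure) -> list:
    """Return the undisclosed layers, in L, W, D, R order."""
    return [layer for layer in Layer if layer not in disclosure.open_layers]


def meets_standard(disclosure: Disclosure) -> bool:
    """Open at all four layers is the standard; any gap fails it."""
    return not missing_layers(disclosure)


# Illustrative inputs only; neither describes a real release.
weights_only = Disclosure("weights-only release", frozenset({Layer.WEIGHTS}))
full = Disclosure("full release", frozenset(Layer))

print(meets_standard(weights_only))                              # False
print([layer.value for layer in missing_layers(weights_only)])   # ['L', 'D', 'R']
print(meets_standard(full))                                      # True
```

A boolean per layer is a simplification: each layer's description above sets its own conditions (for example, documenting quantization or distillation under Weights), which a real assessment would have to check individually.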

See Accord § 2 for the canonical formulation, the glossary below for the other terms the accord uses, and the FAQ for common questions about scope and overlap with existing definitions.

Glossary

Vocabulary the accord introduces

The accord uses a small set of terms with specific meanings. Each is summarised here and defined in full in the accord section or paper noted beside the term.

Inference forkability § 4
Can someone take the released model and run it?
The ability to take a released model and execute it on hardware that an independent party can assemble. Inference forkability is necessary for any meaningful claim of openness, but it is not sufficient: a system that can be run but not retrained gives users a tool, not accountability.
Training forkability § 4
Can someone reproduce or fundamentally alter the model?
The ability to reproduce, retrain, or substantively modify a released model from disclosed materials. Training forkability is what produces accountability — only when the training pipeline can be re-run can the community contest the choices encoded in the deployed system.
Compute capture § 4
Reproduction is legally permitted but practically out of reach.
When training cost so far exceeds the compute the open community can assemble that legal forkability is meaningless in practice. A model whose weights are released but whose training run cost twenty million dollars is captured by compute regardless of license.
Data capture § 4
Training data can only come from infrastructure you do not run.
When the data required to train a model can only be produced by infrastructure the open community cannot reproduce — synthetic instruction data, reasoning traces, distilled experts, or reward signals that flow from frontier systems. Reproduction then requires reproducing the upstream pipeline that produced the training signal, not just the small model itself.
Hardware capture § 5
Local inference is gated by closed silicon and runtimes.
When local inference is the deployment topology but the substrate is not addressable by the open community: proprietary neural processing units, closed runtime layers, or gatekept distribution channels. Edge deployment escapes cloud capture but remains within hardware capture if the silicon and runtime are not themselves open.
Action boundary § 7
The deterministic policy layer outside the model that decides what is allowed.
In an agentic system, the model proposes actions; the action boundary decides which proposals are allowed. The boundary is a deterministic policy layer that sits outside the prompt context, is inspectable, and is machine-readable. The same transparency that helps operators specify what agents may do helps defenders see when agents are being exploited.
Harness § 7
The architecture that turns a model into something agentic.
A harness manages memory across steps, decomposes plans, selects tools, parses outputs, recovers from errors, and orchestrates flow between model inference and tool execution. A closed harness around an open model produces a system whose behavior cannot be reproduced from the model alone. Harness disclosure is held to the same standard as model disclosure.
DPI · Deterministic Public Infrastructure Paper I
Ledgers, registries, payment rails, and other deterministic public systems.
In the corrigibility framework, the class of infrastructure whose behavior is governed by deterministic rules — registries, ledgers, payment rails, and similar systems where outcomes follow legibly from inputs. Examined in Paper I.
EPI · Epistemic Public Infrastructure Paper II
Learned and agentic systems whose outputs shape what counts as known.
In the corrigibility framework, the class of infrastructure whose behavior is shaped by learning, by training data, and by the categorical schemas the system inherits — AI systems, recommender systems, and the learned components of administrative decision-making. Examined in Paper II.
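The action boundary entry above describes a deterministic, inspectable, machine-readable policy layer that sits outside the prompt context. The following is a minimal sketch of that idea, not the accord's specification: the JSON schema, the tool names, the `ProposedAction` type, and the `decide` function are all assumptions made for illustration.

```python
import json
import posixpath
from dataclasses import dataclass

# Hypothetical machine-readable policy. The schema is an assumption, not
# something the accord defines. It lives outside the model's prompt context.
POLICY_JSON = """
{
  "default": "deny",
  "rules": [
    {"tool": "read_file", "path_prefix": "/workspace/", "effect": "allow"},
    {"tool": "send_email", "effect": "deny"}
  ]
}
"""


@dataclass(frozen=True)
class ProposedAction:
    """An action the model proposes; the boundary decides whether it runs."""
    tool: str
    path: str = ""


def decide(policy: dict, action: ProposedAction) -> str:
    """Return "allow" or "deny". The first matching rule wins.

    The decision depends only on the policy and the proposed action, so the
    same inputs always produce the same answer.
    """
    for rule in policy["rules"]:
        if rule["tool"] != action.tool:
            continue
        prefix = rule.get("path_prefix")
        if prefix is not None:
            # Normalise first so "/workspace/../etc" cannot pass the prefix check.
            if not posixpath.normpath(action.path).startswith(prefix):
                continue
        return rule["effect"]
    return policy["default"]


policy = json.loads(POLICY_JSON)
print(decide(policy, ProposedAction("read_file", "/workspace/notes.txt")))      # allow
print(decide(policy, ProposedAction("read_file", "/workspace/../etc/passwd")))  # deny
print(decide(policy, ProposedAction("send_email")))                             # deny
print(decide(policy, ProposedAction("delete_file", "/workspace/notes.txt")))    # deny
```

Because the policy is plain data and `decide` is a pure function, an operator or auditor can read the rules and replay any decision without access to the model. Defaulting to deny means a tool the policy never mentions is refused rather than silently permitted.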

Corrigibility

Two preprints, five tests

Theoretical work on whether infrastructure permits correction by the people it affects. The framework is evaluated against five tests: EXIT, CODE, AUDIT, GOVERN, FORK.

Corrigibility framework

A structural framework for evaluating whether digital and AI-mediated public infrastructure permits correction by affected participants. The framework comprises two preprints, evaluated against five tests: EXIT, CODE, AUDIT, GOVERN, and FORK.

Paper I · DPI

Corrigibility as a Structural Precondition for Digital Public Infrastructure. Examines corrigibility requirements for deterministic systems (ledgers, registries, payment rails) and how structural safeguards let affected parties detect errors and hold infrastructure accountable. DPI = Deterministic Public Infrastructure.

Preprint doi.org/10.2139/ssrn.6059075 PDF

Paper II · EPI

Epistemic Capture and the Action Boundary. Extends the framework to learned and agentic systems, examining how epistemic constraints shape decision boundaries in AI and machine learning infrastructure. EPI = Epistemic Public Infrastructure.

Preprint doi.org/10.5281/zenodo.19863649 PDF

Related directions

The accord draws on the OSI Open Source AI Definition (§2 in particular), Princeton's research on AI agent reliability, and the emerging Agentic Skills specification work on composable, inspectable agent capabilities.