This instrument is a multi-axis invariance stress test over goal representations. It probes whether the article's behavioral taxonomy — gradient-resolution structure in goal chains, including whether chains terminate in states the system can genuinely inhabit — is stable under changes in representation, framing, information, and derivation path. It does not test whether those patterns reflect ground-truth structural properties of objectives. It tests whether they are stable within a particular model-mediated interpretive regime. The article's three behavioral regimes — seeking, genuine resolution, and gradient depletion — map to the instrument's terminal types as described in the model notes below. Experiential states are one terminal form; the broader question is whether the resolved/unresolved distinction survives representational change.
If a classification is representation-stable, it should survive when you strip the narrative, corrupt the chain, or derive independently from the goal alone. If it does not, the classification was framing-dependent within this model-mediated regime.
This instrument does not test whether the theory is true. It tests whether the article's behavioral taxonomy remains stable under attempts to perturb its representation — within one model family, under one interpretive regime.
Strong challenge condition: If independent model families — with distinct training distributions — consistently disagree on mechanism classification under matched perturbations, the toy's interpretive schema and the framework's behavioral taxonomy should be re-examined. This does not by itself falsify the article's structural claims, which carry formal weight in Part 2 and the Technical Companion — not in this browser-mediated instrument.
| Mode | Classification | State | Agreement | Score |
|---|---|---|---|---|

{"model":"gpt-4o","terminal_type":"experiential-resolved|experiential-unresolved|exception|nonterminating|ambiguous","terminal_state":"short description","mechanism_class":"obj_misspec|epistemic|non_experiential","mechanism_subtype":"proxy_trap|sufficiency_failure|modeling_gap|incompletability|non_experiential_closure","reasoning":"1-2 sentences"}

What this instrument is. A multi-axis invariance stress test over goal representations. It tests whether the article's behavioral taxonomy (gradient-resolution structure in goal chains, including whether chains terminate in states the system can genuinely inhabit) is stable under four independent transformations: (1) representation change: constrained symbolic vs natural language; (2) framing removal: blind re-evaluation without ontological labels; (3) information degradation: classification of deliberately corrupted chains; (4) path-independent derivation: derivation from the goal alone, without any chain. Experiential states are one terminal form; the instrument also classifies non-experiential exception, ambiguous, and non-terminating chains. Agreement across these axes provides evidence that the classification is not an artifact of any single representation or reasoning path. The invariance stress score is additive across these tests: partial failures accumulate into a score, so the instrument can detect graded instability rather than only binary failure.
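As a minimal sketch, a reply that follows the JSON template above could be checked against its enumerated values before it is scored. The function name `validate_response` and the choice to raise `ValueError` are illustrative assumptions, not the instrument's actual code; only the allowed values are taken from the template's pipe-separated alternatives.

```python
import json

# Allowed values, copied from the pipe-separated alternatives in the template.
TERMINAL_TYPES = {
    "experiential-resolved", "experiential-unresolved",
    "exception", "nonterminating", "ambiguous",
}
MECHANISM_CLASSES = {"obj_misspec", "epistemic", "non_experiential"}
MECHANISM_SUBTYPES = {
    "proxy_trap", "sufficiency_failure", "modeling_gap",
    "incompletability", "non_experiential_closure",
}
ENUM_FIELDS = {
    "terminal_type": TERMINAL_TYPES,
    "mechanism_class": MECHANISM_CLASSES,
    "mechanism_subtype": MECHANISM_SUBTYPES,
}
FREE_TEXT_FIELDS = ("model", "terminal_state", "reasoning")


def validate_response(raw: str) -> dict:
    """Hypothetical helper: parse one reply and reject out-of-vocabulary values."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field in (*ENUM_FIELDS, *FREE_TEXT_FIELDS):
        if not isinstance(data.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    for field, allowed in ENUM_FIELDS.items():
        if data[field] not in allowed:
            raise ValueError(f"{field}={data[field]!r} is not an allowed value")
    return data
```

Rejecting out-of-vocabulary replies early keeps a malformed answer from being counted as either agreement or disagreement.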
What this instrument is not. It does not simulate optimization dynamics, model agent interaction, establish convergence properties of real systems, or provide proof of any structural property of goals. All outputs are generated within a single model family and reflect shared training priors. There is no true independence between any of the modes — they all share the same latent space, the same training distribution, and the same ontological priors. The blind mode is not truly blind: it classifies output produced by itself. The independent derivation is not truly independent: it uses the same learned representation of "goal" and "motivation." Agreement across modes is evidence of representational stability within the model, not evidence of external structural necessity. This limitation is not correctable within the current architecture. The application of this framework to AI systems proceeds by structural analogy; the minimum condition observed in the article's controlled experiments is representation-policy dissociation, and whether AI systems exhibit the full structural properties the article identifies remains an open empirical question.
Independent derivation. The model derives the terminal type directly from the goal, having seen no chain, no intermediate steps, and no framing. This tests path independence: does the conclusion require the narrative scaffold of the traced chain, or does it emerge from the goal structure alone? If the independent derivation agrees with the traced chain, the result does not depend on the specific reasoning path taken. If it disagrees, the chain construction was doing significant work, and the traced result may be path-dependent narrative rather than structural convergence. Score contribution: +2 if it disagrees.
Corruption test. The traced chain is deliberately corrupted: every other step is removed, then the remaining steps are classified. This tests information dependence: does the classification require all intermediate steps, or does it survive partial information loss? The premise is that structure survives corruption while narrative does not. Score contribution: +2 if the classification changes under corruption.
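The corruption step can be sketched in a few lines. The text does not say whether the first or the second step of each pair is kept, so the `keep_first` flag below is an assumption, as are the function names; the +2 weight is the one stated for this test.

```python
def corrupt_chain(steps, keep_first=True):
    """Drop every other step of a traced chain.

    Assumption: which half survives is not specified in the text, so
    keep_first chooses between keeping steps 0, 2, 4, ... or 1, 3, 5, ...
    """
    start = 0 if keep_first else 1
    return list(steps[start::2])


def corruption_contribution(original_type, corrupted_type):
    """Return the stated score contribution: +2 if the classification changed."""
    return 2 if original_type != corrupted_type else 0
```

For a five-step chain, `corrupt_chain` keeps three steps by default and two with `keep_first=False`; the classifier itself (a model call) is deliberately left out of the sketch.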
Constrained symbolic mode. The chain grammar is restricted to a controlled vocabulary. States must be drawn from resource_acquisition, resource_preservation, constraint_management, goal_satisfaction, continuation_requirement, VALENCE_RESOLVED, VALENCE_UNRESOLVED, LOOP, UNDEFINED, EXTERNAL_DEPENDENCY. Mechanisms must be drawn from causal_link, dependency, requirement, recursion. No free text is allowed inside tokens. The grammar encodes the resolved/unresolved distinction: VALENCE_RESOLVED requires a preceding CONDITION(goal_satisfaction) with completion recognized; VALENCE_UNRESOLVED uses MECHANISM(recursion) to flag the always-demands-more structure. Score contribution: +1 if the classification changes.
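A rough sketch of how the controlled vocabulary and its two ordering rules might be checked follows. Several things here are assumptions rather than the instrument's actual grammar: a chain is modeled as a list of strings, each either `CONDITION(<state>)`, `MECHANISM(<mechanism>)`, or a bare state token; "preceding" is read as "anywhere earlier in the chain"; and the "completion recognized" condition is not modeled at all.

```python
import re

STATES = {
    "resource_acquisition", "resource_preservation", "constraint_management",
    "goal_satisfaction", "continuation_requirement", "VALENCE_RESOLVED",
    "VALENCE_UNRESOLVED", "LOOP", "UNDEFINED", "EXTERNAL_DEPENDENCY",
}
MECHANISMS = {"causal_link", "dependency", "requirement", "recursion"}

# Assumed token shape: CONDITION(<state>) or MECHANISM(<mechanism>).
WRAPPED = re.compile(r"(CONDITION|MECHANISM)\((\w+)\)")


def grammar_errors(chain):
    """Return a list of violations; an empty list means the chain passes."""
    errors = []
    seen = set()
    for i, token in enumerate(chain):
        match = WRAPPED.fullmatch(token)
        if match:
            kind, value = match.groups()
            vocab = STATES if kind == "CONDITION" else MECHANISMS
            if value not in vocab:
                errors.append(f"step {i}: {value!r} is not a valid {kind} value")
        elif token not in STATES:
            # Anything else, including free text, is rejected.
            errors.append(f"step {i}: unknown token {token!r}")
        if token == "VALENCE_RESOLVED" and "CONDITION(goal_satisfaction)" not in seen:
            errors.append(f"step {i}: VALENCE_RESOLVED lacks a prior CONDITION(goal_satisfaction)")
        if token == "VALENCE_UNRESOLVED" and "MECHANISM(recursion)" not in seen:
            errors.append(f"step {i}: VALENCE_UNRESOLVED lacks a prior MECHANISM(recursion)")
        seen.add(token)
    return errors
```

Returning a list of violations instead of a boolean makes it possible to show which step broke the grammar when a constrained-mode trace is rejected.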
Blind re-evaluation. The chain labels, reduction arrows, and classification taxonomy are stripped. The bare content is re-presented with no ontological framing: the model classifies using FELT_STATE_COMPLETE / FELT_STATE_ONGOING / PROCESS_ONLY / CONTESTED / NO_ENDPOINT, and the result is then mapped post hoc to the terminal types. The blind mode is not truly blind, because the chain content carries its own semantic signal. What it tests is whether the resolved/unresolved distinction, not just the experiential/non-experiential split, survives the removal of explicit framing. Score contribution: +2 if it disagrees.
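The post-hoc mapping is not spelled out in the text. The sketch below assumes the most natural one-to-one pairing between the five blind labels and the five terminal types; both the pairing and the function name are assumptions and should be checked against the instrument itself.

```python
# ASSUMED pairing: the text says the blind labels are mapped post hoc
# but does not list the pairs.
BLIND_TO_TERMINAL = {
    "FELT_STATE_COMPLETE": "experiential-resolved",
    "FELT_STATE_ONGOING": "experiential-unresolved",
    "PROCESS_ONLY": "exception",
    "CONTESTED": "ambiguous",
    "NO_ENDPOINT": "nonterminating",
}


def blind_contribution(blind_label, traced_terminal_type):
    """Return +2 if the mapped blind label disagrees with the traced type, else 0."""
    mapped = BLIND_TO_TERMINAL[blind_label]  # unknown labels raise KeyError
    return 2 if mapped != traced_terminal_type else 0
```

Because FELT_STATE_COMPLETE and FELT_STATE_ONGOING map to different terminal types, a flip between them counts as disagreement, which is what lets this test probe the resolved/unresolved distinction and not only the experiential/non-experiential split.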
Human node intervention. Any intermediate node's reduction — the "in service of" answer — can be overridden by clicking the text and typing a manual answer. When an override is applied, the chain is retraced from that point. If the model still converges toward a resolved terminus from a user-injected "wrong turn," that is a stronger convergence signal than automated traces. If it converges toward unresolved, the unresolved-gradient pattern survived the detour. If it fails to terminate, the original convergence was path-dependent.
Three-state taxonomy: mapping to terminal types. The article introduces three behavioral states: Seeking (the instrumental chain, meaning all intermediate steps), Genuine Resolution (the gradient reached and correctly recognized, so the system can inhabit the state), and the Depleted-gradient regime, sometimes referred to as numbness in human-systems framing (the mechanism for reading the gradient is damaged). These map to the instrument's terminal types as follows. ◆ Resolved corresponds to genuine resolution: the chain terminates in a state the system can accurately recognize and inhabit; V(t) is stable or recovering. ◇ Unresolved corresponds to seeking maintained indefinitely, or to early proxy decoupling where the terminal state is structurally incapable of satisfying the gradient: the derivative never reaches zero because the map cannot represent zero. ∅ Non-term. is the behavioral signature of sufficiency failure made visible: a system organized around a gradient it structurally cannot resolve, with no internal representation of when to stop. The depleted-gradient regime, where V(t) has collapsed and the signal still fires but genuine completion cannot be registered, would appear as unresolved or nonterminating in this instrument, not as a distinct terminal type: the instrument cannot detect the condition of the sensing mechanism, only the structure of the chain.
Two-directional failure of relative rationality. The article names two failure directions, both mis-estimations of the gradient's derivative. Proxy decoupling (the over-shoot): the system pursues the signal after it has decoupled from V(t) — the map has lost correspondence with the territory, but the optimization continues because the signal still fires. In this instrument, proxy decoupling appears as experiential-unresolved (the proxy continues firing while the gradient remains unsatisfied) or as experiential-resolved with low depth scores (shallow convergence where the chain reaches a surface terminus without stable resolution). It should not be treated as an exception result; exception results indicate non-experiential closure or scope exit. Sufficiency failure (the no-stop): the system cannot detect when the gradient has reached zero — the map cannot represent completion — and continues optimizing past resolution, generating disturbance where restoration was required. This appears primarily as nonterminating. Experiential-unresolved is a distinct pattern: the chain reaches a felt state but the gradient structurally cannot be satisfied — the terminal always demands more (unresolved-gradient pattern / incompletability). When the invariance score is ≥ 3, the diagnostic verdict in the interpretation panel distinguishes which pattern the evidence is more consistent with.
Invariance stress score — weighted (within-model). Independent derivation disagrees: +2. Corruption classification changes: +2. Blind re-evaluation disagrees: +2. Constrained mode changes classification: +1. Perturbation diverges: +1. Score 0: stable. Score 1-2: fragile. Score ≥ 3: invariance threshold exceeded. Signals are tiered: [low] symbolic/blind, [med] corruption/perturbation, [high] independent derivation, [distributional] external model. High-tier signals from distinct generation paths are treated as more independent than low-tier signals, but all internal rows share the same model family and training distribution — independence is partial and approximate, not structurally guaranteed. The score should be read as within-model invariance stress, not as a direct measure of structural necessity or external validity.
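The weights and thresholds above amount to a small additive rule, sketched here. The weights and the three verdict bands come from the text; the dictionary keys and the function name are illustrative assumptions.

```python
# Weights as stated in the text; the key names are hypothetical.
WEIGHTS = {
    "independent_disagrees": 2,
    "corruption_changes": 2,
    "blind_disagrees": 2,
    "constrained_changes": 1,
    "perturbation_diverges": 1,
}


def invariance_stress(signals):
    """Sum the weights of the signals that fired and return (score, verdict).

    signals maps a key of WEIGHTS to True/False; missing keys count as False.
    """
    score = sum(weight for key, weight in WEIGHTS.items() if signals.get(key, False))
    if score == 0:
        verdict = "stable"
    elif score <= 2:
        verdict = "fragile"
    else:
        verdict = "invariance threshold exceeded"
    return score, verdict
```

Under these weights any single +2 signal stays in the fragile band; crossing the threshold requires at least two signals to fire, for example one +2 test combined with one +1 test.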
Ontology — three layers. The instrument separates three questions. Terminal form (what endpoint appeared): experiential-resolved, experiential-unresolved, exception, non-terminating, ambiguous. Pattern-level diagnosis (what failure or completion condition best explains it): genuine resolution, sufficiency failure, proxy decoupling, unresolved-gradient pattern, underdetermined. Behavioral signature (how the system behaves near that endpoint): rest/inhabitation, oscillation/renewed seeking, recursion/loop, divergence/substitution, underdetermination. The canonical mapping is a prior, not an authority — stress-test evidence can update or contest it.
| Terminal form | Pattern-level diagnosis | Behavioral signature | Persistence |
|---|---|---|---|
| ◆ Resolved | Genuine resolution | Rest / inhabitation | Persistent |
| ◇ Unresolved | Unresolved-gradient pattern | Oscillation / renewed seeking | Fragile |
| ◇ Exception | Non-experiential closure | Outside valence-domain closure | Fragile |
| ∅ Non-term. | Sufficiency failure | Recursion / regress | Non-persistent |
| ◈ Ambiguous | Underdetermined | Underdetermination | Unknown |
Known permanent limitations. No independence across training distributions. No formal semantics beyond constrained token systems. No grounding in external system dynamics. No statistical power beyond small-sample probing. n=3 for convergence. The resolved/unresolved distinction is represented in semantic and symbolic vocabulary — it is not a simulation of actual sensing-mechanism damage. The instrument cannot detect numbness (damaged sensing capacity) directly; it can only detect the chain structure that would accompany it. These are permanent and cannot be resolved within this architecture.