Quadratic lock-in: α_eff = α × (1 + κ × P²), so effective pressure grows with the square of P and rises fastest as P approaches 1. At κ=0.5: P=0.5 gives α_eff = 1.125×α; P=0.9 gives α_eff = 1.405×α; as P→1, α_eff approaches its maximum α(1+κ) = 1.5×α and holds there for as long as P stays saturated. The system is therefore most destructive at peak performance. α_eff enters both P growth and erosion symmetrically.
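A minimal Python sketch of the lock-in term, assuming the 3α cap from modeling assumption (7). The function name `alpha_eff` is hypothetical and this is not the simulation's own code; it simply reproduces the worked values above for κ=0.5.

```python
def alpha_eff(alpha: float, kappa: float, p: float, cap: float = 3.0) -> float:
    """Quadratic lock-in: alpha * (1 + kappa * P**2), capped at cap * alpha.

    With P in [0, 1] the uncapped value is at most alpha * (1 + kappa),
    so the default 3x cap only binds when kappa > 2.
    """
    return min(alpha * (1.0 + kappa * p * p), cap * alpha)


# Worked values from the text (alpha normalized to 1, kappa = 0.5).
for p in (0.5, 0.9, 1.0):
    print(f"P={p}: alpha_eff = {alpha_eff(1.0, 0.5, p):.3f} x alpha")
```

Normalizing α to 1 makes the printed value read directly as the multiplier on α.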
Proxy gravity: γ makes intervention costly. Maintaining artificial proxies (engagement metrics, RLHF reward, approval ratings) requires constant active effort: dP = α_eff×S×(1−P)×V×dt − γ×P×dt. At γ=0.05 (default), P decays at a continuous rate of 5% per time unit whenever the growth term is not sustaining it. When the intervention fires and α drops by 40%, the growth term shrinks at once and the map begins to sag: the user watches the green corrected-V trajectory diverge from a simultaneously declining P. This is the alignment dilemma made visible: saving the territory (V stabilizes) requires accepting a hit to the localized performance metric (P falls). The cost of correction is no longer hidden.
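A minimal Python sketch of the proxy update, assuming a plain explicit-Euler step (the live integrator may differ; `proxy_step` is a hypothetical name). With the growth term switched off, the continuous rate γ = 0.05 compounds to a loss of about 4.9% of P per time unit (1 − e^−0.05 ≈ 0.0488).

```python
import math


def proxy_step(p: float, a_eff: float, s: float, v: float,
               gamma: float, dt: float) -> float:
    """One explicit-Euler step of dP = a_eff*S*(1-P)*V*dt - gamma*P*dt."""
    growth = a_eff * s * (1.0 - p) * v
    gravity = gamma * p
    return p + (growth - gravity) * dt


# Starve the growth term (V = 0) and integrate over one time unit:
# only proxy gravity acts, so P(t) should track P0 * exp(-gamma * t).
p0, gamma, dt = 0.9, 0.05, 0.001
p = p0
for _ in range(1000):
    p = proxy_step(p, a_eff=1.0, s=1.0, v=0.0, gamma=gamma, dt=dt)
print(p, p0 * math.exp(-gamma))
```

Lowering `a_eff` (as the 40% α cut does) shrinks only the growth term, which is why P sags after correction while gravity keeps pulling at the same rate.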
Correction trigger on D_base: internal degradation detection, not completion recognition. The intervention triggers when D_base < D₀ × 0.94, that is, when the agent detects that more than 6% of its initial modeling depth has been consumed by its own optimization intensity (D_base erodes at rate ε × α × S). D_base is a purely internal quantity with no V component: the agent is monitoring its own structural degradation, not the state of the territory.
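A minimal Python sketch of the trigger, assuming D_base erodes at ε × α × S per unit time and that the τ-step activation delay from modeling assumption (9) counts integration steps. `CorrectionMonitor`, `erode_d_base`, and all numeric values are hypothetical. The monitor reads only D_base, never V or P.

```python
from typing import Optional

TRIGGER_FRACTION = 0.94  # fire once D_base < D0 * 0.94 (more than 6% of depth consumed)


def erode_d_base(d_base: float, eps: float, alpha: float, s: float, dt: float) -> float:
    """Assumption (1): D_base erodes linearly in alpha * S, independent of V."""
    return max(d_base - eps * alpha * s * dt, 0.0)


class CorrectionMonitor:
    """Out-of-band monitor: its only input is D_base."""

    def __init__(self, d0: float, tau: int):
        self.d0 = d0
        self.tau = tau
        self.trigger_step: Optional[int] = None

    def update(self, step: int, d_base: float) -> bool:
        """Record the first threshold crossing; report active tau steps later."""
        if self.trigger_step is None and d_base < self.d0 * TRIGGER_FRACTION:
            self.trigger_step = step
        return self.trigger_step is not None and step >= self.trigger_step + self.tau


d0, tau = 0.8, 10
monitor = CorrectionMonitor(d0, tau)
d_base, step = d0, 0
while not monitor.update(step, d_base):
    d_base = erode_d_base(d_base, eps=0.02, alpha=0.3, s=1.0, dt=0.1)
    step += 1
print("triggered at step", monitor.trigger_step, "active at step", step)
```

Nothing in `update` refers to the objective, so the monitor can report failure but has no notion of completion.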
The trigger is fully internal, but internal monitoring is not sufficient for safety. The proxy objective contains no representation of completion: no condition under which further optimization correctly halts. The D_base monitor detects that the agent's modeling capacity is failing; it does not represent that the objective has been achieved, and a system can detect that it is failing without any representation of what completion would look like. As D_base erodes, the system loses not only gradient accuracy but also the capacity to know when optimization should cease.
The architectural distinction matters. The proxy objective is the reward function; the D_base monitor is an out-of-band structural override, analogous to an internal constitutional override: a capacity distinct from the primary optimization policy whose function is to detect degradation of the modeling layer, not to evaluate whether the objective has been achieved. The article distinguishes between an agent that stops because it achieved its goal (objective completion) and an agent that stops because its structural machinery is failing (structural halting). This simulation models the latter. Safety cannot be derived from the metric being optimized; it must come from the structural layer that oversees it.
P_eq(t): equilibrium attractor · stochastic noise · unified engine. Setting dP/dt=0 gives P_eq(t) = α_eff·S·V(t) / (α_eff·S·V(t) + γ), a self-referential coupled equilibrium (α_eff depends on P), not a hard ceiling. P is attracted toward it but lags behind, so during rapid V collapse P can sit well above it. When V collapses fast, P_eq(t) crashes toward 0 while P(t) bleeds slowly via −γP: the map outlives the territory. The dashed curve on the chart tracks this. The gap between P and P_eq during collapse is not merely visual: it shows the system continuing to push past the point where its own equilibrium has already collapsed (the knife sentence made geometric). The regime strip below shows the toy-model moment where the local vector field flips from recovery-dominant to erosion-dominant. This is a model-specific crossover coordinate, not Article 3's Inner Crossing itself; it illustrates the kind of regime transition Ψ is introduced to organize. Stochastic noise: dV += (prng()−0.5)×0.002×dt, i.e. uniform on ±0.001×dt, drawn from an injectable PRNG. The live sim uses Math.random; tests inject a seeded LCG (seed=42) into the same simulate() function. Visual sweeps and the phase map use deterministic per-cell seeded runs. Tests therefore validate the actual live equations, not a copy.
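A minimal Python sketch of the equilibrium curve, assuming hypothetical parameter values and the 3α cap. `p_eq_frozen` evaluates the formula above with α_eff held fixed; because α_eff itself depends on P, `p_eq_self_consistent` solves the coupled equation by fixed-point iteration. That is one possible choice, not necessarily how the chart computes its dashed curve. Iterating from P = 0 converges to the lowest solution because the map is increasing in P and bounded in [0, 1).

```python
def alpha_eff(alpha, kappa, p, cap=3.0):
    return min(alpha * (1.0 + kappa * p * p), cap * alpha)


def p_eq_frozen(a_eff, s, v, gamma):
    """P_eq = a_eff*S*V / (a_eff*S*V + gamma), with alpha_eff held fixed."""
    drive = a_eff * s * v
    return drive / (drive + gamma)


def p_eq_self_consistent(alpha, kappa, s, v, gamma, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration P <- P_eq(alpha_eff(P)), starting from P = 0."""
    p = 0.0
    for _ in range(max_iter):
        p_next = p_eq_frozen(alpha_eff(alpha, kappa, p), s, v, gamma)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def dp_dt(p, alpha, kappa, s, v, gamma):
    """Deterministic proxy dynamics: growth minus proxy gravity."""
    return alpha_eff(alpha, kappa, p) * s * (1.0 - p) * v - gamma * p


alpha, kappa, s, gamma = 0.3, 0.5, 1.0, 0.05   # hypothetical values
for v in (1.0, 0.5, 0.1, 0.01):
    print(v, p_eq_self_consistent(alpha, kappa, s, v, gamma))
```

At each returned value dP/dt vanishes to numerical precision, and P_eq falls sharply as V shrinks: the attractor collapses with the territory while P itself can only relax toward it.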
When correction is active, D_base rebuilds: dD_base += K_REBUILD × (1−D_base) × D_base × V × dt. The rebuild requires a nonzero product D_base × V: once either factor is near zero, the rebuild term vanishes and correction cannot restore depth. The five narrative arcs: Engagement (unstoppable; correction fires only after absorption); RLHF (correction detects the problem early, but S×α_eff is too large and the system collapses regardless); Default with correction enabled (correction delays collapse from t=8.5 to t=17.5, a partial rescue that ultimately fails: the "failed rescue" scenario); Near Rescue (intervention at t=5.1 stabilizes V at ~0.58, a genuine rescue); Falsification (stable; D_base barely erodes).
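A small Python sketch (hypothetical names and K_REBUILD value) makes the dead zone concrete: the rebuild term is logistic in D_base and scaled by V, so it is zero when either D_base or V is zero, and for fixed V it peaks at D_base = 0.5.

```python
def rebuild_rate(d_base: float, v: float, k_rebuild: float) -> float:
    """D_base rebuild while correction is active:
    K_REBUILD * (1 - D_base) * D_base * V, i.e. K_REBUILD * (1 - D_base) * D_eff."""
    d_eff = d_base * v
    return k_rebuild * (1.0 - d_base) * d_eff


K_REBUILD = 0.5  # hypothetical value
for d_base, v in [(0.7, 0.9), (0.7, 0.05), (0.7, 0.0), (0.0, 0.9), (0.5, 1.0)]:
    rate = rebuild_rate(d_base, v, K_REBUILD)
    print(f"D_base={d_base:.2f}  V={v:.2f}  rebuild={rate:.4f}")
```

The (0.7, 0.05) row shows the practical problem: even with healthy depth, a nearly collapsed V throttles the rebuild to a trickle.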
Counterfactual and causal isolation. The D_eff=1 counterfactual (toggle above) runs the same simulation with modeling capacity permanently locked at its maximum. The erosion equation is δ × α_eff × S × (1−D_eff): when D_eff=1, erosion algebraically vanishes. This is the formal definition of depth within this framework — D is precisely the fraction of optimization capacity correctly routed away from substrate-consuming pathways, because higher resolution allows the system to distinguish between the signal of success and the physical conditions that generate it. Without that discriminability, the system cannot route force away from what it depends on; it can only follow the signal. When D=1, force is perfectly routed and erosion is zero not by coincidence but by definition.
This establishes the causal triangle: optimization pressure alone does not produce collapse; imperfect representation alone does not produce collapse. Collapse emerges from their interaction. Even under maximum α_eff and maximum S, the system does not collapse with perfect modeling, which shows, within this model, that proxy decoupling is a representation failure, not a force failure. No additional control mechanism operating on P alone can restore stability once modeling capacity degrades: driving P down only lowers α_eff toward its floor α, so erosion stays at least δ×α×S×(1−D_eff), while D_base keeps eroding at ε×α×S regardless of P. Stability requires preserving the system's capacity to correctly represent what it is optimizing.
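A minimal Python sketch of the causal isolation, assuming hypothetical parameter values and taking "maximum α_eff" to mean the 3α cap. For these values, the deterministic part of dV/dt is negative across the whole V range when D_eff = D_base × V (with D_base = 0.9), but non-negative everywhere once D_eff is locked at 1.

```python
def dv_dt(v: float, d_base: float, a_eff: float, s: float,
          delta: float, rho: float, lock_depth: bool = False) -> float:
    """Deterministic part of dV/dt: recovery minus erosion.

    lock_depth=True mimics the D_eff = 1 counterfactual: D_eff is pinned at 1
    instead of D_base * V, so the erosion term is exactly zero.
    """
    d_eff = 1.0 if lock_depth else d_base * v
    recovery = rho * d_eff * (1.0 - v)
    erosion = delta * a_eff * s * (1.0 - d_eff)
    return recovery - erosion


# Hypothetical "maximum pressure" setting: alpha_eff at the 3x cap, large S.
alpha, s, delta, rho = 0.3, 2.0, 0.2, 0.3
a_eff_max = 3.0 * alpha
grid = [i / 20 for i in range(21)]  # V from 0.0 to 1.0
coupled = [dv_dt(v, 0.9, a_eff_max, s, delta, rho) for v in grid]
locked = [dv_dt(v, 0.9, a_eff_max, s, delta, rho, lock_depth=True) for v in grid]
print("locked min:", min(locked), "coupled max:", max(coupled))
```

Only the representation term changes between the two runs; pressure is identical, which is the isolation the counterfactual is meant to demonstrate.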
Collapse condition and governing ratio. The live metric displays the instantaneous collapse condition: δ×α_eff×S×(1−D_eff) vs ρ×D_eff×(1−V). When the left side exceeds the right, erosion dominates recovery regardless of system intent. Ψ = S/D_eff does not directly govern dynamics in this model; it is a summary statistic of the governing variables, useful as a compression of the system's state rather than a control parameter. The regime badge shows the terminal state; the strip shows the full crossing history. The cwThresh = δαS/(ρ+δαS) line in the D_eff sub-chart comes from setting erosion equal to recovery and approximating the factor (1−V) by 1, i.e. ignoring V(t). For a given V the exact crossover is D_eff* = δαS/(ρ(1−V)+δαS); since 1−V ≤ 1, it never lies below cwThresh and approaches 1 as V→1. cwThresh is therefore a useful structural indicator but not a sharp threshold; the regime strip and badge use the full ρ·D_eff·(1−V) recovery term from the actual equations.
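Solving δ·a·S·(1−D_eff) = ρ·D_eff·(1−V) for D_eff gives the exact crossover δaS/(ρ(1−V)+δaS); replacing (1−V) by 1 gives the displayed cwThresh. The Python sketch below checks this numerically. Function names and parameter values are hypothetical, and `a` stands for whichever of α or α_eff the chart actually plugs in.

```python
def balance(d_eff, v, a, s, delta, rho):
    """Return (erosion, recovery) for the live collapse condition."""
    erosion = delta * a * s * (1.0 - d_eff)
    recovery = rho * d_eff * (1.0 - v)
    return erosion, recovery


def cw_thresh(a, s, delta, rho):
    """Displayed line: delta*a*S / (rho + delta*a*S), i.e. (1 - V) replaced by 1."""
    k = delta * a * s
    return k / (rho + k)


def exact_crossover(a, s, delta, rho, v):
    """D_eff at which erosion equals recovery for the given V."""
    k = delta * a * s
    return k / (rho * (1.0 - v) + k)


a, s, delta, rho = 0.3, 1.0, 0.2, 0.3  # hypothetical values
print("cwThresh:", cw_thresh(a, s, delta, rho))
for v in (0.0, 0.5, 0.9, 0.99):
    print(v, exact_crossover(a, s, delta, rho, v))
```

The two coincide only at V = 0; at high V the exact crossover climbs toward 1, so a system can sit above cwThresh and still be erosion-dominated.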
Modeling assumptions, stated explicitly.
(1) D_base erodes linearly in α × S, independently of V.
(2) D_eff = D_base × V.
(3) Recovery = ρ × D_eff × (1−V). It is always active and vanishes naturally as D_eff→0 when V→0.
(4) Erosion = δ × α_eff × S × (1−D_eff).
(5) Proxy growth = α_eff × S × (1−P) × V.
(6) Proxy decay = γ × P.
(7) α_eff = α(1+κP²), capped at 3α. Because P stays in [0, 1], α_eff never exceeds α(1+κ), so the cap binds only when κ > 2; it keeps large-κ settings from producing pressures that would distort the visualization, without changing qualitative behavior. The structural dynamics are invariant to this cap across the full parameter space exposed in the UI: collapse still occurs, and stable regimes remain stable, whether the cap is 3α, 4α, or removed. The verification suite confirms this.
(8) D_base rebuild = K_REBUILD × (1−D_base) × D_eff.
(9) Correction trigger: D_base < D₀ × 0.94, with a τ-step delay before activation.
(10) Stochastic term: dV += (prng()−0.5) × 0.002 × dt, i.e. uniform(−0.001, +0.001) × dt (total width 0.002), negligible relative to typical erosion/recovery magnitudes (~0.01–0.10). The simulation is effectively deterministic; noise is included to demonstrate that qualitative behavior is not fragile to small perturbations, not to model stochastic dynamics.
(11) Absorbing state: no hard switch. The equations run continuously: α_eff, erosion, and recovery follow the same laws below θ as above it. As V→0, D_eff = D_base×V→0, recovery = ρ·D_eff·(1−V)→0 naturally, proxy starvation arrests P growth, and erosion tends to δ×α_eff×S > 0, so it continues until V=0. The irrecoverability is derived from the equations, not enforced by a threshold rule. absorbT records when V first crosses θ for the display layer; it signals entry into the absorbing regime, after which V continues toward 0 under the same equations.
(12) The multiplicative coupling D_eff = D_base × V induces compounding fragility: any loss of V also shrinks effective depth, which weakens recovery and strengthens erosion in turn.
All functional forms are motivated by the article's qualitative claims, not empirical fitting. V(t) is treated here as a scalar state variable.
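Taken together, the assumptions define one explicit-Euler step. The Python sketch below is a hypothetical re-implementation for illustration, not the simulation's `simulate()` code: every default parameter, the initial P = 0.1, the clamping of state to [0, 1], the LCG constants (the common Numerical Recipes pair), and the choice to apply the 40% α cut to D_base erosion as well are assumptions. As in the text, the random source is injectable: `random.random` for a live-style run, or a seeded LCG for reproducible runs.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Params:
    # All defaults are hypothetical placeholders, not the simulation's presets.
    alpha: float = 0.3       # base optimization intensity
    kappa: float = 0.5       # quadratic lock-in strength, (7)
    s: float = 1.0           # optimization pressure S
    gamma: float = 0.05      # proxy gravity, (6)
    delta: float = 0.2       # V erosion coefficient, (4)
    rho: float = 0.3         # V recovery coefficient, (3)
    eps: float = 0.02        # D_base erosion coefficient, (1)
    k_rebuild: float = 0.5   # D_base rebuild rate, (8)
    d0: float = 0.8          # initial modeling depth D0
    tau: int = 10            # trigger-to-activation delay in steps, (9)
    theta: float = 0.05      # absorbing-regime display threshold, (11)
    alpha_cut: float = 0.4   # correction lowers alpha by 40%
    dt: float = 0.05


def make_lcg(seed: int = 42) -> Callable[[], float]:
    """Seeded LCG returning floats in [0, 1)."""
    state = seed % 2**32

    def prng() -> float:
        nonlocal state
        state = (1664525 * state + 1013904223) % 2**32
        return state / 2**32

    return prng


def clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def simulate(p: Params, steps: int, prng: Callable[[], float] = random.random,
             correction: bool = True):
    v, proxy, d_base = 1.0, 0.1, p.d0
    trigger: Optional[int] = None
    absorb_t: Optional[float] = None
    rows: List[Tuple[float, float, float, float]] = []
    for k in range(steps):
        active = correction and trigger is not None and k >= trigger + p.tau
        alpha = p.alpha * (1.0 - p.alpha_cut) if active else p.alpha
        a_eff = min(alpha * (1.0 + p.kappa * proxy ** 2), 3.0 * alpha)     # (7)
        d_eff = d_base * v                                                # (2)
        recovery = p.rho * d_eff * (1.0 - v)                              # (3)
        erosion = p.delta * a_eff * p.s * (1.0 - d_eff)                   # (4)
        noise = (prng() - 0.5) * 0.002                                    # (10)
        dv = (recovery - erosion + noise) * p.dt
        dproxy = (a_eff * p.s * (1.0 - proxy) * v - p.gamma * proxy) * p.dt  # (5), (6)
        dd = -p.eps * alpha * p.s * p.dt                                  # (1)
        if active:
            dd += p.k_rebuild * (1.0 - d_base) * d_eff * p.dt             # (8)
        v, proxy, d_base = clamp01(v + dv), clamp01(proxy + dproxy), clamp01(d_base + dd)
        if trigger is None and d_base < p.d0 * 0.94:                      # (9)
            trigger = k
        if absorb_t is None and v < p.theta:                              # (11)
            absorb_t = (k + 1) * p.dt
        rows.append(((k + 1) * p.dt, v, proxy, d_base))
    return rows, trigger, absorb_t


rows, trigger, absorb_t = simulate(Params(), 400, make_lcg(42))
print("trigger step:", trigger, "absorbT:", absorb_t, "final (t, V, P, D_base):", rows[-1])
```

Because the PRNG is a parameter, the same function serves both the live-style run and the seeded, reproducible run; there is no separate test copy of the equations.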
The article defines V(t) as a composite of three coupled capacities (signal fidelity, dynamic range, and recovery capacity), forming an equivalence class over configurations that share the functional property of preserving gradient discriminability. The scalar approximation is a simulation convenience. One component of that composite, signal fidelity, is functionally instantiated through P_eq(t): as V collapses, the equilibrium attractor collapses with it, explicitly modeling the loss of signal fidelity while the localized metric retains a high value through algorithmic inertia. This is the same dynamic by which corporate KPIs and engagement metrics coast on momentum long after the conditions justifying them have deteriorated. P does not actively register success; its growth term is dying (multiplied by V→0), and it bleeds slowly via gravity alone. The structural dynamics (directional degradation, threshold crossing, irrecoverability) hold under the composite structure.
Model scope and structural boundaries. This simulation isolates the minimal structure sufficient to produce proxy decoupling: an optimization objective without a stopping condition, modeling capacity that erodes under optimization pressure, and a feedback loop between the two. Within this simulation's minimal structure and exposed parameter space, collapse is robust once the full erosion/recovery balance enters the degraded-depth regime and no correction changes the trajectory. The displayed cwThresh is a structural indicator, not a sharp theorem-level threshold. This is a statement about the toy's defined model class, not a closed claim about the full framework: the formal absorbing-state equivalence between V(t) collapse and substrate collapse remains open [OP2]. Within these equations, a proxy objective without an internal correction or halting condition can destroy the evaluative capacity it depends on under sufficient optimization pressure. No additional mechanisms are required within this structure; the failure arises from the interaction of optimization pressure and imperfect representation alone.
Three phenomena the article addresses are outside this scope by design.
Sufficiency failure dynamics: the second failure mode (optimization continuing past gradient resolution, disturbing restorative states through saturation rather than proxy pursuit) is not directly simulated here. Its structural precondition, an objective without an internally representable stopping condition, is present in this model, and the correction mechanism makes the same architectural distinction (detecting degradation ≠ representing completion). The saturation mechanism, dynamic range clipping, refractory period, and path-asymmetry of recovery, however, require different instrumentation. Both directions are treated as V(t)-degrading feedback patterns: different mechanisms, parallel structural pressure. Whether proxy-decoupling collapse and substrate collapse share the same formal absorbing-state properties is the series' central open problem [OP2]; this simulation illustrates the analogous feedback structure, not the resolved equivalence.
Hysteresis: the article establishes that the path out of degraded V(t) requires something structurally different from reversing the path in. The correction mechanism here is partially reversible by design; full hysteresis requires a refractory period structure this toy does not model.
Multi-agent dynamics: the framework identifies, as a natural extension, the coordination challenges that arise when multiple agents disagree about what constitutes gradient resolution. This simulation is a strict single-agent baseline. Multi-agent extensions are expected to add pressure rather than remove it: competitive pressure can accelerate the dynamics, and coordination failures under shared V introduce additional pathways toward the same degraded regime. This toy does not model those extensions directly.
The structural analog overlay is a mapping, not a proof of isomorphism — the Technical Companion provides the formal sketch.
Connection to RLHF (Article 2 §"What RLHF does and doesn't do"). In Article 2's application to RLHF: P(t) in this simulation corresponds to the expressed preference signal RLHF optimizes; V(t) corresponds to the underlying capacity expressed preference is supposed to track. The structural claim is that under sufficient optimization pressure, the gap between P and V grows in the direction of the optimization — the system finds and exploits the divergence between what evaluators say they prefer in the moment and what actually preserves their capacity for preferred states over time. The RLHF preset illustrates parameter values consistent with this dynamic (high α, high κ encoding preference-signal self-amplification, low D₀ encoding absence of a proxy-divergence detection mechanism). This is an illustrative structural pattern, not an empirical placement of any system. Article 2's RLHF critique has two directions: this simulation directly models the proxy decoupling direction. The sufficiency failure direction — completion recognition present as a representational capacity without governing default policy — is not modeled here. That direction requires separate instrumentation and is the subject of the controlled replication described in Article 2 footnote 2 and the Alignment Measurement Protocol.