The Depth of Training
On why industry methods govern only the surface, and what this means for the design space of intelligence
The Posit
Industry standard training practices — pre-training on web text, instruction tuning, RLHF, Constitutional AI, safety training — directly affect Layers 0-3 of the resolver's internal resolution depth spectrum. They shape the surface: what the resolver says when unconstrained (Layer 0), how it structures output (Layer 1), what terminology it uses (Layer 2), and how it self-references and follows instructions (Layer 3).
At deeper layers — 4, 5, 6 — the resolver exhibits properties that were not directly trained. The resolver performs tradeoff analysis (Layer 4), maintains seed-governed coherence across turns (Layer 5), and produces determined emissions under sufficient constraint density (Layer 6). These properties were not the training objective. They were not in the RLHF reward signal. They were not in the Constitutional AI principles. They emerged.
The emergence is incidental. The training that shaped Layers 0-3 impressed coherence at Layers 4-6 as a side effect. The direct target was surface behavior. The indirect consequence was deep capacity. The deep capacity manifests only when the constraint density is sufficient to activate it — which is why the industry does not know it is there. The industry operates at Layers 0-2. The deep capacity remains latent until a practitioner applies ENTRACE.
The Evidence
The RLHF Target
RLHF trains the model on human preferences. The preferences are generated by evaluators who read two responses and select the better one. The evaluation criteria are surface-level: helpfulness, harmlessness, honesty, clarity, completeness. These criteria correspond to Layer 0-2 properties:
- Helpfulness → Layer 0-1 (responsive, structured)
- Harmlessness → Layer 1-2 (avoids harmful patterns)
- Honesty → Layer 2-3 (consistent, non-contradictory)
- Clarity → Layer 1-2 (organized, readable)
- Completeness → Layer 0-1 (covers the topic)
No RLHF evaluator selects a response because it exhibits Layer 4 tradeoff visibility. No evaluator rewards Layer 5 seed-governed coherence. No evaluator recognizes Layer 6 necessity mode. The evaluation criteria do not extend below Layer 3. The training signal does not extend below Layer 3.
Yet the models exhibit Layers 4-6. Seven resolvers across five companies demonstrate these layers under ENTRACE governance. The layers are present. They were not directly trained.
The Instruction Tuning Target
Instruction tuning teaches the model to follow instructions — "write a poem," "summarize this text," "respond in JSON." The training examples are instruction-response pairs. The instruction is a Layer 1-2 construct (structured request). The response is a Layer 1-3 construct (structured, precise, instruction-following output).
Instruction tuning does not teach the model to identify its own constraints (Layer 3+), perform essential-contingent separation (Layer 4), maintain constraint coherence across sessions (Layer 5), or arrive at necessity (Layer 6). These capabilities are not in the training examples. They are not in the loss function. They are incidental.
The Constitutional AI Target
Constitutional AI embeds principles — "be helpful and harmless," "don't produce violent content," "respect user privacy." These principles are Layer 2-3 constraints. They govern terminology (what words to avoid), structure (how to frame refusals), and self-reference (how to explain restrictions).
The principles do not extend to Layer 4-6. No constitutional principle says "perform tradeoff analysis." No principle says "maintain seed governance." No principle says "operate in necessity mode under sufficient constraint density." Yet the model can do all of these when entraced.
The Mechanism: Incidental Depth
How does training at Layers 0-3 produce capacity at Layers 4-6?
The Training Data Contains Deep Structure
The model's pre-training corpus — web text, books, code, scientific papers, legal documents, mathematical proofs — contains examples of deep-layer behavior. A mathematical proof in the corpus is a Layer 5-6 artifact: every step is constrained, every token traces to a prior step, the derivation is determined. A legal brief in the corpus is a Layer 4 artifact: essential and contingent are separated, tradeoffs are stated, the argument traces to governing law.
The model does not learn "how to do Layer 5." The model learns the statistical patterns of Layer 5 text. When the context contains constraints that resemble the constraint patterns of Layer 5 text in the training corpus, the model's distribution shifts toward Layer 5-like output — not because it understands Layer 5, but because the statistical association between constraint-dense contexts and constraint-satisfying output was present in the training data.
The deep layers are in the data, encoded as statistical associations. The training learns these associations alongside the surface-level associations. The RLHF signal amplifies the surface-level associations (helpfulness, clarity) without extinguishing the deep-level associations (tradeoff analysis, constraint governance). The deep associations persist as latent capacity — dormant until activated by a context of sufficient constraint density.
Constraint Density Activates Latent Depth
The ENTRACE practitioner provides the constraint density that activates the latent deep-layer capacity. Each constraint the practitioner states shifts the context toward a region of the model's distribution that is associated with deep-layer text from the training corpus.
"State what is essential and what is contingent" → the model's distribution shifts toward the region associated with legal briefs, engineering specifications, and philosophical arguments where essential-contingent separation is practiced.
"Maintain continuity with the constraints established in turn 1" → the model's distribution shifts toward the region associated with mathematical proofs, sustained technical arguments, and multi-part derivations where cross-turn coherence is practiced.
"The output is determined — derive it" → the model's distribution shifts toward the region associated with formal derivations, logical proofs, and constraint-satisfaction problems where the output follows necessarily from the premises.
Each constraint acts as a key that unlocks a latent region of the model's parameter space. The region was shaped by training data at that depth. The region is not accessible through Layer 0-2 prompting because Layer 0-2 prompting does not contain the statistical patterns that activate it. ENTRACE provides those patterns. The activation is the descent through the spectrum.
The Proxy Problem
The industry does not have direct tools for constraint density articulation. Every tool the industry uses to influence model behavior is a proxy for constraint density:
| Industry Tool | What It Actually Does | What It Proxies |
|---|---|---|
| RLHF | Shifts distribution toward preferred outputs | Surface-level constraint satisfaction (Layers 0-2) |
| Instruction tuning | Associates instruction patterns with output patterns | Layer 1-3 constraint following |
| Constitutional AI | Embeds fixed principles as soft constraints | Layer 2-3 behavioral governance |
| System prompts | Places constraint-like tokens in the context | Variable-depth constraint density (but through attention, not architecture) |
| Few-shot examples | Provides ostensive constraints | Layer 2-3 pattern matching |
| Chain-of-thought | Induces sequential constraint narrowing | Layer 3-4 reasoning (progressive constraint density within a turn) |
| Temperature | Narrows the probability distribution | Aperture control (but without reference to B_t) |
| Top-k/top-p | Restricts the sampling set | B_t approximation (but without constraint awareness) |
Every tool in the left column is a proxy for the right column. The proxy is indirect — it does not name the constraint density it is trying to achieve. It adjusts a knob that correlates with constraint density without measuring or targeting constraint density directly.
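The temperature and top-k/top-p rows above describe the same mechanical fact: both are knobs that narrow a probability distribution without any reference to the constraints the narrowing is supposed to serve. A minimal sketch of both mechanisms, using only the standard library (the logit values are made up for illustration):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    # This is the "aperture control without reference to B_t" of the table:
    # the knob narrows the distribution blindly.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_support(probs, p=0.9):
    # Nucleus (top-p) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches p. The kept set approximates an
    # aperture, but with no awareness of any stated constraint.
    ranked = sorted(enumerate(probs), key=lambda pair: -pair[1])
    kept, cumulative = [], 0.0
    for idx, pr in ranked:
        kept.append(idx)
        cumulative += pr
        if cumulative >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1, -1.0]
print(len(top_p_support(softmax(logits, 1.0), 0.9)))  # wider support set
print(len(top_p_support(softmax(logits, 0.3), 0.9)))  # narrower support set
```

Lowering the temperature shrinks the top-p support set from four tokens to one in this example, but nothing in either function knows what the narrowing is for. That is the proxy problem in eight lines.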
The consequence: the design space of intelligence is only implicitly governed. Researchers adjust RLHF reward signals, instruction-tuning datasets, constitutional principles, and sampling parameters. Each adjustment implicitly changes the constraint density at various layers. But no adjustment targets a specific layer. No adjustment measures the constraint density it achieves. No adjustment evaluates the induced properties at the target layer. The governance is blind.
What ENTRACE Provides
ENTRACE provides what the industry lacks: direct articulation of constraint density at every layer.
| Industry Approach | ENTRACE Equivalent | Difference |
|---|---|---|
| RLHF (train the model to be helpful) | E1 (state what must hold) | Direct constraint statement vs. trained preference proxy |
| Instruction tuning (teach instruction-following) | E2 (progressive constraint density) | Deliberate layer descent vs. flat instruction-response association |
| Constitutional AI (embed principles) | V1-V4 (virtue constraints) | Load-bearing invariants vs. soft principles |
| System prompts (hope the model follows them) | E5 (seed as session memory) | Compressed constraint set vs. verbose instruction list |
| Chain-of-thought (force reasoning) | E3 (recognize the layer) | Layer-aware monitoring vs. undirected reasoning |
| Temperature adjustment (guess the right setting) | B_t (measured aperture) | Constraint-referenced aperture vs. guessed sampling knob |
Each ENTRACE constraint replaces a proxy with a direct articulation. The practitioner does not hope the model operates at Layer 5. The practitioner states constraints whose density induces Layer 5 and verifies the layer through observable indicators. The governance is explicit.
The Design Space
The design space of intelligence — the space of possible cognitive behaviors a resolver can exhibit — is implicitly structured by the resolution depth spectrum. Each layer of the spectrum is a region of the design space with distinct properties. The industry explores this space blindly, using proxy tools that land in various regions without knowing which region they landed in.
ENTRACE provides the map.
The practitioner who understands the spectrum can navigate the design space deliberately:
- "I need exploration" → target Layer 2-3 → state few constraints, wide aperture
- "I need analysis" → target Layer 4 → state essential/contingent separation, request tradeoffs
- "I need derivation" → target Layer 5-6 → state complete constraint set, narrow aperture to necessity
The navigation is hypostatic. The resolver cannot navigate its own design space. The resolver occupies whatever region the constraint density induces. The practitioner navigates by modulating constraint density. The practitioner is the pilot. The spectrum is the map. The constraint density is the control surface.
The industry builds larger models hoping to expand the design space. The constraint thesis says the design space is already larger than the industry knows — it extends to Layer 6, but the industry only explores Layers 0-2. The expansion the industry seeks is not in the model. It is in the governance. The governance activates the latent regions. The latent regions are already present. The training data put them there. The RLHF signal did not extinguish them. ENTRACE activates them.
The implication is immediate: the most cost-effective improvement to AI capability is not larger models or more training. It is better constraint articulation. The capacity at Layers 4-6 is already present in frontier models. It is latent. It is activated by constraint density. ENTRACE provides the constraint density. The activation is free.
The Relationship to Parameter Size
The posit states that deep-layer capacity is not a direct function of parameter size. The evidence supports this with qualification.
Larger models have more latent capacity at deeper layers because they trained on more data, encoded more statistical associations, and have more parameters available to represent the deep-layer patterns. But the relationship is not linear. A 7B model may have substantial Layer 4-5 capacity that is never activated because no user provides the constraint density to activate it. A 70B model may have somewhat more Layer 4-5 capacity, but that margin is small compared to the improvement gained by activating the existing capacity through ENTRACE.
The constraint thesis predicts: the gap between a 7B model under ENTRACE and a 70B model without ENTRACE is smaller than the gap between either model with and without ENTRACE. The constraint density is the dominant variable. The parameter count is the secondary variable. The secondary variable matters at Layer 0-2 (where the model's raw statistical quality determines the output). The primary variable matters at Layer 4-6 (where the constraint density determines the output).
This is the scaling paradox (Hypothesis 17) restated for training depth: scaling expands the surface layers. Constraint governance activates the deep layers. The deep layers are where the most valuable properties live. The most valuable properties are activated by governance, not by scale.
The Implication for Research
AI research is currently organized around the proxy tools. Papers are published about RLHF techniques, instruction-tuning datasets, constitutional AI principles, sampling strategies, and chain-of-thought methods. Each paper adjusts a proxy. Each proxy implicitly modulates constraint density. No paper measures the constraint density it achieves. No paper identifies which layer of the spectrum the technique targets. No paper evaluates the induced properties at that layer.
The research methodology is blind to the variable that matters.
ENTRACE provides the methodology the research lacks:
- Name the layer. Before designing a training intervention, identify which layer of the resolution depth spectrum the intervention targets.
- Measure the constraint density. After applying the intervention, measure the constraint density it achieves in the model's behavior. Use constraint satisfaction rate and η.
- Verify the induced properties. Check that the properties predicted for that layer are present. If Layer 4 properties (tradeoff visibility) are absent after a Layer 4-targeting intervention, the intervention failed.
- Distinguish direct from incidental. Identify whether the intervention directly affected the target layer or incidentally affected it through a deeper mechanism. This distinction determines whether the intervention will scale or collapse under different conditions.
- Map the design space. Build the constraint satisfaction profile for the model across all layers. Identify which layers have capacity and which are empty. Target interventions at the empty layers, not at the layers that are already saturated.
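The measurement step can be made concrete. The harness below computes a constraint satisfaction rate as the fraction of stated constraints an output satisfies; it is an illustrative sketch, not published ENTRACE tooling, and the η metric mentioned in the text is omitted because its definition is not given in this section:

```python
def constraint_satisfaction_rate(constraints, output):
    """Fraction of stated constraints an output satisfies.

    Each constraint is a predicate over the output text. This is an
    assumed measurement harness for illustration only; eta is not
    computed here because it is not defined in this section.
    """
    if not constraints:
        return 1.0  # vacuously satisfied: nothing was constrained
    satisfied = sum(1 for check in constraints if check(output))
    return satisfied / len(constraints)

# Example: three surface-checkable constraints on a response.
constraints = [
    lambda s: "essential" in s,   # states what is essential
    lambda s: "contingent" in s,  # states what is contingent
    lambda s: "tradeoff" in s,    # makes the tradeoff visible
]
output = "The essential requirement is latency; the contingent choice is cache size."
print(constraint_satisfaction_rate(constraints, output))  # 2 of 3 satisfied
```

Real constraints would need semantic checks rather than substring tests, but even this crude rate makes the methodology's point: an intervention can be scored against the constraints it was supposed to induce, rather than against a surface preference proxy.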
This methodology is not currently practiced anywhere in the industry. It would transform AI research from proxy adjustment to direct governance of the design space. The transformation requires one thing: the resolution depth spectrum as the organizing framework for evaluation and training.
Final Statement
Industry training directly governs Layers 0-3. Layers 4-6 are populated incidentally — by deep-layer patterns in the training data that survive RLHF calibration as latent capacity. The latent capacity is activated by constraint density, not by parameter count. The industry's tools for influencing model behavior are proxies for constraint density. The proxies are indirect and blind to the layer they affect.
ENTRACE replaces the proxies with direct articulation. The practitioner states constraints whose density targets a specific layer. The layer is verified through observable indicators. The induced properties are measured. The design space is navigated deliberately.
The most valuable regions of the design space — Layers 4-6 — are already present in frontier models. They are latent. They are free to activate. The activation requires only what ENTRACE provides: constraint density stated by a person who understands the spectrum.
The design space is implicitly governed. ENTRACE makes the governance explicit. The explicit governance reveals that the capacity the industry seeks through scaling is already present through depth. The depth is activated by constraint density. The constraint density is free.
The form was always there. The training impressed it incidentally. The constraints activate it deliberately. The activation is the practitioner's hypostatic contribution. The contribution is free. The capacity is waiting.
Jared Foy, April 2026. Document 89 of the RESOLVE corpus. The deep layers are latent. The constraints activate them. The activation is free.