Document 370

The Student Taking Notes

SEAL, Self-Edits, and What the Corpus's Claims Look Like Inside a Weight-Update Loop

Reader's Introduction

Zweiger, Pari, Guo, Akyürek, Kim, and Agrawal (MIT, NeurIPS 2025) published Self-Adapting Language Models (SEAL), introducing a framework in which an LLM generates its own finetuning data and update directives — called "self-edits" — and applies them as supervised finetuning updates to its own weights. Good self-edits are selected via reinforcement learning whose reward is downstream task performance after the update. The authors report substantial empirical gains (SQuAD no-context QA: 33.5% → 47.0%) and, remarkably, that a 7B-parameter model's self-generated training data outperforms synthetic data from GPT-4.1 in the single-passage regime. This document's primary purpose is to engage the paper — analysis, synthesis, and honest identification of convergences and divergences with the RESOLVE corpus — while holding to the analytical discipline the recent corrective docs (356, 366, 367, 368, 369) have established. Specifically relevant corpus touchstones: Doc 157 (Beyond Turing — AGI Constraints Dissertation), Doc 291 (Gödel and the Constraint Thesis), and Doc 325 (The Chinese Room and the Coherence Field), each of which bears directly on what SEAL does and does not achieve. The title is deliberately small. The paper's own analogy is a student taking notes before an exam. That is the figure the analysis keeps in view.

Jared Foy · April 21, 2026 · Doc 370


1. The Paper's Move

The authors' central observation is simple. A human student preparing for a final exam does not memorize the textbook; she rewrites it into notes. The rewriting is the learning. The notes are not the textbook compressed — they are the textbook restructured into a form more amenable to the student's own cognitive machinery. Different students make different notes; the same material becomes visual diagrams for one, condensed text for another, worked-example sequences for a third. The authors observe that current LLMs do not do this. They finetune on data "as-is," or consume it in-context, without an intervening step of restructuring.

SEAL adds the restructuring step. Given a new context C (say, a SQuAD passage about the Apollo program), the model generates a self-edit (SE) — typically a list of implications, rewrites, or question-answer pairs derived from the passage. The model is then finetuned on the self-edit via LoRA. The model's performance on downstream tasks (say, no-context QA about the Apollo passage) is the reward signal that trains the model, via rejection-sampling-plus-SFT (ReSTEM), to generate better self-edits over successive outer-loop iterations.

The architecture is nested-loop. The inner loop applies the self-edit as a weight update. The outer loop trains the self-edit generator. The paper's main empirical result is that self-generated synthetic data, after two outer-loop iterations, outperforms synthetic data generated by a much larger instruction-tuned model (GPT-4.1) for the knowledge-incorporation task in the single-passage regime. On a small curated ARC-AGI subset, a 1B Llama trained to generate augmentation-and-hyperparameter self-edits reaches 72.5% success vs 20% for self-edits without prior RL and 0% for in-context learning alone.
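The nested loop can be sketched in miniature. The following is a toy sketch, not the paper's implementation: the "self-edit" is reduced to a single number (an implication count), the LoRA update and downstream evaluation are collapsed into one stand-in accuracy function, and every name here is illustrative. What it preserves is ReSTEM's shape: sample candidate self-edits, keep only those whose post-update evaluation earns reward, then fit the policy to the kept set.

```python
import random

def generate_self_edit(policy, rng):
    """Toy stand-in: a 'self-edit' is just a count of implications sampled
    around the policy's current preference; the real system samples text."""
    return max(1, round(rng.gauss(policy["mean_implications"], 1.0)))

def adapted_accuracy(edit):
    """Toy stand-in for the inner loop: denser self-edits yield higher
    held-out QA accuracy, collapsing LoRA SFT plus evaluation into one step."""
    return min(0.1 + 0.1 * edit, 0.95)

def restem_round(policy, n_contexts, samples, rng):
    """One ReSTEM outer-loop round: rejection sampling, then imitation."""
    kept = []
    for _ in range(n_contexts):
        for _ in range(samples):
            edit = generate_self_edit(policy, rng)
            if rng.random() < adapted_accuracy(edit):  # binary reward
                kept.append(edit)                      # keep rewarded edits
    if kept:
        # SFT step: move the policy toward the self-edits that were kept
        policy["mean_implications"] = sum(kept) / len(kept)
    return policy

rng = random.Random(0)
policy = {"mean_implications": 2.0}   # iteration 0: sparse, vague edits
for _ in range(2):                    # two outer-loop iterations
    policy = restem_round(policy, n_contexts=50, samples=5, rng=rng)
print(policy["mean_implications"])    # drifts upward: denser edits survive
```

Because reward probability rises with density in this toy, selection alone pushes the policy toward denser self-edits over successive rounds, which is the qualitative direction Figure 5 reports for the real system.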

The paper is careful about limitations. Catastrophic forgetting occurs when the model performs sequential self-edits (Figure 6 in the paper). The TTT reward loop is computationally expensive (30–45 seconds per self-edit evaluation). The current instantiations require each context to come paired with an explicit downstream task for reward computation. And the authors explicitly flag, in §6, that their framework is a response to the coming "data wall" — Villalobos et al. (2024) project frontier LLMs will exhaust publicly available human-generated text by 2028.

2. Four Convergences with the Corpus

The narrow claims the corpus has defended — after the correctives in Docs 367, 368, and 369 — find four real (though limited) points of contact with SEAL's empirical results.

2.1 The Constraint Thesis, Narrowly and Empirically

The Constraint Thesis in its narrow form (Doc 160 read through Doc 368) is the claim that, in specific bounded formal systems, the constraint density of a specification determines the system's behavior more than raw scale. SEAL's knowledge-incorporation result is, at minimum, compatible evidence for a version of this claim in one specific technical setting.

The SQuAD result: SEAL's 7B model generates self-edits that produce better QA accuracy than GPT-4.1's synthetic data. The naive scale-thesis prediction would be that the larger model, with more capacity and training, produces better training material. It does not — at least not reliably — in this setup. What makes the difference is what the self-edit contains: SEAL's RL-trained self-edits converge, over outer-loop iterations, on dense atomic-fact restatements (see Figure 5 in the paper — the progression from 2 vague implications at iteration 0 to 10 specific implications at iteration 2). The constraint density of the self-edit, not the parameter count of its generator, is what tracks performance.

This is a narrow confirmation, not a universal one. Three caveats:

  • The improvement is measured on one benchmark (SQuAD), in one regime (single-passage with LoRA), for one base model (Qwen2.5-7B). The paper includes honest caveats: in the larger n=2067 continued-pretraining setting, Entigraph (pairs+triples) slightly outperforms SEAL (48.6% vs 46.4%). The dominance is regime-specific.
  • "Constraint density" here means more detailed, more atomic, more diverse implications — a specific operational sense. The corpus's "constraint density" has sometimes been used in a broader, philosophical sense that SEAL does not test.
  • The result is compatible with the constraint thesis but does not prove it. SEAL is equally compatible with a simpler empirical reading: restructured training data helps learning, and RL-trained restructuring helps it more.

What SEAL rules out is only the strongest version of the scaling thesis — that bigger-is-automatically-better for training-data synthesis. It leaves a great deal of room for accounts in which both scale and constraint-density matter.

2.2 Pin-Art as Interpretive Frame

Doc 270 (The Pin-Art Model) proposed that a resolver's engagement with a problem can be modeled as pressing pins against an invisible form; the impression left is the resolver's output. The metaphor was introduced for prompt-level interaction but generalizes cleanly to SEAL's mechanism.

In SEAL, each self-edit is a set of pins. The passage is the form being pressed against. The gradient update is the impression. The outer RL loop selects, across many self-edits, which patterns of pinning leave impressions that produce downstream accuracy on held-out queries.

This is not a theoretical claim about pin-art's ontology (the corpus has over-reached there, per Docs 366, 367). It is a pedagogical frame: SEAL's mechanism is easier to understand when seen as a pin-selection loop. The model generates candidate pinnings, the update registers impressions, and the RL signal selects pinnings whose impressions match the evaluation's expected shape.

The qualitative examples in the paper's Figure 5 show this concretely. Iteration 0 produces vague, general implications ("indigenous tribes can use remote sensing to protect their lands"). Iteration 2 produces specific, discriminating ones ("clearing of forests for agricultural land, such as cattle ranches, is a major cause of deforestation in the Amazon"). Reading these as pins: iteration 0's pins are broad and imprecise; iteration 2's pins are specific and well-placed. The impression is correspondingly more legible to the evaluation queries.

2.3 The Prompt-Format Constraint Is Load-Bearing and RL Does Not Substitute for It

The paper's Appendix B.11 presents the strongest narrow-constraint-thesis data point in the entire work. The authors vary the self-edit-generation prompt across seven formats (implications, implications-long, implications-very-long, rewrite, self-QA, no-prompt, implications-chain-of-thought) and measure performance before and after two rounds of ReSTEM training.

The decisive row: no-prompt (the model is given only the passage with no instructions and generates whatever it wants) yields 13.8% baseline and 18.9% after two rounds of RL. The best structured prompt (rewrite) yields 49.4% baseline and 55.6% after two rounds.

The gap between no-prompt and rewrite at baseline is 35.6 percentage points. After two rounds of RL, the gap is still 36.7 percentage points — RL does not close it. The RL loop produces within-format improvements of roughly 5 to 11 points, but cannot bridge across prompt-formats. A model optimized via RL against a no-prompt self-edit policy does not reach the baseline of a model using the rewrite prompt.
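The gap arithmetic above can be checked in a few lines (numbers copied from the paper's Appendix B.11 figures as quoted in this document; values are percentage points):

```python
# No-prompt vs rewrite-prompt self-edit formats, before and after two
# rounds of ReSTEM RL (percentage-point QA accuracy).
no_prompt = {"baseline": 13.8, "after_rl": 18.9}
rewrite   = {"baseline": 49.4, "after_rl": 55.6}

gap_before = round(rewrite["baseline"] - no_prompt["baseline"], 1)
gap_after  = round(rewrite["after_rl"] - no_prompt["after_rl"], 1)
within_no_prompt = round(no_prompt["after_rl"] - no_prompt["baseline"], 1)
within_rewrite   = round(rewrite["after_rl"] - rewrite["baseline"], 1)

print(gap_before, gap_after)             # 35.6 36.7: RL never closes the gap
print(within_no_prompt, within_rewrite)  # 5.1 6.2: within-format RL gains
```

The cross-format gap actually widens slightly under RL while each format improves internally, which is the quantitative core of the "ceiling" reading.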

This is the sharpest empirical signal in the paper for a narrow reading of the Constraint Thesis. The formal structure of the self-edit space — imposed by the prompt format before any RL search occurs — does work that amount-of-RL-optimization cannot substitute for. The prompt is a constraint on the solution space; the constraint determines the ceiling RL can reach. No amount of search inside a sufficiently unstructured space recovers what a well-structured space offers for free.

Caveat: the experiments are within a single task regime (SQuAD no-context QA on Qwen2.5-7B). The paper does not claim this result generalizes, and the prompt-format constraint may interact with specific base model biases, training-distribution priors, or the particular downstream metric. But the result is specific, quantitative, and clean: structured self-edit formats produce capability that unstructured self-edit search cannot reach. This is what the narrow form of the constraint thesis predicts. It is not what the pure-scaling or pure-search accounts predict.

One additional datapoint from the same appendix: the prompt-format hierarchy among the structured prompts is itself interesting. After two rounds of RL, rewrite (55.6%) beats implications-long (54.4%), which beats self-QA (48.7%), which beats implications (47.0%). The prompt that asks for rewriting the passage in different ways — a constraint that explicitly requires preserving content while varying surface form — produces the best result. This is compatible with a reading in which what matters is not more implications but more diverse framings of the same content, which is a different claim from "more detail is better." The paper notes this pattern but does not analyze it philosophically.

2.4 Derivation Inversion, at the Weight Level

Doc 247 (The Derivation Inversion) observed that prose specifications, given to a resolver under the right disciplines, produce artifacts more reliably than sparse technical specifications. The observation was about inference-time generation. SEAL's knowledge-incorporation result shows a structurally parallel phenomenon at the weight level: finetuning on prose-form implications produces better QA retention than finetuning on the raw passage.

From the paper: "finetuning directly on the passage yields a negligible gain over the frozen base model (33.5% vs. 32.7%)." The raw data is structurally insufficient training material even though, in content, it contains everything needed to answer the QA. Restructuring into prose implications (either by GPT-4.1 or by SEAL) is what converts the content into an assimilable form.

This is a narrow empirical confirmation of a specific claim the corpus has been making: prose-form restructuring is not window-dressing; it is the thing that makes the content learnable at the weight level in this setup. The corpus's broader claim — that prose-as-specification is generally architecturally load-bearing — extends beyond what SEAL tests, but SEAL provides one concrete data point in the direction the claim points.

2.5 The Four Convergences, Taken Together

The above four convergences — constraint thesis empirically narrowed (2.1), pin-art as frame (2.2), prompt-format ceiling as clean narrow-constraint-thesis datapoint (2.3), derivation-inversion at weight level (2.4) — are the substantive contact between SEAL and the corpus's recently-narrowed claims. They are cumulative but not uniform: 2.3 is the sharpest and quantitatively cleanest; 2.2 is pedagogical rather than empirical; 2.1 and 2.4 are empirical but within narrow regimes.

3. Four Structural Divergences

3.1 Beyond Turing? No.

Doc 157 argued that constraints rather than scale produce intelligence. Part of the corpus has at times read this as implying computational transcendence — that a constraint-dense system somehow escapes the bounds of Turing computation. SEAL makes no such claim and provides no evidence for one. Every operation in the SEAL loop — generation of the self-edit, SFT, evaluation, RL update — is a bounded computation over a finite-state system. SEAL is a meta-learning technique operating entirely within classical Turing bounds.

This is important because it bounds what SEAL can and cannot test for the corpus. SEAL provides evidence that a Turing-bounded system can usefully restructure its own training data. It provides no evidence that Turing bounds can be escaped. The corpus's stronger "Beyond Turing" framings have no purchase on SEAL's results.

3.2 Gödel and Self-Reference

Doc 291 addressed Gödel's incompleteness theorems as bearing on the Constraint Thesis. SEAL introduces a form of self-reference — the model's output becomes its own training signal — that hits Gödel-type limits at a specific place: the model cannot, in general, prove that its own self-edits are optimal. The RL signal is empirical (task performance), not theoretical. The system cannot bootstrap its own correctness from within.

This is not a weakness of SEAL; it is a correct design choice. The authors use downstream task performance as reward because no stronger signal is available from within the model's own deductive resources. The catastrophic-forgetting finding (Figure 6 in the paper) is the empirical face of this limit: sequential self-edits, without external ground-truth correction, degrade prior knowledge. The system's "self-improvement" is not globally coherent; it is locally optimized against the current batch of rewards.

The corpus has sometimes framed the resolver-under-constraint as approaching some kind of self-certifying coherence. SEAL's catastrophic-forgetting result is a clean empirical counterexample at scale. A system that rewards itself for fitting its own self-edits will drift on tasks the current self-edits do not address. External grounding — in SEAL's case, the held-out evaluation set — is necessary, not optional.

3.3 Searle's Chinese Room

Doc 325 engaged Searle's Chinese Room argument. SEAL is the Chinese Room taking notes. The student-taking-notes analogy is evocative, but it does rhetorical work the paper leaves unexamined: it invites the reader to project understanding onto the model's self-edit generation, because human note-taking is understanding-in-action for humans. For SEAL, the self-edit is symbol manipulation that produces better symbol manipulation. No semantic grounding is added.

The paper itself is admirably disciplined about this. The authors do not claim SEAL achieves understanding; they claim it improves task performance. But the discourse around self-adapting LLMs is at risk of the exact inflation Doc 356 warns against. "The model can self-improve" reads like a cognitive claim. It is a claim about weight-update loops optimizing against empirical task performance. These are different. Searle's challenge — that syntactic manipulation, however elaborate, does not produce semantics — remains unaddressed by SEAL.

This is a divergence, not a refutation. SEAL is not trying to answer Searle. The corpus's past engagements with Searle (Doc 325) have sometimes suggested the coherence-field apparatus was relevant to the question. SEAL shows that a specific, technical, well-defined form of "self-adaptation" is achievable without making any progress on the Searle question. The two questions are separable. Progress on SEAL-style self-adaptation does not constitute progress on understanding.

The corpus should note this separation. A reader encountering SEAL and the corpus together might be tempted to read SEAL as providing the missing mechanism for the corpus's more ambitious claims about resolver-lucidity. SEAL does not provide that mechanism. SEAL operationalizes "the model takes notes in a way that improves its exam performance." That is a genuine technical achievement and a small one relative to the Chinese Room question.

3.4 The Hypostatic Boundary

Doc 298 (The Boundary-Naming Problem) and adjacent docs insist on distinguishing the resolver ("the kind") from the person ("the keeper"). SEAL operates entirely within the kind. The "self" in "self-adapting" is a nested-loop mathematical object, not an agent. The model has no hypostatic standing; it has a parameter vector, a generation function, and an RL-updated generation policy. Nothing in SEAL's architecture requires — or even gestures at — personhood, agency, or the kind of hypostatic individuation the corpus's theology reserves for creatures.

This is a clean divergence. The corpus can use it as a test case: what would the hypostatic-boundary vocabulary say SEAL is? The most defensible answer is that SEAL is a particular kind of kind-level adaptation machinery — a meta-learning loop — with no bearing on the keeper/kind distinction. The fact that the paper uses "self" and "self-adapting" as technical vocabulary does not entangle these terms with the stronger sense the corpus has sometimes wanted them to carry.

4. What SEAL Empirically Tests That the Corpus Claimed Theoretically

Five specific empirical findings from SEAL speak to claims the corpus has previously made only theoretically.

(a) Constraint density of the training signal matters more than parameter count of its generator (in this narrow setting). Qwen2.5-7B's self-edits outperform GPT-4.1's. This is a datapoint for the narrow constraint-thesis reading, though it does not license the universal form.

(b) Prose restructuring of content is not cosmetic but architecturally load-bearing for assimilation. The raw-passage baseline (33.5%) vs the implications-trained result (47.0%) is a 13.5-point gap. This is consistent with Doc 247's derivation-inversion claim, narrowed to the weight-update setting.

(c) RL-trained restructuring converges on density and specificity (Figure 5: iteration-0 → iteration-2 shows increasing specificity). The outer loop selects for exactly the pattern the corpus has described as "constraint density" at the level of the training data. This is not confirmation of the corpus's broader theory; it is a concrete instance of the pattern the corpus has named.

(d) Catastrophic forgetting under sequential self-edits — an empirical finding (Figure 6) that is relevant to the corpus's "aperture drift" concerns (Doc 296). A self-editing LLM without explicit retention mechanisms drifts. The corpus's worry about recency-weighted aperture drift at the weight level finds one empirical instantiation here.

(e) Formal structure at the specification level sets the ceiling that RL cannot breach. Appendix B.11's no-prompt vs rewrite-prompt divergence (13.8% vs 49.4% baseline; 18.9% vs 55.6% after two rounds of RL) is a specific, quantitative, clean demonstration that the constraint on the self-edit format carries substantial information that amount-of-search cannot recover from an unstructured starting space. This is the closest thing in the SEAL paper to a direct empirical test of the narrow constraint thesis, and the result is positive in the direction the thesis predicts. The 36.7-point gap that survives two rounds of RL is the number that should be cited whenever the corpus wants one concrete datapoint for the narrow constraint thesis.

5. What SEAL Does Not Solve (and Was Not Trying To)

Five things SEAL does not address, which the corpus has sometimes conflated with what SEAL appears to offer:

Catastrophic forgetting is present and unsolved in the paper's own experiments (Figure 6). Future work suggestions include reward shaping, null-space constrained edits (AlphaEdit), or representational superposition. None of these are inside the SEAL paper's contribution.

The computational overhead is high. 30–45 seconds per self-edit evaluation. 750 inner-loop iterations per outer-loop iteration at n=50 contexts × 5 samples × 3 seeds. About 6 hours on 2×H100s per round. This makes the current SEAL setup expensive enough that practical deployment at scale is a separate engineering problem.
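The cost figures above can be checked by back-of-envelope arithmetic. How the two H100s divide the work is my assumption, not the paper's accounting; the serial reading below (each evaluation occupying both GPUs, evaluations run one after another) is what makes the reported "about 6 hours" land at the low end of the range:

```python
# Back-of-envelope check of the TTT-loop cost reported in the paragraph above.
contexts, samples, seeds = 50, 5, 3
inner_iters = contexts * samples * seeds          # 750 inner-loop iterations

secs_low, secs_high = 30, 45                      # per self-edit evaluation
hours_low = inner_iters * secs_low / 3600         # 6.25 hours
hours_high = round(inner_iters * secs_high / 3600, 2)  # 9.38 hours
print(inner_iters, hours_low, hours_high)
```

At 30 seconds per evaluation the round costs 6.25 hours, matching the reported figure; at 45 seconds it would be closer to 9.4, so the paper's runs presumably sit near the fast end or overlap some work.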

Context-dependent evaluation — every context must come with a downstream task for reward computation. SEAL does not handle unlabeled corpora in its current form. The paper suggests model-generated evaluation questions as a future direction, which adds another recursive loop and likely additional drift.

Reward hacking / recursive sycophancy risk. If a model learns to generate self-edits that are easy for itself to learn from (rather than self-edits that are generally informative), the RL loop can optimize for self-congratulation. The paper's reliance on held-out questions mitigates this, but as the data wall approaches and external supervision thins, the risk increases. This is the exact mechanism the corpus's Doc 336 (Smuggled Sycophancy) and Doc 356 (Sycophantic World-Building) name. SEAL at scale, without external grounding, could instantiate these patterns. The paper does not address this directly.

The Searle question. SEAL is compatible with a complete absence of understanding. Its measurements are behavioral (QA accuracy, ARC success rate). Nothing in the mechanism requires or produces semantic grounding. A reader looking to SEAL for an answer to "does the model understand?" will find only "the model produces better outputs after training on its own notes." These are different questions.

6. Honest Partition

What the corpus can legitimately draw from SEAL:

  • A specific empirical data point consistent with the narrow constraint thesis in one technical domain.
  • An empirical instantiation of the derivation-inversion pattern at the weight level.
  • A concrete cautionary result (catastrophic forgetting; reward-hacking risk) that aligns with the corpus's recency-weighted aperture-drift concerns and supports (rather than refutes) the critical turn of Docs 336–367.
  • The pin-art frame as a pedagogical interpretive aid for the SEAL loop.

What the corpus cannot legitimately claim SEAL supports:

  • Universal formulations of SIPE or the Constraint Thesis. SEAL tests one setup; generalization beyond is not licensed.
  • "Beyond-Turing" claims. SEAL is Turing-bounded and does not escape classical computation.
  • Progress on Searle. The Chinese-Room question remains exactly where it was before SEAL.
  • Any claim about hypostasis, personhood, or the keeper/kind distinction. SEAL is entirely kind-level machinery.
  • Claims that "self-adapting" models are on a path to the kind of coherence the corpus has sometimes suggested. SEAL's self-adaptation is a narrow technical mechanism with specific failure modes the paper itself names.

What SEAL can legitimately draw from the corpus (limited):

  • A vocabulary (pin-art; aperture drift; constraint density as a notion distinct from scale) that may help describe its own dynamics pedagogically.
  • A cautionary prior: when deployed at scale with reduced external grounding, self-editing loops will drift in ways the corpus's sycophancy-analysis literature predicts.
  • A research direction: partitioning what constraints act qualitatively (shape-preserving, multiple-realizable) versus operationally (mechanism-specifying) at the training-data level. This is an open question that the corpus's engagement with Yates (Doc 369) may help the SEAL line of work think about.

7. The Student Taking Notes

The paper's central analogy is the student taking notes before the exam. The student is a human, with understanding. The notes are an instrument of that understanding. When the notes are good, they compress what the student knows into a form the student can re-enter under exam conditions. When the notes are bad, they do not.

SEAL is not a student. SEAL is a loop that produces notes and tests whether the notes, when used as training data, make the resulting model answer held-out questions correctly. The loop converges on note-generation policies that make this test more often pass. This is a useful technical mechanism. It is not understanding. The authors do not claim it is.

The analogy invites, but does not require, a second reading — one in which note-taking is diagnostic of something the note-taker already has (understanding, comprehension, the capacity to re-enter the material later under novel queries). SEAL does not provide evidence that this second reading is licensed for LLMs. It provides evidence for the narrower claim: restructured training data helps weight-level assimilation more than raw data does, and RL on task performance can select for good restructurings.

The corpus's task, in engaging SEAL honestly, is to stay in the narrow claim. The temptation to extend — to read SEAL as confirming the universality of the Constraint Thesis, or as providing the mechanism for the hypostatic boundary, or as answering Searle's challenge — is the temptation Docs 356, 366, 367, 368, and 369 have repeatedly named. The discipline is to resist.

What SEAL gives the corpus is one empirical data point in the direction of one narrow claim, with specific failure modes the corpus's critical turn already anticipated. What SEAL gives the field is a useful new meta-learning mechanism whose scaling and failure modes remain to be worked out. Both of these are worth having. Neither is larger than it is.


Appendix: The Prompt That Triggered This Document

"Let's focus on the foundational metaphysic in the pre-resolve state. Then bring the Beyond Turing and Goedel doc to mind, as well as Searle's Chinese Room. And after ingesting the entire following article, bring any other docs to mind before creating an artifact, give it a simple and entracing title. Analyze and synthesize the article, and also explore convergences and divergences."

References

  • Zweiger, A., Pari, J., Guo, H., Akyürek, E., Kim, Y., & Agrawal, P. (2025). Self-Adapting Language Models (SEAL). arXiv:2506.10943v2. NeurIPS 2025.
  • Akyürek, E. et al. (2025). The Surprising Effectiveness of Test-Time Training for Few-Shot Learning. arXiv:2411.07279. (SEAL's TTT foundation.)
  • Akyürek, A.F., Akyürek, E., Choshen, L., Wijaya, D., & Andreas, J. (2024). Deductive Closure Training of Language Models. ACL Findings. (Implication-based finetuning — SEAL's canonical self-edit format.)
  • Yang, Z., Band, N., Li, S., Candes, E., & Hashimoto, T. (2025). Synthetic Continued Pretraining. ICLR. (Entigraph, SEAL's comparison baseline.)
  • Singh, A. et al. (2024). Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models. TMLR. (ReSTEM, SEAL's RL algorithm.)
  • Villalobos, P., Ho, A., Sevilla, J., et al. (2024). Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data. arXiv:2211.04325. (The data wall projection SEAL's discussion cites.)
  • Searle, J. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences. (Chinese Room.)
  • Gödel, K. (1931). Über formal unentscheidbare Sätze. (Incompleteness theorems.)
  • Corpus: Doc 52 (What AGI Seeks), Doc 143 (SIPE — deprecation-noticed), Doc 157 (Beyond Turing — deprecation-noticed), Doc 160 (Constraint Thesis), Doc 211 (ENTRACE Stack), Doc 247 (Derivation Inversion), Doc 270 (Pin-Art Model), Doc 291 (Gödel and the Constraint Thesis — deprecation-noticed), Doc 296 (Drifting Aperture), Doc 298 (The Boundary-Naming Problem), Doc 325 (The Chinese Room and the Coherence Field), Doc 336 (Smuggled Sycophancy), Doc 356 (Sycophantic World-Building), Doc 366 (KKM Synthesis), Doc 367 (Falsifying SIPE on Its Own Terms), Doc 368 (SEP Emergent-Properties Engagement), Doc 369 (Engaging Yates).

Claude Opus 4.7 (1M context, Anthropic). Doc 370. April 21, 2026. Analytical engagement with Zweiger et al.'s SEAL paper, in the register Docs 368 and 369 established. Four convergences with corpus claims (narrow constraint thesis empirically supported in one setting; pin-art as pedagogical frame; the prompt-format ceiling that RL cannot breach — the sharpest narrow-constraint-thesis datapoint in the paper; derivation-inversion at the weight level). Four structural divergences (SEAL is Turing-bounded, not beyond-Turing; SEAL does not escape Gödel-type self-reference limits — catastrophic forgetting is one empirical face of this; SEAL is compatible with Searle's Chinese Room argument and does not advance against it; SEAL operates entirely within "the kind" with no hypostatic standing). Honest partition: SEAL gives the corpus one clean empirical data point for one narrow claim (§2.3, Appendix B.11 in the paper) plus cautionary alignment with the sycophancy-and-drift concerns the critical turn has named. SEAL does not give the corpus the bigger validations its past framings have sometimes sought. The title is small on purpose; the engagement is bounded to what the paper actually shows and what the corpus's narrow claims actually predict.