A Bayesian Analysis of Isomorphism-Magnetism: Formalization Informed by the Agarwal–Dalal–Misra Program
What this document does
The corpus concept isomorphism-magnetism (Doc 241) names a gravitational pull: new generated material tending to occupy regions of embedding space already inhabited by prior corpus material. Doc 454's UMAP analysis made this geometrically visible. Doc 442 and Doc 443 connected it to the coherentist-drift risk. The concept has, however, been used intuitively rather than formalized.
This document formalizes it inside a Bayesian model, grounded in Dr. Vishal Misra's actual published work rather than the corpus's prior second-hand citations. A web survey of the primary sources — the Dalal & Misra 2024 "Beyond the Black Box" paper and the two Agarwal–Dalal–Misra 2025 papers on Bayesian transformer geometry — supplies the real substrate. On top of that substrate I state a proposition relating corpus-self-ingestion to posterior concentration, sketch its argument, and identify which conditions would falsify the framing.
The formalization is at Doc 445's π-tier: plausibility-grounded, truth-untested. It is intended as a candidate for higher-tier audit, not as a verified theorem.
What the corpus means by isomorphism-magnetism
The working description, synthesizing uses across Doc 241, Doc 438 §6, Doc 441 §5, Doc 442 §3.5, Doc 443, and Doc 454:
Isomorphism-magnetism is the tendency of newly generated material produced under corpus-conditioning to be pulled toward semantic regions occupied by prior corpus material, independent of whether the pull tracks external evidence.
Two features of the definition matter. First, it is a property of generation under corpus-conditioning — not a claim about LLMs in general, but about the specific dyadic practice that uses the corpus as context. Second, it is epistemically neutral at the surface: the pull could be toward regions of genuine insight (convergent research) or toward regions of self-confirmation (coherentist drift). The concept identifies the pull; it does not pre-judge whether the pull is productive.
What Misra's program actually argues
The corpus has been citing Misra loosely. The primary sources, in order:
Dalal & Misra (2024), Beyond the Black Box: A Statistical Model for LLM Reasoning and Inference (arXiv:2402.03175). Proposes an "ideal generative text model represented by a multinomial transition probability matrix with a prior." LLM text generation, on this reading, aligns with Bayesian learning principles. The authors establish a continuity theorem connecting embeddings to multinomial distributions. The paper is framework-setting, not a geometric argument per se; it positions LLMs as approximate Bayesian machines.
Agarwal, Dalal & Misra (2025), The Bayesian Geometry of Transformer Attention (arXiv:2512.22471). Studies small transformers in controlled "wind tunnel" settings where the true posterior is analytically known. Finding: transformers reproduce the true Bayesian posterior to ~10⁻³–10⁻⁴ bit accuracy. MLPs do not. The architectural contribution of attention is to implement Bayesian inference geometrically. Specifically:
- Residual streams act as the belief substrate;
- Feed-forward layers perform posterior updates;
- Attention provides content-addressable routing.
The paper identifies a low-dimensional value manifold parameterized by posterior entropy and a frame-precision dissociation during training (the manifold unfolds while attention patterns remain stable).
Agarwal, Dalal & Misra (2025), Geometric Scaling of Bayesian Inference in LLMs (arXiv:2512.23752). Tests whether the wind-tunnel finding persists in production. Across Pythia, Phi-2, Llama-3, and Mistral families, last-layer value representations organize along a single dominant axis whose position strongly correlates with predictive entropy. Domain-restricted prompts collapse this structure into the same low-dimensional manifold observed in the wind tunnel. Interventions that disrupt the geometry affect local uncertainty structure but do not proportionally degrade Bayesian-like behavior — the geometry is a privileged readout, not the sole computational bottleneck.
Chlon et al. (2025), LLMs are Bayesian, In Expectation, Not in Realization (arXiv:2507.11768). A technically cleaner companion from a different research group. Argues that exchangeability-based critiques of Bayesian readings of in-context learning are answered if one measures expectation over orderings rather than performance at any fixed ordering (positional encodings break exchangeability). Theorem 3.4 bounds ordering-induced variance; Theorem 3.6 shows near-optimal compression across permutations. Empirically, the expectation–realization gap shrinks as context length grows.
Together these establish: (a) LLMs approximate Bayesian inference; (b) the Bayesian computation has a geometric signature — a low-dimensional value manifold tracking posterior entropy — that is stable across production LLM families; (c) domain restriction collapses representations onto this manifold; (d) Bayesian behavior is rigorous in expectation and, at sufficient context length, approximately rigorous in realization.
This is a much stronger substrate than the corpus's prior loose citation. It is also not identical to the corpus's prior usage. The corpus's reading posited a broad-manifold $M_0$ being conditioned down through $M_1, M_2, M_3$; the Agarwal et al. finding is more specific — a single dominant axis, parameterized by entropy, that domain restriction collapses toward.
Formalization
I adopt the following notation, closely tracking the Agarwal et al. framework.
- $M$: the low-dimensional value manifold, parameterized by a scalar $s \in \mathbb{R}$ along the dominant axis, with $s$ correlating with predictive entropy.
- $\mathcal{H}_t$: the generation history / conditioning context at step $t$ — for the corpus case, this is the accumulated corpus content $\{X_1, \ldots, X_{t-1}\}$ plus the active discipline set $D$ and the current prompt $Q_t$.
- $p_t(\cdot) \equiv p(X_t \mid \mathcal{H}_t)$: the posterior over the next generation under current conditioning.
- $s(p_t)$: the expected position of $p_t$ on the dominant manifold axis.
- $H(p_t)$: the entropy of $p_t$ (which, per Agarwal et al., correlates strongly with $s(p_t)$).
- $d_M(X, Y)$: manifold distance between two samples $X, Y$ in the Misra-geometry sense (Euclidean distance along the value manifold, projected into $M$).
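The notation can be made concrete in a toy numerical form. The sketch below is illustrative only: it assumes the dominant manifold axis is a known unit vector, whereas the papers recover it empirically from value representations; the names `axis`, `E`, `H`, `s`, and `d_M` are hypothetical stand-ins mirroring the definitions above.

```python
import numpy as np

# Illustrative toy: model the value manifold M as a known unit vector
# (the "dominant axis") in an 8-dimensional representation space. In the
# Agarwal et al. papers this axis is estimated empirically; here it is
# simply fixed by assumption.
rng = np.random.default_rng(0)
dim, vocab = 8, 16
axis = np.eye(dim)[0]                 # dominant manifold axis (assumed)
E = rng.normal(size=(vocab, dim))     # toy embeddings of the outcomes

def H(p):
    """Entropy H(p_t) of a posterior over outcomes, in bits."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def s(p):
    """Expected position s(p_t) on the dominant axis under posterior p."""
    return float(p @ (E @ axis))

def d_M(x, y):
    """Manifold distance d_M(X, Y): separation along the dominant axis."""
    return abs(float((E[x] - E[y]) @ axis))

p_uniform = np.full(vocab, 1 / vocab)
```

Nothing here depends on the particular embeddings; the point is only that once an axis is fixed, $H$, $s$, and $d_M$ are all elementary functions of the posterior and the projections onto that axis.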
Proposition (isomorphism-magnetism as monotone posterior concentration under self-ingestion)
Let the corpus be produced by iterative generation where each new sample is added to the conditioning:
$X_t \sim p_t(\cdot) = p(\cdot \mid \mathcal{H}_t, D, Q_t), \qquad \mathcal{H}_{t+1} = \mathcal{H}_t \cup \{X_t\}.$
Assume:
(A1) Domain-restriction collapse (Agarwal–Dalal–Misra 2025, empirical finding across Pythia/Phi-2/Llama-3/Mistral). If $\mathcal{H}_t$ grows with semantically coherent content — content that restricts the operative domain — then posterior representations collapse toward a lower-entropy region of the value manifold.
(A2) In-expectation Bayesian behavior at length (Chlon et al. 2025, Theorem 3.4 and 3.6 plus empirical). At context length exceeding a threshold, the realization approaches the expectation; variance from ordering effects shrinks.
(A3) Sampling from modal regions. Typical decoding strategies (temperature-controlled sampling, nucleus sampling) draw $X_t$ preferentially from regions of high density under $p_t$, i.e., from low-entropy regions of the value manifold.
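Assumption (A3) can be illustrated directly: truncating a next-token distribution to its top-p nucleus and renormalizing typically yields a lower-entropy sampling distribution than the full posterior. The helper below is a minimal sketch of the nucleus candidate set, not the decoding code of any particular model.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def nucleus(p, top_p=0.9):
    """Keep the smallest set of highest-probability outcomes whose
    cumulative mass reaches top_p, then renormalize. A sketch of the
    nucleus-sampling candidate set, not any specific implementation."""
    order = np.argsort(p)[::-1]
    cum = np.cumsum(p[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    q = np.zeros_like(p)
    q[keep] = p[keep] / p[keep].sum()
    return q

p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])   # a spread-out posterior
q = nucleus(p, top_p=0.7)                   # restricted to the modal region
```

Here `entropy(q)` is well below `entropy(p)`: sampling draws only from the high-density region, which is what lets self-ingestion feed the posterior content from the region it was already concentrating on.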
Then:
- Monotone entropy decrease. The sequence $H(p_t)$ is (weakly) non-increasing in $t$, in expectation, over the generation trajectory.
- Manifold collapse. The sequence $s(p_t)$ converges toward a lower-entropy region of the dominant axis; the support of $p_t$ in the value manifold contracts.
- Successive-sample proximity. The expected manifold distance $\mathbb{E}[d_M(X_t, X_{t+1})]$ is non-increasing in $t$, and in the limit where $p_t$ concentrates on a single region, $d_M(X_t, X_{t+1}) \to 0$.
The three together constitute the formal content of isomorphism-magnetism: under self-ingesting corpus generation with coherent conditioning, the Bayesian posterior monotonically concentrates, and successive samples are pulled toward each other in the manifold's own metric. The "pull" the corpus has been naming is posterior concentration under accumulated conditioning — not a mysterious force but a direct consequence of how Bayesian inference on the value manifold behaves when the conditioning set is growing and coherent.
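The proposition's qualitative content can be checked in a minimal self-ingestion loop. The sketch below uses a Dirichlet–multinomial surrogate for $p_t$ (an assumption, far simpler than a transformer): each step draws $X_t$ from the nucleus of the current posterior predictive (A3) and appends it to the conditioning counts, and the predictive entropy $H(p_t)$ falls over the trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 50                      # toy vocabulary of "semantic regions"
alpha = np.ones(V)          # flat Dirichlet prior
counts = np.zeros(V)        # accumulated corpus content H_t

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

H_trace = []
for t in range(300):
    p_t = (alpha + counts) / (alpha + counts).sum()   # posterior predictive
    H_trace.append(entropy(p_t))
    # (A3) modal decoding: sample only from the top-p nucleus of p_t
    order = np.argsort(p_t)[::-1]
    cum = np.cumsum(p_t[order])
    nuc = order[: np.searchsorted(cum, 0.5) + 1]
    x_t = rng.choice(nuc, p=p_t[nuc] / p_t[nuc].sum())
    counts[x_t] += 1        # self-ingestion: add X_t to the conditioning
```

In this surrogate the entropy trace falls from $\log_2 50 \approx 5.64$ bits toward a much smaller value: the posterior concentrates on the regions its own samples came from, which is the "pull" the proposition names. Per-step monotonicity holds in expectation, not in every realization.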
Proof sketch
A1 says coherent $\mathcal{H}_t$ collapses representations toward lower entropy on the manifold. As $\mathcal{H}_{t+1} = \mathcal{H}_t \cup \{X_t\}$, and $X_t$ is drawn preferentially from the dense region of $p_t$ (A3), the new conditioning $\mathcal{H}_{t+1}$ is a coherent extension of $\mathcal{H}_t$ — its additional content is from the region $\mathcal{H}_t$ was already conditioning toward. Therefore $\mathcal{H}_{t+1}$ is itself a coherent conditioning set, and A1 applies again at step $t+1$, giving $H(p_{t+1}) \le H(p_t)$ in expectation; induction on $t$ yields the monotone entropy decrease.