Grokking Explorer — the Training-Time SIPE-T Phase Transition

Interactive visualization of the canonical mathematical expression of grokking from the RESOLVE corpus. The order parameter \(\rho_{\mathrm{train}}(t)\) accumulates training-time constraint satisfaction; when it crosses a critical threshold \(\rho^*\), the substrate's representational geometry undergoes a non-analytic phase transition from a high-dimensional memorizing configuration to a low-dimensional generalizing polytope — the coherence snap. See Doc 699 for the canonical formalization, and Doc 681 for the underlying coherence-snap apparatus.

\[\frac{d\rho_{\mathrm{train}}}{dt} \;=\; \alpha\,(1 - \rho_{\mathrm{train}})\cdot f(\tau, s, I), \qquad f(\tau, s, I) \;=\; \frac{s\,I + \varepsilon}{\tau}, \qquad G(t) \;=\; \begin{cases} G_{\text{memorize}} & \rho_{\mathrm{train}} < \rho^* \\ G_{\text{generalize}} & \rho_{\mathrm{train}} \ge \rho^* \end{cases}\]
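The rate equation above can be sketched numerically. Since \(f(\tau, s, I)\) depends on neither \(\rho_{\mathrm{train}}\) nor \(t\), the equation is linear and has the closed form \(\rho_{\mathrm{train}}(t) = 1 - (1 - \rho_0)\,e^{-\alpha f t}\), which gives a predicted crossing time \(t^* = \ln\!\big((1 - \rho_0)/(1 - \rho^*)\big)/(\alpha f)\). The Python below is a minimal sketch under stated assumptions, not the page's own implementation: the function names are hypothetical, \(\rho_0 = 0\) is inferred from the initial readout, and the value of \(\varepsilon\) is not given on this page, so 0.01 is only a placeholder.

```python
import math


def f_rate(tau: float, s: float, importance: float, eps: float) -> float:
    """Drive term f(tau, s, I) = (s * I + eps) / tau from the rate equation."""
    return (s * importance + eps) / tau


def rho_closed_form(t: float, alpha: float, f: float, rho0: float = 0.0) -> float:
    """Exact solution of d(rho)/dt = alpha * (1 - rho) * f for constant f."""
    return 1.0 - (1.0 - rho0) * math.exp(-alpha * f * t)


def predicted_t_star(alpha: float, f: float, rho_star: float, rho0: float = 0.0):
    """Time at which rho first reaches rho_star, or None if it never does."""
    if rho0 >= rho_star:
        return 0.0
    if alpha * f <= 0.0 or rho_star >= 1.0:
        return None  # rho only approaches 1 asymptotically
    return math.log((1.0 - rho0) / (1.0 - rho_star)) / (alpha * f)


def integrate_euler(alpha: float, f: float, t_end: float, dt: float = 0.01, rho0: float = 0.0):
    """Forward-Euler trajectory of the same ODE; returns a list of (t, rho)."""
    steps = int(round(t_end / dt))
    rho = rho0
    trajectory = [(0.0, rho)]
    for k in range(1, steps + 1):
        rho += dt * alpha * (1.0 - rho) * f
        trajectory.append((k * dt, rho))
    return trajectory


def phase(rho: float, rho_star: float) -> str:
    """Hard switch G(t): 'memorize' below the threshold, 'generalize' at or above."""
    return "generalize" if rho >= rho_star else "memorize"


# Slider defaults from this page; EPS is an assumed placeholder value.
ALPHA, RHO_STAR, S, IMPORTANCE, TAU = 0.05, 0.65, 0.30, 1.0, 1.0
EPS = 0.01  # ASSUMPTION: epsilon is not specified in the source
F = f_rate(TAU, S, IMPORTANCE, EPS)
T_STAR = predicted_t_star(ALPHA, F, RHO_STAR)
print(f"f = {F:.3f}, predicted t* = {T_STAR:.2f}")
print("phase at t = 300:", phase(rho_closed_form(300.0, ALPHA, F), RHO_STAR))
```

Forward Euler is included only as a cross-check against the closed form. Raising \(\tau\) or lowering \(s\), \(I\), or \(\alpha\) shrinks \(\alpha f\) and pushes \(t^*\) later.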

α — learning rate 0.050

ρ* — critical threshold 0.65

s — effective sparsity 0.30

I — feature importance 1.00

τ — task difficulty 1.00

snap sharpness 0.04

t = 0.00 / 300

generalizing polytope: auto-couple to sparsity (Welch-bound regime) | manual

ρ_train(t) — order-parameter trajectory

G(t) — representational geometry (drag to orbit)

ρ_train(t): 0.000

phase: memorize

t* (predicted): n/a

morph β(t): 0.000

H_geom(t): 1.000

Memorizing phase: ~512 scattered feature directions in the residual stream (one per example). Generalizing phase: a 4-vertex tetrahedral polytope (the smallest equiangular tight frame inherited from the toy-model regime of Anthropic 2022, per Doc 691 and Doc 696). The morph fraction is \(\beta(t) = \sigma((\rho_{\mathrm{train}}(t) - \rho^*)/\Delta)\), where Δ is the snap sharpness. The visualization is a minimal dynamical model: the underlying training-loss process per Doc 697 is power-law-smooth at rung 1, and the phase change visualized here is specifically the polytope-reorganization rung-1 phase transition, not the full training dynamics.
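The morph fraction and the tetrahedral frame described above can be checked with a short Python sketch. It assumes that \(\sigma\) is the standard logistic sigmoid, which the page does not state explicitly, and that the tetrahedron is represented by four unit vectors in three dimensions. All function names are hypothetical. The Welch bound used here is the standard lower bound \(\sqrt{(N - d)/(d(N - 1))}\) on the largest pairwise \(|\langle v_i, v_j\rangle|\) among \(N\) unit vectors in \(d\) dimensions.

```python
import math
from itertools import combinations


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def morph_fraction(rho: float, rho_star: float = 0.65, delta: float = 0.04) -> float:
    """beta = sigma((rho - rho_star) / delta); defaults are the page's slider values."""
    return sigmoid((rho - rho_star) / delta)


# Vertices of a regular tetrahedron, normalized to unit length.
TETRAHEDRON = [
    tuple(c / math.sqrt(3.0) for c in v)
    for v in [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
]


def max_coherence(vectors) -> float:
    """Largest absolute inner product over all distinct pairs of vectors."""
    return max(
        abs(sum(a * b for a, b in zip(u, v)))
        for u, v in combinations(vectors, 2)
    )


def welch_bound(n: int, d: int) -> float:
    """Welch lower bound on coherence for n unit vectors in d dimensions (n > d)."""
    return math.sqrt((n - d) / (d * (n - 1)))


for rho in (0.0, 0.60, 0.65, 0.70, 1.0):
    print(f"rho = {rho:.2f}  beta = {morph_fraction(rho):.3f}")
print("tetrahedron coherence:", max_coherence(TETRAHEDRON))
print("Welch bound (n=4, d=3):", welch_bound(4, 3))
```

Under these assumptions, β is exactly 0.5 at \(\rho_{\mathrm{train}} = \rho^*\) and rounds to 0.000 at \(\rho_{\mathrm{train}} = 0\), matching the initial readout. The tetrahedron's pairwise coherence of 1/3 meets the Welch bound for four vectors in three dimensions, which is what makes it an equiangular tight frame. A smaller Δ makes the transition in β steeper around \(\rho^*\).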