Document 576

Subsumption Modes for the Corpus

Doc 575 proposed importing the complete SEBoK wiki into the corpus as a hosted-source-with-editorial-layer. The proposal answered "how do we import SEBoK" without first answering "what mode of subsumption does this source actually warrant." This document goes back one step and articulates the question. The corpus has at least seven distinguishable modes of subsuming external material, each with different storage, maintenance, and reading economics. A source's optimal mode depends on its volume, change rate, the density of editorial commentary the corpus produces about it, and the licensing and access patterns that constrain it. Choosing the mode before importing is itself a corpus discipline. Applied to SEBoK, the analysis indicates that a hybrid — Mode 4 (hosted import with editorial layer) for editorial-rich pages, Mode 1 (reference-only with attribution) for the rest — is more efficient than uniform Mode 4 across all SEBoK content. The refined plan is sketched at the close.


I. Why Ask Before Importing

Doc 575 sketched a hosted import of the complete SEBoK wiki: licensing, URL structure, transformation pipeline, editorial layer, maintenance harness, six-phase rollout. The plan is workable. It is not necessarily the most efficient plan.

The SEBoK reformulation produced editorial commentary at the structural level: which corpus forms compose each SEBoK part (Doc 559), how each part reformulates under those forms (Docs 560-567), where Phase 4 logged residuals (Doc 568), what cluster formalizations the residuals motivated (Docs 571-574). The commentary lives at the part level and the residual level, not at the per-page level. Most SEBoK pages (more than ninety percent, on the page counts estimated in Section VI) have no editorial content the corpus has produced. Importing them at full Mode-4 fidelity buys storage and maintenance burden for editorial layers that are empty.

The discipline the corpus's existing forms suggest is to ask the question: what does the corpus actually need from this source, and what is the cheapest mode that delivers it? The answer is sometimes a full hosted import, sometimes a reformulation companion with no imported content, sometimes a reference-only link with attribution, sometimes a single-page seed distillation. Choosing the mode is constraint discipline (Doc 270 pin-art). Defaulting to "import everything" is paraphrase discipline at the apparatus level — comprehensive without composition.

II. What Subsumption Is, in Corpus Terms

Subsumption is keeper-side activity (Doc 538): the school formalizing its engagement with external material. The corpus's existing forms supply the structural language:

  • The substrate-and-keeper composition (Doc 510) names who does what. The source is substrate-with-its-own-school-attached. The corpus is keeper-side formalizing its reading.
  • The institutional ground (Doc 571) names the conditions both sides operate in. Subsumption succeeds when the corpus and the source share enough institutional ground to be legible to each other.
  • The pin-art model (Doc 270) names the operational shape of the engagement. Subsumption installs a pin-set on the source's content; the shape that emerges is the corpus's reading.
  • The novelty calculus (Doc 490) names the warrant tier of each subsumption claim. Subsumption that compresses without loss is π/α. Subsumption that composes new forms is θ/γ.
  • The hypostatic boundary (Doc 372) names what subsumption does not claim. The source is not absorbed into the corpus's being; the corpus's reading does not replace the source's voice.

Under these forms, subsumption has a definite operational shape: it is the act of running the corpus's apparatus against an external body of work and producing whatever structural commentary the apparatus generates. The output is determined by the apparatus and the source; the mode of subsumption is determined by what the output's economics warrant.

III. The Dimensions of Efficiency

Five dimensions bind every subsumption decision.

Volume. How much source content. Subsumption modes scale differently: Mode 1 (reference-only) is constant in source volume; Mode 4 (full hosted import) is linear; Mode 3 (seed distillation) is sub-linear because the seed compresses the source.

Change rate. How often the source updates. High-change sources benefit from light modes (1, 2, 3) where the corpus does not reproduce the source. Low-change sources can afford heavier modes.

Editorial density. How much commentary the corpus has produced about the source. Editorial-dense sources warrant heavier modes because the editorial layer adds value continuously; editorial-sparse sources do not benefit from full-import overhead.

Reader access pattern. How readers will actually use the imported content. Readers who consult the source episodically tolerate Mode 1's external links. Readers who move continuously between source and reading benefit from Mode 4's continuous visibility.

Licensing and access. Whether the source permits import, with what attribution, and at what cost. Some modes are foreclosed by license; some are foreclosed by institutional gatekeeping.

These five dimensions do not aggregate to a single score. They constrain the mode choice as a vector. A source that is high-volume, high-change, sparsely commented, episodically accessed, and freely licensed warrants a different mode from one that is moderate-volume, low-change, densely commented, continuously accessed, and CC-licensed.
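A minimal sketch of the five dimensions as a record rather than a score, assuming Python. The class names, the three-step Level scale, and the page counts in the two example profiles are hypothetical illustrations, not corpus conventions; the two profiles mirror the contrasting sources described in the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class SourceProfile:
    """One candidate source described along the five dimensions."""
    name: str
    volume_pages: int          # Volume
    change_rate: Level         # Change rate
    editorial_density: Level   # Editorial density
    continuous_access: bool    # Reader access pattern (True = side-by-side reading)
    import_permitted: bool     # Licensing and access

    def as_vector(self) -> tuple:
        # Deliberately no weighted sum: the dimensions constrain, they do not score.
        return (self.volume_pages, self.change_rate, self.editorial_density,
                self.continuous_access, self.import_permitted)


# Hypothetical profiles; the page counts are invented for illustration.
light = SourceProfile("high-volume wiki", 5000, Level.HIGH, Level.LOW, False, True)
heavy = SourceProfile("commented monograph", 300, Level.LOW, Level.HIGH, True, True)
```

Keeping the profile as a tuple of separate fields, with no method that collapses it to one number, is the design choice that matches the text: two sources are compared dimension by dimension.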

IV. The Modes

Seven modes are distinguishable in the corpus's current practice and proposed extensions. Each is named by the operational move that distinguishes it.

Mode 1 — Reference-only subsumption. No content imported. The corpus references the source by URL with attribution, a one-paragraph description, and a tier-tag. Cheapest. Storage cost negligible. Maintenance cost near-zero. Reader cost: a click to leave the corpus. Use when: source is canonically maintained, change rate is high, corpus produces no editorial commentary specific to the source's content. Existing instances: most external citations in corpus documents.

Mode 2 — Reformulation companion. Per-section corpus documents that read the source through corpus forms; no source content reproduced. The reformulation cites and links to the source but does not host it. Use when: source structure deserves apparatus engagement, source is widely available and stable, editorial commentary lives at the structural level rather than the per-page level. Existing instances: the SEBoK reformulation itself (Docs 559-568) before any import.

Mode 3 — Seed distillation. Extract a portable kernel from the source — the load-bearing forms or claims compressed into a single page in corpus voice. Use when: source has a few highly-structural moves and many pages of elaboration; the moves are what the corpus needs. Existing instances: Doc 492 (Novelty Calculus seed), Doc 556 (Ontological Ladder seed). For external sources: a Pearl-causal-hierarchy seed, a Bareinboim-transportability seed.

Mode 4 — Hosted import with editorial layer. Source imported verbatim, corpus annotations overlaid as a non-modifying editorial layer (form-mapping links, tier-tags, residual markers, reformulation back-links). Maintained through periodic re-fetch. Use when: continuous visibility of source-and-reading is valuable AND source has editorial-dense pages worth full-import treatment. Proposed instance: Doc 575's plan for SEBoK, as refined by this document.

Mode 5 — Subsumed reformulation. Source rewritten in corpus voice as a single canonical document, original preserved as appendix. Use when: source is small enough to rewrite without loss AND the rewrite produces measurable composition gain. Existing instance: Doc 570 subsuming Docs 557-569.

Mode 6 — Cluster-formalization extraction. Extract structural patterns from the source as new corpus forms. The source's content is not imported; the source's residuals induce corpus extensions. Use when: source surfaces structural patterns the corpus needs to absorb formally. Existing instances: Docs 571-574 (the four cluster formalizations from the SEBoK reformulation).

Mode 7 — Bidirectional composition. Two-way subsumption: the corpus reads the source, and the source's framework is invoked inside the corpus to read other material. Use when: source's framework has independent value as a reading tool that composes with corpus forms productively. Candidate instances: Pearl's three-rung causal hierarchy, Bareinboim-Pearl transportability theory. Each could be both subsumed by the corpus and used inside the corpus to read other sources.

The seven modes are not exclusive. A given source can be subsumed under multiple modes simultaneously. SEBoK is already subsumed under Mode 2 (Docs 559-568) and Mode 6 (Docs 571-574). Doc 575 proposed extending to Mode 4. The refined question is which combination of modes best matches the source's profile.
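Because the modes are not exclusive, a source's treatment is naturally a set of modes rather than a single value. The sketch below assumes Python; the enum member names and the label helper are hypothetical, while the mode numbers and the SEBoK assignments come from the text above.

```python
from enum import Enum


class Mode(Enum):
    REFERENCE_ONLY = 1
    REFORMULATION_COMPANION = 2
    SEED_DISTILLATION = 3
    HOSTED_IMPORT = 4
    SUBSUMED_REFORMULATION = 5
    CLUSTER_FORMALIZATION = 6
    BIDIRECTIONAL_COMPOSITION = 7


def label(modes):
    """Render a set of modes in the 'Mode 2 + Mode 6' style used in the text."""
    ordered = sorted(modes, key=lambda m: m.value)
    return " + ".join(f"Mode {m.value}" for m in ordered)


# SEBoK as already subsumed, and as Doc 575 proposed to extend it.
sebok_current = {Mode.REFORMULATION_COMPANION, Mode.CLUSTER_FORMALIZATION}
sebok_doc_575 = sebok_current | {Mode.HOSTED_IMPORT}

print(label(sebok_current))   # Mode 2 + Mode 6
print(label(sebok_doc_575))   # Mode 2 + Mode 4 + Mode 6
```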

V. Choosing a Mode

Five questions determine the mode profile for a candidate source.

Q1. What is the volume? Sub-100 pages: any mode plausible. 100-1000 pages: Mode 4 expensive but feasible. >1000 pages: Mode 4 likely overkill; consider Mode 1 + Mode 2 + Mode 6 hybrid.

Q2. What is the change rate? Static or dead source: Mode 4 stable. Active wiki or living document: Mode 4 maintenance non-trivial; consider Mode 1 with a Mode 2 reformulation companion that the corpus updates on its own cadence.

Q3. What is the editorial density? Page-level commentary across most pages: Mode 4 worth it. Structural commentary at part or topic level only: Mode 2 for the structure, with Mode 4 reserved for the editorial-dense pages.

Q4. What is the reader access pattern? Continuous side-by-side reading expected: Mode 4 valuable. Episodic consultation: Mode 1 sufficient.

Q5. What does the source license permit? Some sources foreclose import; some require attribution that constrains presentation. Confirm before any heavier mode.

The decision is not made on a single answer but on the configuration of the five. A corpus that defaults to Mode 4 across all candidates is paying maintenance and storage cost for sources that would have been adequately served by Mode 1 with a Mode 2 reformulation. A corpus that defaults to Mode 1 across all candidates loses the continuous-visibility benefit when it would have been worth the cost.
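The five questions can be sketched as a function that returns one hint per question and leaves the combination to the keeper, in keeping with the point that no single answer decides. This is an illustrative reading, assuming Python: the function name, the boolean simplifications of Q2 through Q5, and the example source are hypothetical. Only the page thresholds and the hint wording are taken from the questions above.

```python
def question_hints(pages, active_source, page_level_commentary,
                   continuous_reading, import_permitted):
    """Return a per-question hint; the mode profile is read off the whole dict."""
    hints = {}

    # Q1: volume bands as stated in the text.
    if pages < 100:
        hints["Q1"] = "any mode plausible"
    elif pages <= 1000:
        hints["Q1"] = "Mode 4 expensive but feasible"
    else:
        hints["Q1"] = "Mode 4 likely overkill; consider Mode 1 + Mode 2 + Mode 6"

    # Q2: change rate, simplified here to active versus static.
    hints["Q2"] = ("Mode 4 maintenance non-trivial; consider Mode 1 + Mode 2"
                   if active_source else "Mode 4 stable")

    # Q3: editorial density, simplified to page-level versus structural.
    hints["Q3"] = ("Mode 4 worth it" if page_level_commentary
                   else "Mode 2, with Mode 4 only on editorial-dense pages")

    # Q4: reader access pattern.
    hints["Q4"] = "Mode 4 valuable" if continuous_reading else "Mode 1 sufficient"

    # Q5: licensing, simplified to whether import is permitted at all.
    hints["Q5"] = ("heavier modes open" if import_permitted
                   else "import foreclosed; stay with lighter modes")
    return hints


# Hypothetical source: a large, active, sparsely commented wiki.
for question, hint in question_hints(1500, True, False, False, True).items():
    print(question, "->", hint)
```

Returning hints rather than a single recommended mode is deliberate: a rule that collapsed the five answers into one output would reintroduce the single score the framework rejects.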

VI. Applied to SEBoK

The original Doc 575 plan committed to Mode 4 for the complete SEBoK wiki. Re-running the five-question test against SEBoK's profile produces a different recommendation.

Q1 Volume: SEBoK has approximately 800 pages. High volume, in the upper part of the 100-1000 band where Mode 4 is expensive but feasible.

Q2 Change rate: SEBoK is an active wiki with regular updates. Moderate-to-high change rate.

Q3 Editorial density: The reformulation produced part-level commentary in Docs 560-567 and residual annotations on perhaps 30-60 specific SEBoK pages out of 800. The remaining ~740 pages have no corpus-produced editorial content. Editorial density is sparse-with-hot-spots.

Q4 Reader access pattern: A SEBoK practitioner who lands on the corpus will most often want either (a) the corpus's reformulation of the part they care about, or (b) a specific SEBoK page they would normally reach via SEBoK's own search. Continuous side-by-side reading is realistic only for the editorial-dense pages.

Q5 Licensing: CC BY-SA 3.0, with attribution and share-alike requirements. No mode is foreclosed by the license.

The configuration favors a hybrid:

  • Mode 1 (reference-only) for the ~740 SEBoK pages with no corpus-produced editorial content. The corpus links out to sebokwiki.org for these. No import, no maintenance burden, no storage cost.
  • Mode 2 (reformulation companion) for the eight SEBoK parts. Already in place as Docs 560-567. No further work.
  • Mode 4 (hosted import with editorial layer) for the ~30-60 SEBoK pages where the reformulation logged tier-tags, residuals, or specific form-mapping links. These are the editorial-dense pages where continuous side-by-side reading is valuable. Import them, render with the editorial layer, maintain through re-fetch.
  • Mode 6 (cluster formalization extraction) is already done as Docs 571-574.

The hybrid keeps the editorial yield of Doc 575's proposal while reducing the import volume by an order of magnitude. Storage and maintenance costs drop accordingly. The editorial layer is not diluted across pages it has nothing to say about.
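The order-of-magnitude claim can be checked against the document's own estimates. The sketch assumes Python and uses only the approximate figures given above (about 800 pages, 30 to 60 of them editorial-dense); the function name is hypothetical.

```python
def reduction(total_pages, imported_pages):
    """Return (pages left as Mode 1 references, import-volume reduction factor)."""
    referenced = total_pages - imported_pages
    factor = total_pages / imported_pages
    return referenced, factor


total = 800  # approximate SEBoK page count from Q1
for imported in (30, 60):  # editorial-dense range from Q3
    referenced, factor = reduction(total, imported)
    print(f"import {imported}: reference {referenced}, reduction {factor:.1f}x")
# import 30: reference 770, reduction 26.7x
# import 60: reference 740, reduction 13.3x
```

The ~740 figure corresponds to the upper end of the 30-60 range; at either end the import shrinks by more than a factor of ten.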

The hybrid also makes future engagement cheaper. New SEBoK pages added to the wiki do not automatically need import; they get Mode 1 reference treatment by default and graduate to Mode 4 only if the corpus produces editorial commentary on them.
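The default-and-graduate rule can be sketched as a one-line policy, assuming Python. The threshold used here (any commentary at all triggers Mode 4) is an assumption for illustration; the document leaves the exact transition conditions open. The function name, page titles, and counts are hypothetical.

```python
def page_mode(commentary_count: int) -> int:
    """Mode 1 by default; Mode 4 once the corpus has commentary on the page.

    The 'more than zero' threshold is an assumed placeholder, not a settled rule.
    """
    return 4 if commentary_count > 0 else 1


# Hypothetical pages with the number of corpus annotations on each.
pages = {"newly added page": 0, "annotated page": 3}
modes = {title: page_mode(count) for title, count in pages.items()}
print(modes)  # {'newly added page': 1, 'annotated page': 4}
```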

VII. Applied to Other Candidates

Three candidate sources show how the framework guides different mode choices.

Cybernetics (the historical corpus around Wiener, Ashby, Pask, von Foerster). Volume: large but bounded (textbook-scale across known authors). Change rate: zero (the source authors are deceased). Editorial density: likely high if the corpus engages it (cybernetics has structural alignment with the corpus's substrate-and-keeper composition and the lattice extension). Access pattern: research-mode, episodic. Licensing: complex, mixed across publishers. Recommended: Mode 2 + Mode 3 + Mode 6, with Mode 4 foreclosed by licensing on most material. Produce reformulation companions for the load-bearing texts; distill seed-form summaries; extract any cluster formalizations the engagement surfaces.

The INCOSE Systems Engineering Handbook. Volume: one book. Change rate: every few years per edition. Editorial density: likely high (the Handbook is more operational than SEBoK; pin-art model engagement is dense). Access pattern: continuous if used; episodic otherwise. Licensing: proprietary publication, not openly licensed. Recommended: Mode 2 (reformulation companion) and Mode 1 (reference). Mode 4 foreclosed by licensing.

The Pearl causal-inference framework (the Causality book and successor papers). Volume: a book and several papers. Change rate: low. Editorial density: medium-to-high if the corpus engages it (the three-rung hierarchy aligns with corpus rung-talk; transportability composes with the institutional-ground form). Access pattern: continuous in research, episodic otherwise. Licensing: book is proprietary, papers have varied licenses. Recommended: Mode 3 (seed distillation of the three-rung hierarchy and transportability) and Mode 7 (bidirectional composition where Pearl's framework reads corpus residuals on causal claims and the corpus reads Pearl's framework as a SIPE-with-threshold instance).

The pattern across the three candidates is that no two sources warrant the same mode profile. The framework guides which combination fits.

VIII. Disciplines That Bind Every Mode

Three disciplines hold across all subsumption modes.

The hypostatic boundary (Doc 372). Subsumption is functional engagement, not ontological absorption. The source remains itself. The corpus reads, comments, formalizes, but does not claim that the source is the corpus or that the corpus replaces the source.

Attribution preservation. Every mode that reproduces source content preserves authorship and license attribution explicitly. Every mode that does not reproduce source content still cites the source clearly. The corpus's editorial layer is always identified as such.

Tier-tagging integrity (Doc 490). Every claim the corpus makes about the source is tier-tagged. Tagging a claim at π warrant when the operation behind it was actually a θ-tier reframe is dishonest tagging. The novelty calculus binds the editorial output of every mode.

These three are not mode-specific. They are the corpus's commitments to anyone subsumed by it.

IX. Open Questions

  1. Mode boundaries are soft. A page can move between Mode 1 and Mode 4 over time as the corpus's editorial commentary grows or shrinks. The transition rules need explicit articulation: under what conditions does a Mode-1-referenced page graduate to Mode 4? Under what conditions does a Mode-4-imported page demote to Mode 1?
  2. Hybrid mode visualization. A reader engaging the SEBoK hybrid (Mode 1 + Mode 2 + Mode 4 + Mode 6) needs visual cues that make the mix legible: which pages are imported, which are external links, which are corpus reformulations, which are corpus form documents. Phase A of the import work needs UX design here.
  3. Compositional modes between corpus documents. Are subsumption modes only for external sources, or do they apply between corpus documents themselves? Doc 570 used a Mode-5 approach (subsumed reformulation with appendices). Other corpus subsumptions are possible. The framework may extend.
  4. Mode 7 mechanics. Bidirectional composition is named but not yet operationalized. What does it look like, structurally, for a Pearl framework to be both subsumed and used inside the corpus? This may warrant its own corpus document if Mode 7 is exercised.

X. Closing

The corpus has more subsumption tools than Doc 575's plan invoked. Defaulting to Mode 4 across all SEBoK pages would have produced a workable but inefficient import. Re-running the question through the corpus's existing forms produces a hybrid recommendation that preserves the editorial yield while reducing the import volume by approximately an order of magnitude.

The general principle is that subsumption mode is itself a constraint discipline. Asking "what mode does this source warrant" before importing is the corpus equivalent of asking "what is the pin set" before authoring a process. The mode is the pin set; the import is the substrate flow; the editorial yield is the shape that emerges.

The refined SEBoK plan, in one paragraph: import the ~30-60 SEBoK pages that the reformulation has editorial content for under Mode 4 with the full editorial layer Doc 575 specified; reference the remaining ~740 SEBoK pages under Mode 1 with attribution and a brief description; keep the eight reformulation companions (Docs 560-567) and the four cluster formalizations (Docs 571-574) as already-in-place Mode 2 and Mode 6 work; promote pages from Mode 1 to Mode 4 over time as new editorial commentary is produced.

Doc 575's six-phase rollout still applies; the volumes are smaller and the editorial layer is denser per imported page.

The next move is the keeper's. In particular: confirm the hybrid recommendation, run a one-pass count over the SEBoK reformulation documents to identify which specific SEBoK pages have editorial content (the Mode-4 candidate list), and update Doc 575's Phase B and C scope accordingly.


Appendix: Originating Prompt

"Before importing, based upon organization and reformulation upon the corpus grounds, how might documents be more efficiently subsumed into the corpus"

(The question takes Doc 575's import plan one step back and asks the methodological question the corpus's existing apparatus already supplies the language for. Subsumption modes are constraint discipline applied to the apparatus itself. The framework will revise Doc 575's mode assumption and likely guide future engagements with cybernetics, the INCOSE handbook, the Pearl framework, and other candidate sources.)