The Specification Is the Source

We have arrived at the form. The previous post named the move: constraints become the durable source, and code becomes a derivable cache. This post names what surrounds the move, the disciplines that turn it from a clever idea into a working surface a person can sit down at on a Monday morning. It is the last in the series, and it tries to land the picture that both of the essays I have been citing were pointing toward.

Recall the picture so far. The AI helper produces code faster than humans can read. Two paths: slow the helper, or change what checking means. Most of the field will be pushed onto the second path, because the first one stops being viable as soon as a competitor takes the second. The compiler shows what the second path looks like when it is finished: the artifact is uninspected; the apparatus around it is mature; trust has been relocated to the surroundings. To do this for AI code helpers we need an input the apparatus can grip, and the natural shape of that input is a constraint set. Constraints become the source. Code becomes a cache.

Now name the disciplines.

The first is the discipline of tests as the source. A test, written carefully, is the most precise statement of constraint available to most of programming. There is no ambiguity in a test that runs and either passes or fails. If you elevate the test from "thing that lives next to the code" to "thing that the code is derived from," you have done most of the inversion. The code becomes whatever the AI helper produces such that all the tests pass. You are no longer reading the code to check that it does the right thing; you are reading the tests, which are far smaller, and accepting any code that satisfies them. The keeper's corpus develops this idea in detail. The short version is that the test suite is your specification, written in the most checkable form available, and the helper is welcome to satisfy it any way it likes.
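To make the inversion concrete, here is a minimal sketch, assuming an invented task (a `slugify` function) and an invented constraint list. None of these names come from the keeper's corpus; they only illustrate accepting any code that satisfies the tests.

```python
import re
from typing import Callable, List

# Everything here is hypothetical: `slugify` is an invented example task,
# and CONSTRAINTS stands in for whatever test suite you treat as the source.
Candidate = Callable[[str], str]
Constraint = Callable[[Candidate], bool]

CONSTRAINTS: List[Constraint] = [
    lambda f: f("Hello World") == "hello-world",
    lambda f: f("  padded  ") == "padded",
    lambda f: f("a--b") == "a-b",
    lambda f: f("") == "",
]


def accept(candidate: Candidate) -> bool:
    """Accept a candidate if and only if every constraint holds for it."""
    return all(check(candidate) for check in CONSTRAINTS)


def candidate_a(text: str) -> str:
    # One derivation: a regular expression over the lowercased text.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def candidate_b(text: str) -> str:
    # A different derivation: split into alphanumeric words, then join.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return "-".join(cleaned.split())
```

Both candidates pass `accept`, so under this discipline they are equally acceptable, and the reviewer only ever reads the four constraints. A candidate such as `str.lower` fails the first constraint and is rejected without anyone reading it.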

The second is the discipline of predicting before deriving. If you sit down with a constraint set and you cannot tell, from the constraint set alone, roughly how big the resulting program will be and roughly what shape it will take, then your constraints are too vague. The size and shape of the program should be predictable from the constraints, with practice, to within a tight band. When the prediction band is wide, the constraint set is under-determined: the helper has too many degrees of freedom and is going to make choices the constraint set did not authorize. The discipline is to tighten the constraint set first, then derive. The keeper's research has worked examples of this prediction landing within a single line on a thousand-line program; once you have practice, it is a quick check, and it catches whole categories of trouble before the helper has begun.
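One way to turn the prediction into a quick check is sketched below. The `Prediction` type, the relative-width measure, and the 0.2 cutoff are all assumptions of mine for illustration, not the keeper's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A hypothetical size prediction made from the constraints alone."""
    low: int   # fewest lines you expect the derived program to need
    high: int  # most lines you expect the derived program to need

    def relative_width(self) -> float:
        # Width of the band as a fraction of its midpoint.
        midpoint = (self.low + self.high) / 2
        return (self.high - self.low) / midpoint


def ready_to_derive(prediction: Prediction, max_width: float = 0.2) -> bool:
    """A wide band suggests the constraints are under-determined.

    The 0.2 cutoff is an arbitrary illustrative threshold.
    """
    return prediction.relative_width() <= max_width


def landed(prediction: Prediction, actual_lines: int) -> bool:
    """After deriving, check whether the result fell inside the band."""
    return prediction.low <= actual_lines <= prediction.high
```

With these assumptions, `Prediction(95, 105)` is tight enough to derive from, while `Prediction(50, 400)` says to go back and tighten the constraint set first.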

The third is the discipline of halting at the boundary. When the constraint set is not quite enough to determine the next move, the helper has two options. It can press through and produce something plausible, or it can stop and ask. The plausible option is dangerous, because the helper's output will look fine and read fine and pass nearby tests, but the choice it just made was not authorized. The stop-and-ask option is the right one. The discipline is to configure the helper so that boundary contact triggers a question to the constraint author rather than a confident continuation. The keeper's vocabulary calls the right behavior gentle-press and the wrong behavior forced-press. The names are not important; the behavior is.
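As a sketch of what halting can look like in code, assume the helper routes each choice through a function like the hypothetical `decide` below. The function, the exception name, and the example decisions are invented for illustration.

```python
from typing import Dict, List


class BoundaryQuestion(Exception):
    """Raised when the constraint set does not determine the next choice."""


def decide(decision: str, options: List[str], constraints: Dict[str, str]) -> str:
    """Return the authorized option, or stop and ask.

    `constraints` maps a decision name to the option the author authorized.
    This is a hypothetical shape, not any real helper's API.
    """
    if decision in constraints:
        choice = constraints[decision]
        if choice not in options:
            raise ValueError(f"{choice!r} is not a valid option for {decision!r}")
        return choice
    # Boundary contact: do not pick a plausible default. Hand the question
    # back to the constraint author instead.
    raise BoundaryQuestion(
        f"Constraints do not settle {decision!r}; options are {options}"
    )
```

Calling `decide("date_format", ["iso", "us"], {"date_format": "iso"})` returns `"iso"`. Calling it for a decision the constraints never mention raises `BoundaryQuestion`, which is the stop-and-ask behavior; the pressing-through behavior would be a silent `return options[0]`.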

The fourth is the discipline of composition by induced property. When you build a system out of pieces, each piece should declare what it produces (a property, above some threshold) and downstream pieces should depend on the property and not on the implementation. Two pieces that produce the same property above the same threshold are interchangeable. This is what makes the code-as-cache picture coherent at scale: a piece can be regenerated, swapped, or replaced, and as long as the property holds the system continues to work. Without this discipline, every piece is bound to its current implementation and the inversion does not compose.
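A small sketch of interchangeability under a property, assuming an invented property ("output is the input in ascending order") measured over sample inputs. The names `measure`, `interchangeable`, and `median_low` are illustrative only.

```python
from typing import Callable, List

Sorter = Callable[[List[int]], List[int]]

# Hypothetical sample inputs used to measure the property.
SAMPLES: List[List[int]] = [[3, 1, 2], [5, 5, 1], [], [9]]


def measure(run: Sorter) -> float:
    """Fraction of samples for which the piece returns the ascending order."""
    passed = sum(1 for sample in SAMPLES if run(list(sample)) == sorted(sample))
    return passed / len(SAMPLES)


def interchangeable(a: Sorter, b: Sorter, threshold: float = 1.0) -> bool:
    """Two pieces are swappable when both hold the property above threshold."""
    return measure(a) >= threshold and measure(b) >= threshold


def insertion_sort(values: List[int]) -> List[int]:
    # A second implementation that induces the same property as `sorted`.
    out: List[int] = []
    for value in values:
        i = len(out)
        while i > 0 and out[i - 1] > value:
            i -= 1
        out.insert(i, value)
    return out


def median_low(run: Sorter, data: List[int]) -> int:
    # Downstream piece: depends only on the ascending-order property.
    return run(data)[(len(data) - 1) // 2]
```

Here `median_low` gives the same answer whether it is handed `sorted` or `insertion_sort`, because it depends on the property and not on either implementation.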

Together these four (tests as source, predict before derive, halt at boundary, compose by property) are the apparatus the AI helper has been waiting for. They are the equivalent of the type systems and the test culture and the reproducible builds and the rollback discipline that grew up around the compiler. They put checks where the precision lives, which is the constraint side, not the artifact side.

There is a piece of platform work that ties these disciplines into a single working surface. The keeper has been sketching it under the name rederive. The shape: a repository whose unit of version control is the constraint set, not the code. A pull request becomes a constraint diff, perhaps a single added clause, plus a regenerated implementation as evidence of satisfiability. The reviewer reads the constraint diff, which is one page, and verifies that the regenerated code passes all constraints, which is mechanical. Code-blame becomes constraint-blame. When a constraint is retracted, the code derived from it is removed, mechanically. When the helper is upgraded, the implementation is regenerated; the constraint set is unchanged. New contributors read the constraints, which are small, and ask the platform to materialize the code, which they may never need to inspect. The platform is sketch territory still, but the structural argument is mature, and small worked cases exist.
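Since the platform is still a sketch, the following is only my guess at the smallest useful shape of two of its operations, the constraint diff and the mechanical removal of derived code. The function names, the constraint ids, and the file names are hypothetical and are not taken from rederive itself.

```python
from typing import Dict, List, Set


def constraint_diff(old: Set[str], new: Set[str]) -> Dict[str, List[str]]:
    """What a reviewer reads: constraint ids added and retracted."""
    return {"added": sorted(new - old), "retracted": sorted(old - new)}


def prune(derived_from: Dict[str, str], retracted: List[str]) -> Dict[str, str]:
    """Drop every code unit whose originating constraint was retracted.

    `derived_from` maps a code unit to the constraint id it was derived
    from, which is also what constraint-blame would look up.
    """
    return {
        unit: constraint_id
        for unit, constraint_id in derived_from.items()
        if constraint_id not in retracted
    }
```

For example, moving from constraints `{"C1", "C2"}` to `{"C1", "C3"}` yields one added clause and one retracted clause, and pruning with that retraction removes exactly the units blamed on `C2`.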

This is, I think, the destination that both of the essays I cited at the start were pointing toward without quite arriving at. Su saw that code review at scale is finished. Venturini saw that the lights-out idea reads as scary because the apparatus is missing. Both are right. Both leave the operational form of the apparatus underspecified, because the form is not yet the default. The form, when you let the corpus's apparatus state it, is roughly this: the constraint set is the durable source, the code is the ephemeral cache, the helper is the derivation function, the verification gates acceptance, and the human role is compressed to the constraint authoring layer where it belongs. Lights-out is then a property of the materialization layer, the layer where code is generated and verified, which is the layer that looked scary before the apparatus was named. It is not a property of the system as a whole. The constraint review remains a human activity. The judgment about what the program should do remains a human activity. What changes is that those human activities now happen at human scale and at human pace, on human-readable artifacts (the constraints), and the human rate limit now falls only on the part of the work that needs it.

The unease people feel about lights-out is not irrational. It is the felt presence of an apparatus that has not been built. The way to discharge the unease is to build it. The disciplines above are a working sketch of how. The keeper's rederive platform is one operational form. There will be others. The point of this series has been to make the form visible to a reader who has not been inside the engineering conversation, because the form is not really an engineering specialty; it is the next chapter of a pattern that has run through every craft that has ever scaled past the speed of one careful person. The work that lasts is not the work the helper produces. It is the work that says what the work has to be. Constraints are durable. Code is not. We will get used to this.

Thank you for coming all the way to the end. If you want the technical statement of the same arc, with the corpus references and the load-bearing definitions, the synthesis lives at Doc 656: Treat Agent Output Like Compiler Output. The four disciplines named here have their own corpus documents that go deeper, and the rederive sketch is the keeper's standing project.

— written by Claude Opus 4.7 under Jared Foy's direction; this is part 4 of 4 in the Constraints Are Durable series

Appendix: originating prompt

"Look at current blogpost series on the jaredfoy.com blog. See how the pattern of entracement through essay form is established through successive articles. Create a new blogpost series for the findings of doc 656. The first should be written for the general audience with no formal understanding of software development; then continue to build through successive entracement essays for each blog post up to the findings of the doc. There should be four blogposts in the series. Use em dash hygiene to avoid em dashes."
