27 weeks · 54 posts · Written while building

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I'm learning while shipping DuranteOS. Every Friday, a dispatch from the week. Roughly 108,000 words and counting — for builders who'd rather watch the foundation get poured than read the press release.

Start here

If you're thinking about writing while building

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

If you're considering the founding cohort

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

If you want the architectural spine

The Decomposition Discipline I Am Trying to Codify Inside DOS

Tuesday essay

Around 3,800 builders read this weekly.

Nov 18, 2025

The Decomposition Discipline I Am Trying to Codify Inside DOS

Vague requests produce vague work. Ten weeks into building DOS I have started enforcing a discipline I am calling, for now, 'the Algorithm': every multi-step request decomposed into a numbered list of falsifiable criteria before any tool call. This is the design essay for that discipline, written before the practice is formal enough to call a framework.

The single most expensive failure mode I have already noticed in agentic software is also the one that looks most like success.

The agent receives a request. It produces a response that appears responsive. It emits code, prose, or a plan. The operator scans the output, sees nothing obviously wrong, and approves. Three days later the operator discovers the output addressed only one of the four things the request actually asked for. The other three were quietly dropped during interpretation.

This is not a model-quality problem. Frontier models are not stupid. It is a discipline problem. In the absence of a forcing function, agents (like humans) optimize for the legible part of the request and silently abandon the rest.

I have been working on a discipline I want to make load-bearing inside DOS. I am calling it, for now, the Algorithm — a deliberately grand name for what is currently a rough convention, with the bet that codifying it early is cheaper than retrofitting it later. As of today, ten weeks into the build, the Algorithm exists as a few pages of notes in the DOS repo and a half-implemented response-format hook. Not a framework. Not a versioned spec. A discipline I am trying to enforce on myself first.

This post is the design essay for that discipline.

The Algorithm in one sentence

Every multi-step request gets decomposed into a numbered list of Ideal State Criteria (ISCs) before any tool call. The work is verified against the list. Done means every ISC verified, not "looks done."

The Beck angle: the test list is the design

Kent Beck, in his 2023 Canon TDD essay, makes a claim I have come to think is the most important sentence in software methodology:

Write a list of the test scenarios you want to cover.

That is the first step. Not write a test. Not write the simplest test that could possibly fail. Write the list. The list is the unit of design work. Once the list exists, the implementation is mostly mechanical — pick the next test, write it, make it pass, refactor. The intelligence is in producing the list correctly.

The Algorithm I want to enforce in DOS is Canon TDD generalized from "tests for one feature" to "criteria for any request." The ISC list is the test list. Every ISC is a falsifiable claim about what "done" looks like for this specific request. The work is mechanical after the list. The intelligence is in producing the list.

The Uncle Bob angle: the Three Laws, slightly modified

Robert Martin's Three Laws of TDD constrain when you may write production code. The Algorithm I am sketching imposes analogous constraints on when the agent may take any action.

Bob's Three Laws of TDD (paraphrased)	Algorithm equivalent (planned)
You may not write production code until you have written a failing unit test	You may not call any tool until you have written the ISC list
You may not write more of a unit test than is sufficient to fail	You may not write more ISCs than the request actually demands
You may not write more production code than is sufficient to pass the failing test	You may not take more action than is sufficient to verify the next unverified ISC
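
The first row of the table is the easiest one to make mechanical. A minimal sketch, assuming a hypothetical `ToolGate` wrapper that sits between the agent and its tools; neither `ToolGate` nor `IscListMissing` is a real DOS name, and the real hook may look nothing like this:

```python
from typing import Any, Callable


class IscListMissing(RuntimeError):
    """Raised when a tool call is attempted before any ISC has been written."""


class ToolGate:
    """Hypothetical wrapper: refuses every tool call until an ISC list exists."""

    def __init__(self) -> None:
        self.iscs: list[str] = []

    def declare(self, claims: list[str]) -> None:
        # Number the criteria; declaring an empty list does not unlock the gate.
        self.iscs = [f"ISC-{n}: {claim}" for n, claim in enumerate(claims, start=1)]

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Law one: no ISC list, no tool call.
        if not self.iscs:
            raise IscListMissing("write the ISC list before calling any tool")
        return tool(*args, **kwargs)
```

The gate only checks that a list exists. Whether the list is any good is a separate problem that a wrapper like this cannot solve.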

The point of all three is the same: discipline against premature acceleration. The agent's instinct (and the human engineer's instinct) is to get going. The Algorithm's instinct is to first specify what done means, then go.

A request without ISCs vs. with ISCs

Without the discipline

Request: "Update the README, fix the broken links, and remove Chris from the contributors list."

Agent response: Updates the README intro paragraph to be more enthusiastic. Reports done.

Operator three days later: "You didn't fix the links and Chris is still in CONTRIBUTORS.md."

The agent latched onto the first verb (update), produced a plausible-looking deliverable, and silently dropped the other two parts of the request.

With ISC decomposition

Request: Same.

Agent response, before any tool call:

ISC-1: README intro reflects the latest product direction
ISC-2: All links in README resolve to a 200 status
ISC-3: CONTRIBUTORS.md no longer contains "Chris"
ISC-4 (anti-criterion): No mention of Chris remains in any file under the repo root

Then: works on each ISC in order. Verifies each. Reports 4/4 verified or names the specific ISC that failed and why.
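
The final report is simple enough to sketch. The example below assumes a hypothetical `Result` record and `report` function (illustrative names, not part of DOS) and produces either the "N/N verified" line or a list of the ISCs that failed and why:

```python
from typing import NamedTuple


class Result(NamedTuple):
    isc_id: str
    passed: bool
    evidence: str  # what was observed, or why the check failed


def report(results: list[Result]) -> str:
    """Return 'N/N verified', or the pass count plus every failed ISC and its reason."""
    total = len(results)
    failed = [r for r in results if not r.passed]
    if not failed:
        return f"{total}/{total} verified"
    lines = [f"{total - len(failed)}/{total} verified"]
    lines += [f"{r.isc_id} FAILED: {r.evidence}" for r in failed]
    return "\n".join(lines)
```

With four results where only ISC-2 fails, the output starts with "3/4 verified" and then names ISC-2 and its reason, so a partial result cannot be mistaken for done.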

The discipline is not exotic. It is what every senior engineer does intuitively when given a high-stakes request. I want to make it mandatory inside DOS. Whether that survives daily use over months is the experiment I am about to run.

The seven phases I want the Algorithm to enforce

The version I am drafting has seven named phases. Each phase has a fixed output shape. The agent should not be allowed to skip a phase. The phases should be visible in the response so the operator can audit the decomposition.

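[Diagram: OBSERVE → THINK → PLAN → ACT → VERIFY → REFLECT → DONE, with a loop from VERIFY back to PLAN whenever an ISC fails verification.]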

The loop between VERIFY and PLAN is the load-bearing part. When an ISC fails verification, I want the Algorithm to forbid declaring done with a caveat. Go back to PLAN, revise. The only legitimate way out of the loop is full ISC pass.

What each phase will do (planned)

The fixed shape I want every algorithmic response in DOS to follow.

OBSERVE — read the request literally. Capture the request verbatim. Note negatives (anti-criteria), constraints, deadlines, and what is not being asked. Allowed to read files but not write.
THINK — interpret what success would look like. Surface ambiguities. List the criteria a reasonable operator would use to judge the work done. This is the ISC list. Numbered, atomic, falsifiable.
PLAN — for each ISC, decide what action sequence will satisfy it. Identify dependencies between ISCs. Pick an ordering. The plan is a sequence of intended tool calls, not a literary description.
ACT — execute the plan. One tool call at a time. After each call, check whether any ISC just became verifiable.
VERIFY — for each ISC, run the verification it requires. Tests, file reads, screenshot inspection, whatever the ISC's "done" definition demands. Mark each ISC PASS or FAIL with evidence.
REFLECT — if all ISCs PASS, capture any learnings about how the decomposition went. What ISCs did you miss the first time? What did you over-specify? Loops back into how the next decomposition is done.
DONE — emit the structured "all N/N verified" line. Stop.
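
The phase order and the VERIFY-to-PLAN loop can be written as a small transition function. This is a sketch under assumptions: the `Phase` enum and `next_phase` are illustrative names, and the only rule encoded is the one stated above, that VERIFY cannot move forward unless every ISC passed.

```python
from enum import Enum


class Phase(Enum):
    OBSERVE = "OBSERVE"
    THINK = "THINK"
    PLAN = "PLAN"
    ACT = "ACT"
    VERIFY = "VERIFY"
    REFLECT = "REFLECT"
    DONE = "DONE"


ORDER = list(Phase)  # Enum members iterate in definition order


def next_phase(current: Phase, all_iscs_pass: bool = False) -> Phase:
    """Advance exactly one phase; VERIFY returns to PLAN unless every ISC passed."""
    if current is Phase.DONE:
        raise ValueError("DONE is terminal")
    if current is Phase.VERIFY and not all_iscs_pass:
        return Phase.PLAN  # no "done with a caveat": revise the plan instead
    return ORDER[ORDER.index(current) + 1]
```

Because the function only ever returns the next phase or PLAN, skipping a phase is not expressible through it.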

I have implemented OBSERVE through ACT in rough form already. VERIFY needs the evidence-citation rule I describe below. REFLECT does not exist yet. DONE is just a string.

The ISC schema I am converging on

ISCs cannot be free-form prose. A vague ISC produces vague verification. The schema I am drafting:

Name	Type	Required	Default	Description
id	string	yes	ISC-N	Sequential identifier inside the request.
claim	string	yes	—	A falsifiable statement about what 'done' looks like. Past-tense or descriptive, not aspirational.
verification	string	yes	—	The exact action that proves PASS or FAIL. 'I read the file and saw X.' Not 'looks right.'
polarity	positive | negative	yes	positive	Negative ISCs (anti-criteria) verify that something does NOT happen. As important as positive.
depends_on	ISC-N[]	no	[]	Other ISCs that must be satisfied first.
status	unverified | pass | fail	yes	unverified	Default unverified. Only flips to pass with explicit evidence.
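
One way to pin the schema down is a dataclass whose field names mirror the table. The sketch below is an assumption about shape, not the DOS implementation: the `ISC` class and the `work_order` helper are illustrative, and the ordering uses the standard-library `graphlib` module (Python 3.9+) to respect `depends_on`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Literal


@dataclass
class ISC:
    id: str            # e.g. "ISC-1", sequential inside the request
    claim: str         # falsifiable statement of what done looks like
    verification: str  # the exact action that proves PASS or FAIL
    polarity: Literal["positive", "negative"] = "positive"
    depends_on: list[str] = field(default_factory=list)
    status: Literal["unverified", "pass", "fail"] = "unverified"

    def __post_init__(self) -> None:
        # The three required string fields must not be blank.
        for name in ("id", "claim", "verification"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")


def work_order(iscs: list[ISC]) -> list[str]:
    """Order ISC ids so each one comes after everything it depends on."""
    graph = {isc.id: set(isc.depends_on) for isc in iscs}
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(graph).static_order())
```

The type hints document the allowed values but do not enforce them at runtime; a stricter version would also validate `polarity` and `status` in `__post_init__`.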

The negative-polarity field is the one I expect operators (read: me) to be most surprised by. Already in the first 10 weeks of DOS use, I have noticed that maybe 20% of subjective failures are not "did the wrong thing" but "did the right thing AND something extra that broke an unrelated assumption." Anti-criteria — "no other file is modified," "no new dependencies are added" — should catch this class.
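
An anti-criterion such as "no other file is modified" reduces to a set difference once the list of changed files is known. A minimal sketch, assuming a hypothetical `check_no_other_file_modified` helper; how `changed` is collected (for example from version control) is deliberately left out:

```python
def check_no_other_file_modified(changed: set[str], allowed: set[str]) -> tuple[bool, str]:
    """Verify a negative ISC: PASS only if every changed file was expected to change.

    Returns (passed, evidence) so the result can be recorded with its evidence.
    """
    extra = changed - allowed
    if extra:
        return False, "unexpected changes: " + ", ".join(sorted(extra))
    return True, "changed files: " + (", ".join(sorted(changed)) or "none")
```

The check says nothing about whether the allowed files were changed correctly. It only catches the "right thing plus something extra" class of failure.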

Three things I expect to break when I deploy this

I am writing these down as predictions because writing them down is the cheapest way to make sure I do not later pretend I anticipated them.

If any of the three predictions lands cleanly, I will know the discipline-collapsing-under-cognitive-load failure mode is real and the mechanical fixes I am planning are the right shape. If a prediction misses, I will know my model of agent failure is wrong somewhere, which is equally useful information.

What this discipline will cost

Honestly: I expect the Algorithm to be slower than just-doing-the-thing on small requests. A one-line "rename this variable" request running through OBSERVE-THINK-PLAN-ACT-VERIFY will probably take 4 seconds where a raw LLM would take 1. For trivial work that 4x slowdown is real and unwelcome.

I expect the cost to be justified at the request size where it matters. For any request involving more than three criteria, the Algorithm overhead should be less than the cost of a missed criterion discovered three days later. The crossover I am betting on is around three to four criteria per request. Below that, a lighter mode (which I am tentatively calling NATIVE) handles the work. Above it, ALGORITHM mode should be mandatory.

That mode classifier does not exist yet. Building it is part of the next four weeks of work.

The asymmetric cost that justifies the bias

Under-specifying a request is much more expensive than over-specifying it. A missed criterion costs days. An extra ISC costs seconds. The classifier should default to the heavier mode under doubt. That is the only setting that survives the first regression.
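
The "heavier mode under doubt" default can be sketched in a few lines. Everything here is an assumption: `choose_mode` and `CROSSOVER` are illustrative names, the threshold of 3 is simply the low end of the crossover bet, and the hard part (estimating how many criteria a request contains) is represented by an input that may be `None`.

```python
from typing import Optional

CROSSOVER = 3  # assumption: low end of the 3-4 bet, to be tuned against real use


def choose_mode(estimated_criteria: Optional[int]) -> str:
    """Return 'NATIVE' only for confidently small requests, else 'ALGORITHM'."""
    if estimated_criteria is None:
        return "ALGORITHM"  # no estimate means doubt, so take the heavier mode
    if estimated_criteria < CROSSOVER:
        return "NATIVE"
    return "ALGORITHM"
```

Treating a request with exactly three criteria as ALGORITHM is a choice made here to follow the asymmetric-cost argument; the essay leaves that boundary open.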

What I think this implies if you are building agents

Three implications, in increasing order of how much pushback I expect.

One. Decomposition is the load-bearing skill. Models are good enough at execution. Models are still meaningfully variable at decomposition. If you want a reliable agent, invest your prompt budget in the decomposition phase, not the execution phase.

Two. Verification needs evidence, not assertion. The single most common failure of vibe-coded agents is asserting verification without performing it. Make the evidence requirement mechanical. "Show me the test result" is a different prompt from "did it work?"

Three. REFLECT is where the system learns. The Algorithm without REFLECT is a static recipe. The Algorithm with REFLECT becomes a system that gets better at decomposition over time — if I can make REFLECT mandatory. The conditional is doing real work in that sentence.
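
The second implication, evidence over assertion, is the most straightforward to enforce in code. A minimal sketch, assuming a hypothetical `mark` function and `MissingEvidence` error (neither is a DOS name): a status can only be recorded alongside a non-empty description of what was actually observed.

```python
class MissingEvidence(ValueError):
    """Raised when a verification result is recorded with nothing to back it up."""


def mark(isc_id: str, passed: bool, evidence: str) -> dict[str, str]:
    """Record PASS or FAIL for an ISC; blank evidence is rejected either way."""
    cleaned = evidence.strip()
    if not cleaned:
        raise MissingEvidence(f"{isc_id}: a status needs evidence, not assertion")
    return {"id": isc_id, "status": "pass" if passed else "fail", "evidence": cleaned}
```

A check like this cannot tell real evidence from invented evidence. It only removes the cheapest failure, which is claiming a result without saying anything about how it was obtained.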

The thesis the discipline serves

Why the agent is the product.

The substrate beneath the discipline

The memory layer the Algorithm will write into.

The Algorithm is not the most exciting part of what I am building. It is the part I think will make the exciting parts trustworthy. Without it, the agent is plausible. With it, the agent is verifiable.

I will take verifiable over plausible every day — once the discipline is real enough to call by that name. Today it is rough convention. Future posts will tell you what survived contact with reality.

— Lucas


The 27-week arc · A single body of work

Twenty-seven weeks. Two posts a week. Six months of writing while building.

Week	Tuesday date	Tuesday evergreen	Friday date	Friday dispatch
W01	2025-10-28	The Agent Is the Product: Why Intelligence Replaces Interface	2025-10-31	Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical
W02	2025-11-04	Memory Replaces Lock-In: Designing the Substrate of Personal AI	2025-11-07	The Week the Substrate Caught Up: Kimi K2 Thinking Lands
W03	2025-11-11	Why I Just Left a Steady Practice to Build a Personal AI Operating System	2025-11-14	The Week the Harness Ate the Model
W04	2025-11-18	The Decomposition Discipline I Am Trying to Codify Inside DOS	2025-11-21	Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered
W05	2025-11-25	Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts	2025-11-28	Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch
W06	2025-12-02	Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function	2025-12-05	The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion
W07	2025-12-09	Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter	2025-12-12	Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router
W08	2025-12-16	Sentinel, Sketched: Convention-Driven Onboarding Before I Build It	2025-12-19	The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint
W09	2025-12-23	Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped	2025-12-26	Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty
W10	2025-12-30	Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It	2026-01-02	The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture
W11	2026-01-06	Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At	2026-01-09	Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default
W12	2026-01-13	Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture	2026-01-16	The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate
W13	2026-01-20	Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes	2026-01-23	The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week
W14	2026-01-27	The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs	2026-01-30	Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow
W15	2026-02-03	Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward	2026-02-06	Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day
W16	2026-02-10	TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists	2026-02-13	The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier
W17	2026-02-17	Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into	2026-02-20	The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives
W18	2026-02-24	Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward	2026-02-27	The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds
W19	2026-03-03	The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25	2026-03-06	Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned
W20	2026-03-10	Six Months of DOS: What Changed, What Endured, What Surprised	2026-03-13	The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies
W21	2026-03-17	MCP in Production: What Works When the Protocol Becomes the Boundary	2026-03-20	Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer
W22	2026-03-24	Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade	2026-03-27	The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies
W23	2026-03-31	Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects	2026-04-03	The Agent Stack Swallows the IDE, the IPO, and the Courtroom
W24	2026-04-07	Forking MemPalace: The 48-Hour Integration Retro	2026-04-10	Project Glasswing: When the Frontier Bifurcates on Safety
W25	2026-04-14	The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today	2026-04-17	Routines Eat the Workflow: The Harness Becomes the Product
W26	2026-04-21	The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates	2026-04-24	The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture
W27	2026-04-28	After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next	2026-05-01	The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat