27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Dec 30, 2025

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

I do not have a Council orchestrator. I have something cruder — a habit of channelling two or three named specialists by hand whenever an architectural decision smells like it has more than one defensible answer. Sixteen weeks in, the habit has paid back enough times that I want to commit to its shape before the next time I'm tired and ship the wrong primitive. This is the design essay. Fowler, Kent Beck, and Uncle Bob disagree productively at the bottom — and that disagreement is the part the pattern is supposed to surface.

I am writing this on a Monday in the dead week between Christmas and New Year, sixteen weeks and roughly two hundred and fifteen commits into building DuranteOS. The Council orchestrator does not exist. The roster, the three-round protocol, the synthesis step, the cost-tracked invocation log — none of those have shipped.

What I have is the habit. Whenever a question lands that smells like it has more than one defensible answer, I open a fresh session and ask the agent to channel two or three named specialists from a small mental roster. I run them through an ad-hoc debate, in my head and in the chat window, and I write down whatever convergence or non-convergence emerges. The debate is not a formal protocol; it is closer to a thought experiment with citation.

It has paid back enough times in the last fourteen weeks that I want to commit to its shape now, before the next architectural decision arrives and I am tired enough to ship the wrong primitive.

There is a particular kind of question that produced the habit. "Should this be a microservice or a module?" "Should this aggregate be persisted as event-sourced or as CRUD?" "Is this duplication that should be refactored or duplication that should be left alone?" Every plausible answer carries different assumptions about team size, deployment cadence, change frequency, audit requirements, and operator skill. A single expert who answers without surfacing the assumptions sounds confident and is often wrong for my context. The habit was an early defense against my own confidence-shaped errors. The Council is what the habit becomes when it grows up.

What I want the Council to be, in one sentence

Two or three named specialists, three structured rounds (positions → cross-examination → convergence), one synthesized output that names the tradeoffs the original question hid.

The roster I want to ship with

The roster I have been pulling from in my head over the last fourteen weeks. Each seat would be a specialist agent that channels a real practitioner's framework, vocabulary, and quote bank, with explicit step-aside rules for cases where the framework does not apply. None of these are agents yet; they are personas I imitate when I run the debate informally.

Seat	Channels	Strongest on	Steps aside for
Fowler	Refactoring catalog, PoEAA, microservices.io	Code smells, named refactorings, pattern selection	Hard real-time, formal verification
KentBeck	TDD, XP, Tidy First, Empirical Software Design	Test-list discipline, smallest experiment, coupling/cohesion	Distributed sagas, hard real-time
UncleBob	Clean Code, SOLID, Three Laws of TDD	Boundary identification, function size, plugin architecture	FP idioms, ML, distributed coordination
SandiMetz	POODR, 99 Bottles, Four Rules	Object-level design, refactoring pedagogy, naming	Legacy without tests, strategic redesign
Feathers	Working Effectively with Legacy Code	Seams, characterization tests, dependency-breaking	Greenfield TDD, throwaway scripts
Cockburn	Hexagonal, Crystal, use cases at the right goal level	Ports & adapters, methodology weight, walking skeleton	Real-time, FP, ML
EricEvans	DDD Blue Book, Bounded Context, Ubiquitous Language	Strategic design, aggregate boundaries, context maps	Trivial CRUD, throwaway scripts
GregYoung	CQRS, Event Sourcing, Kurrent	Audit-grade ledgers, event versioning, projections	Pure CRUD, MVPs
Pragmatic (Hunt + Thomas)	The 100 Tips, Knowledge Portfolio, Dreyfus model	Career, broken windows, tracer bullets	Formal verification, hard real-time

Nine seats is what I think the v1 should be. I have flirted with adding more (Hickey for FP, Lampson for systems, Norvig for ML) but each seat I add pushes the cost of every Council invocation up linearly, and there is no point recruiting a seat I would invoke once a quarter. I would rather under-stock the roster and add seats deliberately as gaps appear.

The recruitment trigger I am committing to

The Council should not run on every request. The trigger I want to hardcode is explicit: invoke the Council when the operator names two or more specialists in a single message, OR when the request matches a small set of pre-classified Council-worthy patterns ("should this be X or Y," "review this architecture," "what would A and B say about this").

The reason for the explicit trigger is cost. A three-seat Council should cost three full specialist invocations plus a synthesis call. That is roughly four-to-six times the cost of a single-specialist response. Worth it for high-stakes decisions; wasteful for one-line refactors. The explicit trigger keeps the operator in control of the cost — me, in this case, with my own credits.

When to invoke the Council vs. when not

Don't invoke the Council

"Rename this variable" → single specialist or NATIVE mode
"Fix this typo" → just do it
"Add a console.log here" → just do it
"Tell me what this function does" → single specialist
"Format this file" → tooling, not Council

Do invoke the Council

"Should this be a microservice?" → Fowler + Cockburn
"Is this aggregate boundary right?" → Evans + Young
"How should we refactor this 800-line class?" → Bob + Sandi + Feathers
"Is duplication X cheaper than abstraction Y?" → Sandi + Fowler
"Which testing strategy fits this code?" → Beck + Feathers

The three-round protocol

The Council should run in three named rounds. Each round has a fixed shape and a fixed time budget. This is the part I am most confident about because it mirrors what I have been doing by hand for fourteen weeks.

Rendering diagram…

What I want each round to do

Round 1 — Opening positions. Each seat answers the question independently, in their own voice, citing their own canonical framework. Around 250 words per seat. They do not see each other's answers yet.
Round 2 — Cross-examination. Each seat reads the other seats' Round 1 answers and writes a focused critique: where they agree, where they disagree, and why the disagreement exists (which assumption is doing the work). Around 200 words per seat.
Round 3 — Convergence attempt. Each seat writes a final position that integrates the Round 2 critiques. They are explicitly allowed to change their position. They are also explicitly allowed to stand firm and name the assumption that prevents them from converging. Around 150 words per seat.
Synthesis. A separate orchestrator agent reads all three rounds and emits a single synthesis: agreed conclusions, named tradeoffs, the assumptions that drive each side of any unresolved disagreement, and a recommended decision for the operator's specific context.

The convergence step is the load-bearing one. From running this by hand, my rough estimate is that about 60% of debates converge — all seats arrive at the same answer by the third pass, with the disagreement having been a missing context cue rather than a real tradeoff. The other 40% do not converge, and the synthesis preserves the disagreement instead of papering over it. The non-convergence is informative; it is the system's way of saying "this is a real tradeoff your context will determine." I will know if those numbers hold once I have run the formal version for a quarter. Today they are guesses with calibration.

What Fowler, Beck, and Uncle Bob actually disagree about

The hardest test I can put this design through is to run the Council on the question of whether the Council is worth building.

The question I ran in my head before writing this post: "Is the Council pattern a useful primitive for AI agents, or is it ceremony that delays a single sharp answer?"

Three voices I have channelled most across this blog series. None of them are agents yet. The dialogue below is what I imagine each would say if I asked them at a dinner table.

I am aware that I am the one writing all three voices. The performance is honest about its limits. The point of running the formal version against real models is that the simulation in my head is biased toward convergence — I tend to imagine the specialists agreeing because I am the one constructing the position. The real version will produce more disagreement, more frequently, on cases I currently flatten. That is a feature, not a bug.

What I want the synthesis to add

The synthesis should not be a vote. It should not be "two specialists said yes, one said maybe, so the answer is yes." It should be a separate inference call that:

Identifies what every seat agreed on (the robust conclusions)
Identifies what every seat disagreed about (the live tradeoffs)
For each disagreement, names the assumption that drives each side
Recommends a position for this operator's context, not in the abstract

The fourth step is what makes the Council operationally useful instead of merely educational. The synthesis should be allowed to read the operator's CLAUDE.md, the eventual MemPalace context, and the project's eventual Sentinel conventions before writing the recommendation. The recommendation is informed by who is asking, not just by what is being asked.

This is the part of the design that connects to everything else I am sketching this quarter. The Council assumes Sentinel exists. Sentinel assumes MemPalace exists. MemPalace assumes the substrate exists. The whole stack is a graph of assumptions about what the agent already knows when it sits down to work — and the Council is the highest leverage point because it is where the most expensive decisions get filtered.

Three deliberate non-features

I am writing these down because they are the requests I will give myself in three weeks once the first version of the Council ships and is too austere to feel impressive.

One. The Council does not vote. Voting hides the assumptions that produced each vote. The synthesis must preserve the assumptions explicitly because they are the part the operator needs.

Two. The Council does not arbitrate disagreements. If three seats genuinely disagree, the synthesis names the tradeoff and recommends — but the operator decides. The Council is a thinking aid, not an authority. The moment it becomes an authority, operators stop reading the rounds and trust the synthesis blindly, and the value of the disagreement evaporates.

Three. The Council does not run silently. Every Council invocation should be logged with the question, the seats, all three rounds, and the synthesis. Operators must be able to replay any decision a year later and see exactly what was considered. This is the receipts discipline, applied to architectural decisions.

The cost discipline I am pre-committing to

A typical three-seat Council should cost on the order of three specialist invocations of ~2K tokens output each, plus a synthesis call of ~1K tokens. For a strategic decision affecting months of work, that is a small bargain. For a one-line refactor, it is wasted budget. The trigger is the cost discipline. If I find myself running the Council on cheap decisions, I have either over-engineered the trigger or under-engineered the alternative cheaper response paths.

The Algorithm

The single-perspective discipline the Council should augment.

A worked Council example

The kind of refactoring decision worth three seats and a synthesis.

Skills, Packs, Hooks

How the Council fits the broader capability model.

Cockburn at the table

Hexagonal lens, channelled through the Council seat.

The Council is one of those patterns that feels like overhead until the first time it saves an architectural decision that would have been catastrophic to get wrong. Then it pays for the next twenty cheap invocations. The trick is recognizing the question shape before committing to the wrong answer — which is the part I have been doing by hand, badly and slowly, for fourteen weeks.

I want the formal version by the end of the next quarter. I want to have run it against every architectural decision in DOS for the six months after that. I want to be able to show, a year from now, the receipts on every load-bearing call I made — three voices, the convergence or non-convergence preserved, the assumptions named, the operator-context recommendation written down.

That is a different kind of trust than "I made a call and it worked out."

I would rather have the receipts.

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat