Skip to content
DuranteDurante
ALL SYSTEMSGet Access

27 weeks · 54 posts · Written while building

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I'm learning while shipping DuranteOS. Every Friday, a dispatch from the week. Roughly 108,000 words and counting — for builders who'd rather watch the foundation get poured than read the press release.

Subscribe · Tuesday essay

Around 3,800 builders read this weekly.

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

I do not have a Council orchestrator. I have something cruder — a habit of channelling two or three named specialists by hand whenever an architectural decision smells like it has more than one defensible answer. Sixteen weeks in, the habit has paid back enough times that I want to commit to its shape before the next time I'm tired and ship the wrong primitive. This is the design essay. Fowler, Kent Beck, and Uncle Bob disagree productively at the bottom — and that disagreement is the part the pattern is supposed to surface.

I am writing this on a Monday in the dead week between Christmas and New Year, sixteen weeks and roughly two hundred and fifteen commits into building DuranteOS. The Council orchestrator does not exist. The roster, the three-round protocol, the synthesis step, the cost-tracked invocation log — none of those have shipped.

What I have is the habit. Whenever a question lands that smells like it has more than one defensible answer, I open a fresh session and ask the agent to channel two or three named specialists from a small mental roster. I run them through an ad-hoc debate, in my head and in the chat window, and I write down whatever convergence or non-convergence emerges. The debate is not a formal protocol; it is closer to a thought experiment with citation.

It has paid back enough times in the last fourteen weeks that I want to commit to its shape now, before the next architectural decision arrives and I am tired enough to ship the wrong primitive.

There is a particular kind of question that produced the habit. "Should this be a microservice or a module?" "Should this aggregate be persisted as event-sourced or as CRUD?" "Is this duplication that should be refactored or duplication that should be left alone?" Every plausible answer carries different assumptions about team size, deployment cadence, change frequency, audit requirements, and operator skill. A single expert who answers without surfacing the assumptions sounds confident and is often wrong for my context. The habit was an early defense against my own confidence-shaped errors. The Council is what the habit becomes when it grows up.

What I want the Council to be, in one sentence

Two or three named specialists, three structured rounds (positions → cross-examination → convergence), one synthesized output that names the tradeoffs the original question hid.

The roster I want to ship with

The roster I have been pulling from in my head over the last fourteen weeks. Each seat would be a specialist agent that channels a real practitioner's framework, vocabulary, and quote bank, with explicit step-aside rules for cases where the framework does not apply. None of these are agents yet; they are personas I imitate when I run the debate informally.

SeatChannelsStrongest onSteps aside for
FowlerRefactoring catalog, PoEAA, microservices.ioCode smells, named refactorings, pattern selectionHard real-time, formal verification
KentBeckTDD, XP, Tidy First, Empirical Software DesignTest-list discipline, smallest experiment, coupling/cohesionDistributed sagas, hard real-time
UncleBobClean Code, SOLID, Three Laws of TDDBoundary identification, function size, plugin architectureFP idioms, ML, distributed coordination
SandiMetzPOODR, 99 Bottles, Four RulesObject-level design, refactoring pedagogy, namingLegacy without tests, strategic redesign
FeathersWorking Effectively with Legacy CodeSeams, characterization tests, dependency-breakingGreenfield TDD, throwaway scripts
CockburnHexagonal, Crystal, use cases at the right goal levelPorts & adapters, methodology weight, walking skeletonReal-time, FP, ML
EricEvansDDD Blue Book, Bounded Context, Ubiquitous LanguageStrategic design, aggregate boundaries, context mapsTrivial CRUD, throwaway scripts
GregYoungCQRS, Event Sourcing, KurrentAudit-grade ledgers, event versioning, projectionsPure CRUD, MVPs
Pragmatic (Hunt + Thomas)The 100 Tips, Knowledge Portfolio, Dreyfus modelCareer, broken windows, tracer bulletsFormal verification, hard real-time

Nine seats is what I think the v1 should be. I have flirted with adding more (Hickey for FP, Lampson for systems, Norvig for ML) but each seat I add pushes the cost of every Council invocation up linearly, and there is no point recruiting a seat I would invoke once a quarter. I would rather under-stock the roster and add seats deliberately as gaps appear.

The recruitment trigger I am committing to

The Council should not run on every request. The trigger I want to hardcode is explicit: invoke the Council when the operator names two or more specialists in a single message, OR when the request matches a small set of pre-classified Council-worthy patterns ("should this be X or Y," "review this architecture," "what would A and B say about this").

The reason for the explicit trigger is cost. A three-seat Council should cost three full specialist invocations plus a synthesis call. That is roughly four-to-six times the cost of a single-specialist response. Worth it for high-stakes decisions; wasteful for one-line refactors. The explicit trigger keeps the operator in control of the cost — me, in this case, with my own credits.

When to invoke the Council vs. when not

Don't invoke the Council
  • "Rename this variable" → single specialist or NATIVE mode
  • "Fix this typo" → just do it
  • "Add a console.log here" → just do it
  • "Tell me what this function does" → single specialist
  • "Format this file" → tooling, not Council
Do invoke the Council
  • "Should this be a microservice?" → Fowler + Cockburn
  • "Is this aggregate boundary right?" → Evans + Young
  • "How should we refactor this 800-line class?" → Bob + Sandi + Feathers
  • "Is duplication X cheaper than abstraction Y?" → Sandi + Fowler
  • "Which testing strategy fits this code?" → Beck + Feathers

The three-round protocol

The Council should run in three named rounds. Each round has a fixed shape and a fixed time budget. This is the part I am most confident about because it mirrors what I have been doing by hand for fourteen weeks.

Rendering diagram…

What I want each round to do

  1. Round 1 — Opening positions. Each seat answers the question independently, in their own voice, citing their own canonical framework. Around 250 words per seat. They do not see each other's answers yet.
  2. Round 2 — Cross-examination. Each seat reads the other seats' Round 1 answers and writes a focused critique: where they agree, where they disagree, and why the disagreement exists (which assumption is doing the work). Around 200 words per seat.
  3. Round 3 — Convergence attempt. Each seat writes a final position that integrates the Round 2 critiques. They are explicitly allowed to change their position. They are also explicitly allowed to stand firm and name the assumption that prevents them from converging. Around 150 words per seat.
  4. Synthesis. A separate orchestrator agent reads all three rounds and emits a single synthesis: agreed conclusions, named tradeoffs, the assumptions that drive each side of any unresolved disagreement, and a recommended decision for the operator's specific context.

The convergence step is the load-bearing one. From running this by hand, my rough estimate is that about 60% of debates converge — all seats arrive at the same answer by the third pass, with the disagreement having been a missing context cue rather than a real tradeoff. The other 40% do not converge, and the synthesis preserves the disagreement instead of papering over it. The non-convergence is informative; it is the system's way of saying "this is a real tradeoff your context will determine." I will know if those numbers hold once I have run the formal version for a quarter. Today they are guesses with calibration.

What Fowler, Beck, and Uncle Bob actually disagree about

The hardest test I can put this design through is to run the Council on the question of whether the Council is worth building.

The question I ran in my head before writing this post: "Is the Council pattern a useful primitive for AI agents, or is it ceremony that delays a single sharp answer?"

Three voices I have channelled most across this blog series. None of them are agents yet. The dialogue below is what I imagine each would say if I asked them at a dinner table.

I am aware that I am the one writing all three voices. The performance is honest about its limits. The point of running the formal version against real models is that the simulation in my head is biased toward convergence — I tend to imagine the specialists agreeing because I am the one constructing the position. The real version will produce more disagreement, more frequently, on cases I currently flatten. That is a feature, not a bug.

What I want the synthesis to add

The synthesis should not be a vote. It should not be "two specialists said yes, one said maybe, so the answer is yes." It should be a separate inference call that:

  1. Identifies what every seat agreed on (the robust conclusions)
  2. Identifies what every seat disagreed about (the live tradeoffs)
  3. For each disagreement, names the assumption that drives each side
  4. Recommends a position for this operator's context, not in the abstract

The fourth step is what makes the Council operationally useful instead of merely educational. The synthesis should be allowed to read the operator's CLAUDE.md, the eventual MemPalace context, and the project's eventual Sentinel conventions before writing the recommendation. The recommendation is informed by who is asking, not just by what is being asked.

This is the part of the design that connects to everything else I am sketching this quarter. The Council assumes Sentinel exists. Sentinel assumes MemPalace exists. MemPalace assumes the substrate exists. The whole stack is a graph of assumptions about what the agent already knows when it sits down to work — and the Council is the highest leverage point because it is where the most expensive decisions get filtered.

Three deliberate non-features

I am writing these down because they are the requests I will give myself in three weeks once the first version of the Council ships and is too austere to feel impressive.

One. The Council does not vote. Voting hides the assumptions that produced each vote. The synthesis must preserve the assumptions explicitly because they are the part the operator needs.

Two. The Council does not arbitrate disagreements. If three seats genuinely disagree, the synthesis names the tradeoff and recommends — but the operator decides. The Council is a thinking aid, not an authority. The moment it becomes an authority, operators stop reading the rounds and trust the synthesis blindly, and the value of the disagreement evaporates.

Three. The Council does not run silently. Every Council invocation should be logged with the question, the seats, all three rounds, and the synthesis. Operators must be able to replay any decision a year later and see exactly what was considered. This is the receipts discipline, applied to architectural decisions.

The cost discipline I am pre-committing to

A typical three-seat Council should cost on the order of three specialist invocations of ~2K tokens output each, plus a synthesis call of ~1K tokens. For a strategic decision affecting months of work, that is a small bargain. For a one-line refactor, it is wasted budget. The trigger is the cost discipline. If I find myself running the Council on cheap decisions, I have either over-engineered the trigger or under-engineered the alternative cheaper response paths.

The Council is one of those patterns that feels like overhead until the first time it saves an architectural decision that would have been catastrophic to get wrong. Then it pays for the next twenty cheap invocations. The trick is recognizing the question shape before committing to the wrong answer — which is the part I have been doing by hand, badly and slowly, for fourteen weeks.

I want the formal version by the end of the next quarter. I want to have run it against every architectural decision in DOS for the six months after that. I want to be able to show, a year from now, the receipts on every load-bearing call I made — three voices, the convergence or non-convergence preserved, the assumptions named, the operator-context recommendation written down.

That is a different kind of trust than "I made a call and it worked out."

I would rather have the receipts.

Was this page helpful?

The 27-week arc · A single body of work

Twenty-seven weeks. Two posts a week. Six months of writing while building.

Week

Tuesday evergreen

Friday dispatch