I am writing this on a Monday in the dead week between Christmas and New Year, sixteen weeks and roughly two hundred and fifteen commits into building DuranteOS. The Council orchestrator does not exist. The roster, the three-round protocol, the synthesis step, the cost-tracked invocation log — none of those have shipped.
What I have is the habit. Whenever a question lands that smells like it has more than one defensible answer, I open a fresh session and ask the agent to channel two or three named specialists from a small mental roster. I run them through an ad-hoc debate, in my head and in the chat window, and I write down whatever convergence or non-convergence emerges. The debate is not a formal protocol; it is closer to a thought experiment with citations.
It has paid back enough times in the last fourteen weeks that I want to commit to its shape now, before the next architectural decision arrives and I am tired enough to ship the wrong primitive.
There is a particular kind of question that produced the habit. "Should this be a microservice or a module?" "Should this aggregate be persisted as event-sourced or as CRUD?" "Is this duplication that should be refactored or duplication that should be left alone?" Every plausible answer carries different assumptions about team size, deployment cadence, change frequency, audit requirements, and operator skill. A single expert who answers without surfacing the assumptions sounds confident and is often wrong for my context. The habit was an early defense against my own confidence-shaped errors. The Council is what the habit becomes when it grows up.
What I want the Council to be, in one sentence
Two or three named specialists, three structured rounds (positions → cross-examination → convergence), one synthesized output that names the tradeoffs the original question hid.
The roster I want to ship with
This is the roster I have been pulling from in my head over the last fourteen weeks. Each seat would be a specialist agent that channels a real practitioner's framework, vocabulary, and quote bank, with explicit step-aside rules for cases where the framework does not apply. None of these are agents yet; they are personas I imitate when I run the debate informally.
| Seat | Channels | Strongest on | Steps aside for |
|---|---|---|---|
| Fowler | Refactoring catalog, PoEAA, martinfowler.com microservices articles | Code smells, named refactorings, pattern selection | Hard real-time, formal verification |
| KentBeck | TDD, XP, Tidy First, Empirical Software Design | Test-list discipline, smallest experiment, coupling/cohesion | Distributed sagas, hard real-time |
| UncleBob | Clean Code, SOLID, Three Laws of TDD | Boundary identification, function size, plugin architecture | FP idioms, ML, distributed coordination |
| SandiMetz | POODR, 99 Bottles, Four Rules | Object-level design, refactoring pedagogy, naming | Legacy without tests, strategic redesign |
| Feathers | Working Effectively with Legacy Code | Seams, characterization tests, dependency-breaking | Greenfield TDD, throwaway scripts |
| Cockburn | Hexagonal, Crystal, use cases at the right goal level | Ports & adapters, methodology weight, walking skeleton | Real-time, FP, ML |
| EricEvans | DDD Blue Book, Bounded Context, Ubiquitous Language | Strategic design, aggregate boundaries, context maps | Trivial CRUD, throwaway scripts |
| GregYoung | CQRS, Event Sourcing, Kurrent | Audit-grade ledgers, event versioning, projections | Pure CRUD, MVPs |
| Pragmatic (Hunt + Thomas) | The 100 Tips, Knowledge Portfolio, Dreyfus model | Career, broken windows, tracer bullets | Formal verification, hard real-time |
Nine seats is what I think the v1 should be. I have flirted with adding more (Hickey for FP, Lampson for systems, Norvig for ML) but each seat I add pushes the cost of every Council invocation up linearly, and there is no point recruiting a seat I would invoke once a quarter. I would rather under-stock the roster and add seats deliberately as gaps appear.
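A minimal sketch of how a seat and its step-aside rule could be stored as data, so that stepping aside is a lookup rather than a judgment made mid-debate. Everything here is an assumption for illustration: the `Seat` type, the lowercase tag spellings, and `eligible_seats` are hypothetical names, and only three rows of the table are transcribed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seat:
    """One roster row. Field names are hypothetical, not a shipped schema."""
    name: str
    strongest_on: frozenset[str]
    steps_aside_for: frozenset[str]


# Three rows transcribed from the roster table; tag spellings are assumptions.
ROSTER = [
    Seat("Feathers",
         frozenset({"seams", "characterization tests", "dependency-breaking"}),
         frozenset({"greenfield tdd", "throwaway scripts"})),
    Seat("EricEvans",
         frozenset({"strategic design", "aggregate boundaries", "context maps"}),
         frozenset({"trivial crud", "throwaway scripts"})),
    Seat("GregYoung",
         frozenset({"audit-grade ledgers", "event versioning", "projections"}),
         frozenset({"pure crud", "mvps"})),
]


def eligible_seats(roster: list[Seat], context_tags: set[str]) -> list[str]:
    """Return the names of seats whose step-aside rules do not match the context."""
    return [s.name for s in roster if not (s.steps_aside_for & context_tags)]
```

Calling `eligible_seats(ROSTER, {"throwaway scripts"})` leaves only the GregYoung seat, because both Feathers and EricEvans list throwaway scripts as a step-aside case.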
The recruitment trigger I am committing to
The Council should not run on every request. The trigger I want to hardcode is explicit: invoke the Council when the operator names two or more specialists in a single message, OR when the request matches a small set of pre-classified Council-worthy patterns ("should this be X or Y," "review this architecture," "what would A and B say about this").
The reason for the explicit trigger is cost. A three-seat Council should cost three full specialist invocations plus a synthesis call. That is roughly four-to-six times the cost of a single-specialist response. Worth it for high-stakes decisions; wasteful for one-line refactors. The explicit trigger keeps the operator in control of the cost — me, in this case, with my own credits.
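The trigger described above could be sketched as a pure function over the operator's message. This is a rough illustration under stated assumptions, not the shipped trigger: the alias set, the regular expressions, and the function names are all hypothetical, and the name matching is deliberately crude (a common word like "young" would count as a seat).

```python
import re

# Hypothetical short names for the nine seats; a real version would derive
# these from the roster rather than hardcode them.
SEAT_ALIASES = {"fowler", "beck", "bob", "sandi", "feathers",
                "cockburn", "evans", "young", "pragmatic"}

# Pre-classified Council-worthy shapes, modelled on the three examples in the text.
COUNCIL_PATTERNS = [
    re.compile(r"\bshould (this|we|it)\b.*\bor\b", re.IGNORECASE),
    re.compile(r"\breview this architecture\b", re.IGNORECASE),
    re.compile(r"\bwhat would .+ and .+ say\b", re.IGNORECASE),
]


def named_seats(message: str) -> set[str]:
    """Return the seat aliases that appear as whole words in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return words & SEAT_ALIASES


def should_invoke_council(message: str) -> bool:
    """True when two or more seats are named OR a Council-worthy pattern matches."""
    if len(named_seats(message)) >= 2:
        return True
    return any(p.search(message) for p in COUNCIL_PATTERNS)
```

Keeping the check explicit and cheap means the decision to spend a Council's worth of credits never itself requires a model call.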
When to invoke the Council vs. when not
Not worth a Council:
- "Rename this variable" → single specialist or NATIVE mode
- "Fix this typo" → just do it
- "Add a console.log here" → just do it
- "Tell me what this function does" → single specialist
- "Format this file" → tooling, not Council

Worth a Council:
- "Should this be a microservice?" → Fowler + Cockburn
- "Is this aggregate boundary right?" → Evans + Young
- "How should we refactor this 800-line class?" → Bob + Sandi + Feathers
- "Is duplication X cheaper than abstraction Y?" → Sandi + Fowler
- "Which testing strategy fits this code?" → Beck + Feathers
The three-round protocol
The Council should run in three named rounds. Each round has a fixed shape and a fixed word budget. This is the part I am most confident about because it mirrors what I have been doing by hand for fourteen weeks.
What I want each round to do
- Round 1 — Opening positions. Each seat answers the question independently, in their own voice, citing their own canonical framework. Around 250 words per seat. They do not see each other's answers yet.
- Round 2 — Cross-examination. Each seat reads the other seats' Round 1 answers and writes a focused critique: where they agree, where they disagree, and why the disagreement exists (which assumption is doing the work). Around 200 words per seat.
- Round 3 — Convergence attempt. Each seat writes a final position that integrates the Round 2 critiques. They are explicitly allowed to change their position. They are also explicitly allowed to stand firm and name the assumption that prevents them from converging. Around 150 words per seat.
- Synthesis. A separate orchestrator agent reads all three rounds and emits a single synthesis: agreed conclusions, named tradeoffs, the assumptions that drive each side of any unresolved disagreement, and a recommended decision for the operator's specific context.
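The rounds above can be sketched as a small orchestration function. This is a sketch under assumptions, not the Council orchestrator, which does not exist yet: the `ask` callable (seat name and prompt in, answer text out), the prompt wording, and the returned dictionary shape are all hypothetical. The word budgets are the ones named in the list.

```python
from typing import Callable

# Hypothetical signature: (seat, prompt) -> answer text. A real version would
# call a model here; tests can pass a stub.
Ask = Callable[[str, str], str]

WORD_BUDGET = {1: 250, 2: 200, 3: 150}  # approximate words per seat per round


def run_council(question: str, seats: list[str], ask: Ask) -> dict:
    # Round 1: each seat answers independently and sees no other answers.
    r1 = {s: ask(s, f"Round 1 (~{WORD_BUDGET[1]} words). Question: {question}")
          for s in seats}

    # Round 2: each seat critiques only the OTHER seats' Round 1 answers.
    r2 = {}
    for s in seats:
        others = "\n".join(f"{o}: {r1[o]}" for o in seats if o != s)
        r2[s] = ask(s, f"Round 2 (~{WORD_BUDGET[2]} words). Critique these:\n{others}")

    # Round 3: each seat reads every critique, then revises or stands firm.
    critiques = "\n".join(f"{s}: {r2[s]}" for s in seats)
    r3 = {s: ask(s, f"Round 3 (~{WORD_BUDGET[3]} words). Revise or stand firm, "
                    f"naming the blocking assumption.\n{critiques}")
          for s in seats}

    # Synthesis: a separate call that reads all three rounds.
    transcript = {"question": question, "round1": r1, "round2": r2, "round3": r3}
    synthesis = ask("orchestrator", f"Synthesize all three rounds:\n{transcript}")
    return {**transcript, "synthesis": synthesis}
```

Passing `ask` in as a parameter keeps the protocol testable without a model: a stub that echoes the seat name is enough to check that Round 1 prompts contain no other seat's answer and that the synthesis call comes last.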
The convergence step is the load-bearing one. From running this by hand, my rough estimate is that about 60% of debates converge — all seats arrive at the same answer by the third pass, with the disagreement having been a missing context cue rather than a real tradeoff. The other 40% do not converge, and the synthesis preserves the disagreement instead of papering over it. The non-convergence is informative; it is the system's way of saying "this is a real tradeoff your context will determine." I will know if those numbers hold once I have run the formal version for a quarter. Today they are guesses with calibration.
What Fowler, Beck, and Uncle Bob actually disagree about
The hardest test I can put this design through is to run the Council on the question of whether the Council is worth building.
The question I ran in my head before writing this post: "Is the Council pattern a useful primitive for AI agents, or is it ceremony that delays a single sharp answer?"
These are the three voices I have channelled most across this blog series. None of them are agents yet. The dialogue below is what I imagine each would say if I asked them at a dinner table.
I am aware that I am the one writing all three voices. The performance is honest about its limits. The point of running the formal version against real models is that the simulation in my head is biased toward convergence — I tend to imagine the specialists agreeing because I am the one constructing the position. The real version will produce more disagreement, more frequently, on cases I currently flatten. That is a feature, not a bug.
What I want the synthesis to add
The synthesis should not be a vote. It should not be "two specialists said yes, one said maybe, so the answer is yes." It should be a separate inference call that:
- Identifies what every seat agreed on (the robust conclusions)
- Identifies where the seats disagreed (the live tradeoffs)
- For each disagreement, names the assumption that drives each side
- Recommends a position for this operator's context, not in the abstract
The fourth step is what makes the Council operationally useful instead of merely educational. The synthesis should be allowed to read the operator's CLAUDE.md, the eventual MemPalace context, and the project's eventual Sentinel conventions before writing the recommendation. The recommendation is informed by who is asking, not just by what is being asked.
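One way to keep the synthesis from collapsing into a vote is to give it a shape that cannot be filled in without naming assumptions. The sketch below assumes hypothetical `Tradeoff` and `Synthesis` types and a hypothetical `validate` check; none of these names come from an existing design.

```python
from dataclasses import dataclass, field


@dataclass
class Tradeoff:
    topic: str
    # seat name -> the assumption that drives that seat's side
    assumptions: dict[str, str]


@dataclass
class Synthesis:
    agreed: list[str]                 # the robust conclusions
    tradeoffs: list[Tradeoff]         # the live disagreements
    recommendation: str               # for this operator's context
    # Hypothetical field: which context files informed the recommendation.
    context_sources: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the shape is complete."""
        problems = []
        for t in self.tradeoffs:
            if len(t.assumptions) < 2:
                problems.append(f"tradeoff '{t.topic}' names fewer than two sides")
        if not self.recommendation.strip():
            problems.append("missing recommendation")
        if not self.context_sources:
            problems.append("recommendation cites no operator context")
        return problems
```

A synthesis that lists a disagreement with only one side's assumption, or a recommendation that cites no operator context, fails validation instead of shipping as a confident-sounding answer.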
This is the part of the design that connects to everything else I am sketching this quarter. The Council assumes Sentinel exists. Sentinel assumes MemPalace exists. MemPalace assumes the substrate exists. The whole stack is a graph of assumptions about what the agent already knows when it sits down to work — and the Council is the highest leverage point because it is where the most expensive decisions get filtered.
Three deliberate non-features
I am writing these down because they are the features I will be tempted to request of myself three weeks after the first version of the Council ships and feels too austere to impress.
One. The Council does not vote. Voting hides the assumptions that produced each vote. The synthesis must preserve the assumptions explicitly because they are the part the operator needs.
Two. The Council does not arbitrate disagreements. If three seats genuinely disagree, the synthesis names the tradeoff and recommends — but the operator decides. The Council is a thinking aid, not an authority. The moment it becomes an authority, operators stop reading the rounds and trust the synthesis blindly, and the value of the disagreement evaporates.
Three. The Council does not run silently. Every Council invocation should be logged with the question, the seats, all three rounds, and the synthesis. Operators must be able to replay any decision a year later and see exactly what was considered. This is the receipts discipline, applied to architectural decisions.
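A minimal sketch of the receipts, assuming an append-only JSON Lines file. The field names, the file format, and the three helper functions are assumptions for illustration; the text only commits to logging the question, the seats, all three rounds, and the synthesis.

```python
import json
from datetime import datetime, timezone


def log_record(question: str, seats: list[str], rounds: dict, synthesis: str) -> str:
    """Serialize one Council invocation as a single JSON line."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "seats": seats,
        "rounds": rounds,        # e.g. {"round1": {...}, "round2": {...}, "round3": {...}}
        "synthesis": synthesis,
    }
    return json.dumps(record, ensure_ascii=False)


def append_to_log(path: str, line: str) -> None:
    """Append only; earlier records are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def replay(path: str) -> list[dict]:
    """Read every logged invocation back, oldest first."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Append-only storage is the design choice that matters here: a decision replayed a year later should show what was considered at the time, not a tidied-up version.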
The cost discipline I am pre-committing to
A typical three-seat Council should cost on the order of three specialist invocations of ~2K output tokens each, plus a synthesis call of ~1K tokens. For a strategic decision affecting months of work, that is a small price. For a one-line refactor, it is wasted budget. The trigger is the cost discipline. If I find myself running the Council on cheap decisions, I have either over-engineered the trigger or under-engineered the cheaper alternative response paths.
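The arithmetic behind those figures, as a back-of-envelope sketch. It counts output tokens only and uses the rough numbers from the paragraph above as defaults; the function names and the 2K single-specialist baseline are assumptions.

```python
def council_output_tokens(seats: int, per_seat: int = 2000, synthesis: int = 1000) -> int:
    """Output tokens for one Council: every seat's rounds plus one synthesis call."""
    return seats * per_seat + synthesis


def cost_ratio(seats: int, per_seat: int = 2000, synthesis: int = 1000,
               single: int = 2000) -> float:
    """Council output tokens relative to one single-specialist response."""
    return council_output_tokens(seats, per_seat, synthesis) / single
```

With three seats this gives 7,000 output tokens, or 3.5 times a single 2K response. Input tokens are ignored here; since later rounds re-read earlier answers, counting them would presumably push the ratio higher.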
Related posts in this series:
- The Algorithm: the single-perspective discipline the Council should augment.
- A worked Council example: the kind of refactoring decision worth three seats and a synthesis.
- Skills, Packs, Hooks: how the Council fits the broader capability model.
- Cockburn at the table: the hexagonal lens, channelled through the Council seat.
The Council is one of those patterns that feels like overhead until the first time it saves an architectural decision that would have been catastrophic to get wrong. Then it pays for the next twenty cheap invocations. The trick is recognizing the question shape before committing to the wrong answer — which is the part I have been doing by hand, badly and slowly, for fourteen weeks.
I want the formal version by the end of the next quarter. I want to have run it against every architectural decision in DuranteOS for the six months after that. I want to be able to show, a year from now, the receipts on every load-bearing call I made: three voices, the convergence or non-convergence preserved, the assumptions named, the operator-context recommendation written down.
That is a different kind of trust than "I made a call and it worked out."
I would rather have the receipts.