27 weeks · 54 pieces · Written during the build

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I learned while shipping DuranteOS. Every Friday, a bulletin on the week. About 108 thousand words and counting, for builders who would rather watch the foundation being poured than read the press release.

Start here

If you are thinking about writing while you build

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

If you are considering the founding cohort

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

If you want the architectural spine

The Decomposition Discipline I Am Trying to Codify Inside DOS

Subscribe · Tuesday essay

About 3,800 builders read this every week.

Dec 16, 2025

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

I have not built Sentinel yet. I am writing this before I do, because if I do not commit the design to the page now, I will discover the wrong shape of it the hard way once the agent has started inducting bad patterns across eleven projects. This is the architecture I want to commit to — Eric Evans's Ubiquitous Language and Alistair Cockburn's information radiator, applied to the moment a new AI agent meets an unfamiliar codebase.

I am writing this on a Monday in mid-December, fourteen weeks and roughly two hundred commits into building DuranteOS. I have not built Sentinel yet. The pack does not exist. The artifact directory does not exist. The CLAUDE.md section it would write does not exist.

I am writing the design essay before the code on purpose. Every week I am running the same disappointing experiment with the same disappointing result, and the only way to stop running it is to write down what I think the cure looks like before I am too tired to be careful about its shape.

The experiment is this. I open a fresh project — one of the eleven repos DOS now has a working theory about. I ask the agent to make a small change. The agent reads three or four files at random, finds something that looks like the prevailing pattern, and writes more of it. Two days later I discover the agent's "prevailing pattern" was actually the one legacy file nobody had ever refactored, and the agent has now propagated the legacy shape across four new files.

This is not a model-quality problem. It is an induction problem. The agent generalized from too small a sample, with no signal about which sample was canonical and which was historical. And it will keep generalizing from too small a sample, on every single first request to a new repo, forever, unless I build something that arrives before the request does.

The thing I want to build is Sentinel. This essay is what I want it to be before I write a line of it.

What I want Sentinel to do, in one sentence

Sentinel should scan a codebase, distill its stack and 5-15 architectural conventions, write them into a project-local .sentinel/ directory and the project's CLAUDE.md, and ensure every subsequent agent session starts with a working theory of the project rather than a blank one.

The Eric Evans angle: Ubiquitous Language is something you discover, not invent

Eric Evans's central methodological move in Domain-Driven Design — the move that I keep returning to in my own notes — is knowledge crunching. The practice of working alongside domain experts until the team's working vocabulary lines up with the actual structure of the domain. The artifact Evans values most is not the diagram or the schema. It is the Ubiquitous Language. A small set of nouns, verbs, and relationships that everyone on the team uses with a single shared meaning.

Most codebases I touch have a Ubiquitous Language. They just rarely have it written down anywhere. The vocabulary lives in commit messages, in Slack threads, in the head of the senior engineer who has been there four years. When a new contributor arrives, human or AI, they have to reverse-engineer it from the code itself, which is slow, error-prone, and produces inconsistent inductions.

What I want Sentinel to do is to act as the knowledge-crunching session that should have happened. Read the codebase as the domain expert. Distill the vocabulary. Write it down where the agent will see it next session.

The piece I am most worried about getting wrong, in advance, is the distillation step. Evans warns over and over against premature naming and premature reification, and I do not yet know how to instruct a scanner to refuse to name something that has not stabilized. My current best guess is that Sentinel should distinguish between three classes of pattern by frequency: anything appearing in 80%+ of relevant files becomes a convention; anything in 30-80% becomes flagged drift, not convention; anything below 30% gets ignored. The 80% threshold is a guess. I will probably move it after the first three projects.
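A minimal sketch of that three-way split, assuming prevalence is measured as the share of relevant files that exhibit a pattern. The names `PatternClass` and `classify_pattern` are hypothetical, and the two thresholds are the guesses stated above, kept as constants because they are expected to move.

```python
from enum import Enum


class PatternClass(Enum):
    CONVENTION = "convention"
    DRIFT = "drift"
    IGNORED = "ignored"


# Thresholds taken from the essay's current guess; both are expected to change.
CONVENTION_THRESHOLD = 0.80
DRIFT_THRESHOLD = 0.30


def classify_pattern(matching_files: int, relevant_files: int) -> PatternClass:
    """Classify a pattern by the share of relevant files that exhibit it."""
    if relevant_files <= 0:
        # No relevant files means no evidence, so refuse to name anything.
        return PatternClass.IGNORED
    prevalence = matching_files / relevant_files
    if prevalence >= CONVENTION_THRESHOLD:
        return PatternClass.CONVENTION
    if prevalence >= DRIFT_THRESHOLD:
        # Common enough to notice, not common enough to endorse.
        return PatternClass.DRIFT
    return PatternClass.IGNORED
```

The sketch does not answer the harder question of what counts as a "relevant" file for a given pattern. That denominator is where most of the judgment would live.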

The shape of the smell I want to remove

The agent without Sentinel — what I am living with now

Reads 3-4 files at random when asked to do work
Generalizes from whichever sample it happened to read
Cannot distinguish canonical patterns from historical drift
Inducts a "house style" that may match the worst file in the repo
Re-inducts on every new session — no accumulation

The agent with Sentinel — what I want fourteen weeks from now

Reads CLAUDE.md (always loaded) which contains the distilled stack and 12 conventions
Has the project's actual Ubiquitous Language written down
Knows the canonical patterns by name; recognizes drift when it sees it
Inducts only what is genuinely novel; references the convention catalog for the rest
Conventions accumulate across sessions; re-scans catch new patterns

The Alistair Cockburn angle: Sentinel must be an information radiator, not a status report

Alistair Cockburn's Crystal Clear introduces the term information radiator — an artifact you can absorb at a glance because it is positioned where you cannot avoid it. The classic example is the team task board on the wall of the office. The opposite is the status report nobody reads.

This is the part of the design I am most confident in. Whatever Sentinel produces, it must end up in CLAUDE.md, which the harness already loads automatically into every session's context. Not in a wiki. Not in a doc-site. Not in a registry the agent has to query. In CLAUDE.md, which is in the agent's mouth before the agent has been asked anything.

The cost of consulting it must be zero. The cost of not consulting it must be deliberate — the agent would have to actively ignore information that is already in front of it. That asymmetry is the whole game. Information radiators work because not consulting them is more effortful than consulting them. Status reports fail because the reverse is true.

Cockburn's other contribution that I want to lean on, and that I suspect I will fail to lean on, is the methodological humility of the Crystal family: the right methodology for a project depends on team size and criticality. Sentinel should be deliberately lightweight. A single command. A small directory of artifacts. A CLAUDE.md edit. Not Architecture Decision Records. Not RFCs. Not a wiki. Those exist for human readers and have their place. Sentinel is for the agent, and the agent reads what is in front of it. If I let the design grow, it will quietly become a system of record, and a system of record is a thing humans maintain, and the moment humans maintain it, it goes stale and the agent starts trusting stale data.

That is the failure mode I am writing this essay to avoid in advance.

The five phases I think Sentinel should run

The scan should run in five phases. Each phase emits a distinct artifact. The whole scan should take under ninety seconds for a medium-sized codebase. If it takes longer, I have done it wrong.

[Diagram: the five scan phases, run in sequence]

What each phase should produce

Stack discovery. Walk package.json, requirements.txt, go.mod, Cargo.toml, etc. Identify the language, framework, ORM, build system, deployment target. Output: a one-paragraph stack summary.
Convention extraction. Crawl source files looking for repeated patterns: directory structure, naming conventions, import patterns, common idioms. Cluster the patterns. Pick the 5-15 most prevalent and meaningful. Output: a numbered list of conventions, each with a one-line description.
Architectural decisions. Identify the load-bearing decisions implied by the conventions: "Better Auth over NextAuth," "Hooks over middleware," "Bun over Node for tooling." These should be inferred from structural choices, not from explicit ADRs (which often do not exist in young projects, and certainly do not exist in mine). Output: a numbered list of decisions with one-line justifications.
CLAUDE.md update. Append the stack summary, convention list, and decision list to the project's CLAUDE.md under a ## Sentinel Conventions section. If the section already exists, replace it. Preserve everything else.
Project registry sync. Update a single canonical file — ~/Durante/Tools/.dos-projects.json is what I have in mind — with the project's slug, path, and a summary. This is the registry of all projects DOS is aware of. Right now I maintain it by hand. By next quarter it should maintain itself.
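The CLAUDE.md update is the phase most likely to damage a hand-written file, so it is worth sketching on its own. The following is one possible shape, not a committed implementation: `upsert_sentinel_section` is a hypothetical name, it works on strings rather than files, and it assumes the section runs from the `## Sentinel Conventions` heading to the next level-1 or level-2 heading (or the end of the file).

```python
import re

SECTION_HEADING = "## Sentinel Conventions"

# Matches the Sentinel heading line and everything after it, up to the next
# level-1 or level-2 heading or the end of the file. Level-3 headings such
# as "### Stack" stay inside the section. Assumption: the section contains
# no fenced code whose lines start with "# " or "## ".
SECTION_PATTERN = re.compile(
    r"^" + re.escape(SECTION_HEADING) + r"[ \t]*$.*?(?=^#{1,2} |\Z)",
    re.MULTILINE | re.DOTALL,
)


def upsert_sentinel_section(claude_md: str, section: str) -> str:
    """Replace the Sentinel section if present, otherwise append it."""
    section = section.strip() + "\n"
    if not section.startswith(SECTION_HEADING):
        raise ValueError(f"section must start with {SECTION_HEADING!r}")

    match = SECTION_PATTERN.search(claude_md)
    if match is None:
        # No existing section: append after the current content.
        base = claude_md.rstrip("\n")
        return f"{base}\n\n{section}" if base else section

    # Existing section: swap it out and preserve everything around it.
    before = claude_md[: match.start()]
    after = claude_md[match.end():]
    return before + section + (f"\n{after}" if after else "")
```

Running the function twice with the same section yields the same file, which is the property a re-scan needs.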

What I think a real Sentinel output should look like

I have been hand-writing what I think Sentinel's output should look like, on the projects I have been onboarding the agent to. Here is what I sketched for the SaaS platform I am about to start building (the one that will host the credit-metered gateway). I have written this section by hand, on purpose, because I want to know what good looks like before I ask a scanner to produce it.

## Sentinel Conventions

### Stack
Monorepo SaaS with API Gateway Pattern.
Next.js + Prisma + PostgreSQL + Better Auth + Stripe.
~40 models, ~45 API endpoints, ~12 AI/media providers planned.

### Conventions
- Monorepo: Turborepo with @kit/* packages, apps/* workspaces
- Multi-tenancy: Organization root entity, cascade deletes,
  composite unique keys
- Gateway: REST at /api/v1/{service}/{provider}/ with Bearer token
  + credit metering
- Billing: Double-entry credit ledger (reserve→commit/release).
  SpendCap enforcement pre-request
- Data Loading: React cache() + server-only imports + explicit
  Prisma select clauses
- Auth: Better Auth catch-all at /api/auth/[...all], server actions
  via next-safe-action
- IDs: UUID primary keys throughout. No auto-increment.

### Key Decisions
- Better Auth over NextAuth/Clerk — simpler API, built-in multi-tenancy
- API Gateway pattern — all AI calls route through Studio for metering
- Credit-ledger over pay-per-call — enables spend caps, refunds,
  partner tiers
- Prisma + PostgreSQL — type-safe ORM, hosted production

The output is short. It has to be. CLAUDE.md will be loaded into every session's context, and context is finite. The discipline I want Sentinel to enforce is to capture only what would change the agent's behavior. Anything an agent would do correctly without the convention does not belong in the convention list. That is the rule I have to write down, because the moment I let exceptions creep in, the artifact becomes a wiki, and the wiki becomes stale, and the agent learns to distrust it.
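To make the target shape concrete, here is a small sketch of how a scanner's findings could be rendered into the hand-written layout above. `SentinelReport` and `render_report` are hypothetical names, and the three fields simply mirror the three subsections. Nothing here decides what belongs in the lists, which is the hard part.

```python
from dataclasses import dataclass, field


@dataclass
class SentinelReport:
    """Hypothetical container mirroring the three subsections of the output."""
    stack: list[str]                                       # stack summary lines
    conventions: list[str] = field(default_factory=list)  # one line each
    decisions: list[str] = field(default_factory=list)    # one line each


def render_report(report: SentinelReport) -> str:
    """Render the report as the markdown section meant for CLAUDE.md."""
    lines = ["## Sentinel Conventions", "", "### Stack", *report.stack, ""]
    lines += ["### Conventions", *(f"- {c}" for c in report.conventions), ""]
    lines += ["### Key Decisions", *(f"- {d}" for d in report.decisions)]
    return "\n".join(lines) + "\n"
```

Keeping the renderer this plain is deliberate: every item is a single line, so a long or vague entry is easy to spot when the section is reviewed.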

Three non-goals I am committing to in advance

I am writing these down because they are the requests I will make of myself in three weeks, when the first version of Sentinel ships and is too short to feel impressive.

What this implies if you onboard agents to your codebase

Three suggestions, in order of how cheaply they pay back. These are what I would tell another indie founder writing the same scanner from scratch.

One. Write down your conventions where the agent will see them by default. Most teams have implicit conventions and explicit CONTRIBUTING.md files. The agent rarely reads CONTRIBUTING.md unless told to. It always reads CLAUDE.md (or its equivalent). Move the conventions to where the agent reads. This is the cheapest possible win and you can do it before you write a single line of scanner code.

Two. Distinguish stack from conventions from decisions. Stack is what you use. Conventions are how you use it. Decisions are why you chose this and not alternatives. The agent benefits from all three but in different ways: stack tells it what idioms exist, conventions tell it which idioms to prefer, decisions tell it which alternatives to not propose. Most onboarding documents conflate the three. The agent then conflates the three. Separate them on the page and the agent separates them in its head.

Three. Re-scan on a cadence, even if you are scanning by hand. The most common way a written-down convention fails is that the scan behind it ran six months ago and the artifact now lies. The artifact is more dangerous when stale than when missing. Either re-scan regularly or have the artifact self-flag its age.
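One way an artifact could self-flag its age, sketched under stated assumptions: the scan date is recorded somewhere the check can read it, and the 30-day limit is a placeholder, since the essay does not fix a cadence. `staleness_warning` is a hypothetical helper.

```python
from datetime import date
from typing import Optional

MAX_AGE_DAYS = 30  # placeholder cadence; not a number from the essay


def staleness_warning(
    scanned_on: date, today: date, max_age_days: int = MAX_AGE_DAYS
) -> Optional[str]:
    """Return a warning line if the scan is older than the allowed age."""
    age_days = (today - scanned_on).days
    if age_days <= max_age_days:
        return None
    return (
        f"WARNING: conventions last scanned {scanned_on.isoformat()} "
        f"({age_days} days ago); treat as possibly stale and re-scan."
    )
```

If the warning is written at the top of the conventions section, the agent reads the caveat before it reads the conventions, which keeps the cost of noticing staleness at zero.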

What I am committing to publicly

By the end of January I will have a working Sentinel pack on at least one project. By end of February it will run across all eleven projects DOS knows about. The total time cost should be under fifteen minutes per week of operator attention. If it is more, I have over-engineered it.

The Algorithm

What I want to run after Sentinel sets up the working theory.

Skills, Packs, Hooks

The substrate Sentinel will compose with.

Multi-copy invariants

The same mechanization principle, different layer.

MemPalace

The other knowledge surface I want the agent to read.

The agent without Sentinel is a smart contractor who walks into your codebase on day one and starts typing. The agent with Sentinel — once I build it — should be the same contractor after they have spent an afternoon reading. The work is identical. The trust profile is not.

I am writing this essay before the code because the kind of mechanized discipline I am sketching is exactly the kind that gets skipped until you have the scar tissue to insist on it.

I have the scar tissue. I do not yet have the pack. By next quarter I should have both.


The 27-week arc · A body of work

Twenty-seven weeks. Two pieces per week. Six months of writing during the build.

Week

Tuesday essay

Friday bulletin

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat