27 weeks · 54 posts · Written while building

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I'm learning while shipping DuranteOS. Every Friday, a dispatch from the week. Roughly 108,000 words and counting — for builders who'd rather watch the foundation get poured than read the press release.

Start here

If you're thinking about writing while building

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

If you're considering the founding cohort

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

If you want the architectural spine

The Decomposition Discipline I Am Trying to Codify Inside DOS

Mar 31, 2026

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

Sentinel shipped in late March, fourteen weeks after I sketched it in W8. The first weekend after launch, I ran it across all eleven projects in DOS's project registry. This is the post-launch retrospective — what the convention catalog actually surfaced, where the scanner got it right on first pass, where it found patterns I had not seen in twenty-eight weeks of working in those repos, and where the scan was wrong in ways I did not expect. Sandi Metz on the squint test as the unit of pattern recognition. Michael Feathers on characterization tests as the unit of capture. Both applied to the moment a scanner reads a codebase as the domain expert.

I am writing this on a Tuesday at the end of March, twenty-nine weeks and roughly two hundred and eighty-five commits into building DuranteOS. Studio has been live for eighty-one days. Sentinel — the codebase-convention scanner I sketched in W8 — shipped in production form last Friday, fourteen weeks after the design essay went up. This past weekend I ran it across all eleven projects in DOS's project registry. The retrospective on what it found is this essay.

The W8 essay committed the design before the code. The promise was: the scan should distill 5-15 conventions per project, write them to .sentinel/ and to the project's CLAUDE.md, and act as an information radiator the agent reads at every session start. The non-goals were equally specific: do not enforce, do not catalog every pattern, do not freeze the conventions.

Three days of weekend scanning later, I can say which parts of the design held, which parts surprised me, and which parts were wrong. Writing the retrospective in public — the same week the scanner is fresh enough that I can still remember what each scan felt like — is the discipline that keeps the retro honest. The version I would write in three months will be cleaner; the version I am writing today is more useful because it includes the ugliness.

The retro in one sentence

The shape of the design held. The scanner found 6-14 conventions per project (target was 5-15) at a confidence level that pattern-matched my own intuition about each codebase. The three biggest surprises were not in what the scanner found but in what it missed and why; one of them changed the production design.

The Sandi Metz angle: the squint test as scanner

Sandi Metz's squint test — the trick of half-closing your eyes and letting the shape of code reveal what the literal text obscures — is the closest analog I have for what Sentinel does mechanically. The scanner is not parsing semantics. It is detecting shape across many files, looking for what repeats often enough to be a convention.

Metz's four rules for developers (classes under a hundred lines, methods under five lines, no more than four parameters per method, one object instantiated per controller action) are the kinds of conventions a scanner can look for without understanding what the code does. The scanner reads the directory structure, the import patterns, the file-naming conventions, the test-file pairings, and looks for repeating patterns. What repeats in 80%+ of files becomes a candidate convention. What repeats in 30-80% becomes flagged drift. What repeats in under 30% gets ignored.

Three days into the first weekend's scans, the threshold cuts I committed to in the W8 essay held within two-percentage-point variance per project. The 80% threshold produced a clean list of 6-14 conventions per project. The 30% threshold produced 2-4 drift flags per project — the kind of patterns that look like they want to be conventions but are not yet at canonical levels. The under-30% noise floor stayed quiet.
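The threshold cuts described above can be sketched in a few lines. This is an illustration under stated assumptions, not Sentinel's code: the function name `classify_prevalence` is hypothetical, and treating exactly 80% and exactly 30% as belonging to the higher band is a guess the essay does not settle.

```python
def classify_prevalence(matching_files: int, total_files: int) -> str:
    """Bucket a pattern by the share of files that exhibit it.

    Assumed boundaries: >= 80% is a candidate convention,
    >= 30% is flagged drift, anything lower is ignored.
    """
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    prevalence = matching_files / total_files
    if prevalence >= 0.80:
        return "convention"
    if prevalence >= 0.30:
        return "drift"
    return "ignored"


# Three hypothetical patterns in a 50-file project.
for hits in (45, 25, 10):
    print(hits, classify_prevalence(hits, 50))
```

The point of keeping the rule this small is the one the section makes: the cut is mechanical, so the same pattern gets the same label no matter who runs the scan.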

The lesson Metz's squint test teaches is the one I needed most: the scanner does not have to understand the code; it has to recognize the shape. If the shape repeats, it is a convention. If it does not, it is not yet. The mechanical discipline beats interpretive cleverness every time.

The Feathers angle: characterization tests for conventions

Michael Feathers's characterization test from Working Effectively with Legacy Code is the other shape Sentinel borrows. A characterization test captures what code currently does, even if you suspect what it does is wrong. The point is not to test correctness; the point is to capture the existing behavior so future changes can be scored against it.

Sentinel does this for conventions. The scan captures what the codebase currently looks like — the conventions that emerge from actual practice, not the conventions any document might claim. A CONTRIBUTING.md that says "use service-layer naming for backend modules" is irrelevant if the actual codebase uses six different naming patterns. The Sentinel scan reads the code, not the documentation. What the code currently does is the floor; future changes either match it or don't.

The framing helps me think about what the scanner should not do. It should not advise. It should not score against best practice. It should not refer to external rubrics. It should report: here is what your code currently does, in eight specific patterns, with their prevalence percentages. The agent then has a working theory of the codebase's actual conventions. Whether those conventions are good is a separate conversation — the kind that belongs in the Council pattern, not in a scanner.
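As a rough sketch of what "report what the code currently does, with prevalence percentages" might look like for a single axis (file naming), the snippet below counts coarse filename shapes and reports their share. The helper names `naming_shape` and `prevalence_report`, the shape heuristic, and the sample paths are all assumptions made for illustration; the essay does not describe how Sentinel extracts shapes.

```python
from collections import Counter
from pathlib import PurePosixPath


def naming_shape(path: str) -> str:
    """Reduce a path to a coarse naming shape, e.g. 'user.service.ts' -> '*.service.ts'."""
    parts = PurePosixPath(path).name.split(".")
    if len(parts) >= 3:
        return "*." + ".".join(parts[-2:])
    if len(parts) == 2:
        return "*." + parts[1]
    return parts[0]


def prevalence_report(paths: list[str]) -> list[tuple[str, float]]:
    """Return (shape, percent of files) pairs, most common first. No scoring, no advice."""
    counts = Counter(naming_shape(p) for p in paths)
    total = len(paths)
    return [(shape, round(100 * n / total, 1)) for shape, n in counts.most_common()]


# Hypothetical file list, not taken from any of the eleven projects.
sample = [
    "src/user.service.ts",
    "src/billing.service.ts",
    "src/auth.service.ts",
    "src/legacyHelper.ts",
]
print(prevalence_report(sample))
```

Note what the report leaves out: it never says whether `*.service.ts` is a good idea, only how often it occurs. That mirrors the characterization-test stance of capturing behavior before judging it.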

What the scan actually found across eleven projects

The numbers from the weekend's run, projects ordered by code volume:

Project	Conventions surfaced	Drift flags	Time to scan	Surprise (1-5)
Studio (Next.js + Prisma SaaS)	14	4	67 sec	2
Altyaa (PT-BR SMB SaaS)	12	3	51 sec	1
DOS itself (Claude Code config + skills)	11	6	89 sec	4
dos-prisma-saas-kit (the kit fork)	11	2	44 sec	1
Donne (CRM platform)	10	3	38 sec	2
AdCore Turbo (Next.js + Prisma)	9	2	32 sec	1
AxReady (multi-tenant ticketing)	9	4	41 sec	3
Era Materna (pregnancy SaaS)	8	3	29 sec	2
The Road to Next (course app)	7	1	22 sec	1
Exordiom BDR	7	2	25 sec	2
AxReady (sub-pack scan)	6	2	18 sec	3

Eleven projects. 104 conventions total. 32 drift flags. Average scan time 41 seconds. Total time across all eleven projects: just under eight minutes.
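The totals can be checked directly against the table. The snippet below only re-adds the figures listed above; the dictionary keys are shortened project names chosen for this sketch.

```python
# (conventions surfaced, drift flags, scan time in seconds), copied from the table above.
scans = {
    "Studio": (14, 4, 67),
    "Altyaa": (12, 3, 51),
    "DOS": (11, 6, 89),
    "dos-prisma-saas-kit": (11, 2, 44),
    "Donne": (10, 3, 38),
    "AdCore Turbo": (9, 2, 32),
    "AxReady": (9, 4, 41),
    "Era Materna": (8, 3, 29),
    "The Road to Next": (7, 1, 22),
    "Exordiom BDR": (7, 2, 25),
    "AxReady (sub-pack scan)": (6, 2, 18),
}

total_conventions = sum(c for c, _, _ in scans.values())
total_drift = sum(d for _, d, _ in scans.values())
total_seconds = sum(s for _, _, s in scans.values())

print(len(scans), "scans")                              # 11 scans
print(total_conventions, "conventions")                 # 104 conventions
print(total_drift, "drift flags")                       # 32 drift flags
print(round(total_seconds / len(scans)), "sec average") # 41 sec average
print(total_seconds / 60, "minutes total")              # 7.6 minutes total
```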

The "surprise" column is what I want to write about for the rest of this essay. I scored each scan on a 1-5 scale of how much the convention catalog surprised me — 1 being "this is exactly what I would have written by hand," 5 being "the scanner found patterns I did not know were there."

The three biggest surprises

Three things the first scan surfaced that I did not expect

One. DOS itself scored a 4. I have written DOS for twenty-nine weeks. I would have predicted the scan would tell me nothing new about my own codebase. It told me three things: (a) my hook-loader files cluster around a four-method shape (name, slot, phase, load) that I never explicitly designed but that emerged after the refactoring sequence I described in W17; Sentinel codified an implicit interface that I had been treating as informal. (b) Skill files have a stable five-section structure I follow without thinking; the scanner named it. (c) My Council seat agents share a verbatim prologue I had been pasting between files; Sentinel flagged the duplication as a convention candidate. The discipline of the squint test surfaced patterns I was too close to see.

Two. AxReady scored a 3 because it had two competing conventions on the same axis. The multi-tenant ticketing app uses two different patterns for tenant scoping (one in apps/web, one in apps/admin). The scanner correctly flagged both as 50% drift, with neither at convention threshold. I had not noticed the split. The scan made it visible. This is exactly the Broken Windows signal from W13 (small inconsistencies that cascade if not surfaced), caught by the scanner before I tripped on it.

Three. The DOS scan also surfaced a pattern that turned out to be wrong. Sentinel surfaced "Bun-only execution; no Node fallback" as an 80%+ convention. That is true in my workflow. It is not true in the codebase: a contributor who clones the repo and runs npm instead of bun on the wrong file would hit non-Bun-compatible code paths. The convention is what I do, not what the code enforces. Sentinel cannot tell the difference yet: the scanner reads what is there, but cannot distinguish "what the operator does" from "what the code requires." The scan was technically correct and operationally misleading.

The third surprise is the one that changed Sentinel's production design. The scanner now distinguishes between operator conventions (patterns visible only when you watch the operator work) and code conventions (patterns visible to any contributor reading the code). The 80% threshold applies separately to each. The output catalog labels them. The change took half a day; the lesson cost more than that.
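One way to picture the split is a catalog entry that carries its kind and is measured against its own denominator. Everything in this sketch is assumed for illustration: the `Observation` type, the `label` helper, and especially the sample counts, which are invented to show the mechanism and are not numbers from the DOS scan.

```python
from dataclasses import dataclass
from typing import Literal

Kind = Literal["operator", "code"]


@dataclass(frozen=True)
class Observation:
    pattern: str
    kind: Kind
    matching: int  # samples of this kind that show the pattern
    total: int     # samples of this kind that were inspected


def label(obs: Observation, threshold: float = 0.80) -> str:
    """Apply the threshold within one kind only and tag the result with that kind."""
    share = obs.matching / obs.total
    status = "convention" if share >= threshold else "below threshold"
    return f"[{obs.kind}] {obs.pattern}: {status} ({share:.0%})"


# Invented counts: the same pattern measured against two different denominators.
observations = [
    Observation("Bun-only execution", "operator", matching=19, total=20),
    Observation("Bun-only execution", "code", matching=12, total=20),
]
for obs in observations:
    print(label(obs))
```

With labels attached, a pattern can clear the bar as an operator habit while staying below it as something the code requires, which is the distinction the original catalog could not express.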

What the scan got right that I expected

Five things the scan caught cleanly across most projects, in order of robustness.

One. Stack identification. Every project's stack — language, framework, ORM, build system, deployment target — was identified correctly. This is the table-stakes work; the scanner reads package.json, tsconfig.json, tailwind.config, and produces the one-paragraph stack summary the W8 essay specified. Eleven of eleven projects passed.

Two. Convention extraction at 80%+ prevalence. Where a pattern was genuinely canonical, the scan named it. Tenant-root entity in multi-tenancy projects. UUID primary keys. Service-layer naming. Hook-pipeline shapes. Skill-file structures. The scanner was not creative — it found what was there.

Three. Drift flagging at 30-80%. Genuine drift was correctly identified as drift, not as convention. The scan did not over-promote half-formed patterns into the canonical list. The threshold cuts held.

Four. Per-project Ubiquitous Language extraction. The Eric Evans frame from W8 — distilling the project's actual vocabulary — produced specific, project-flavored output: "Organization is the multi-tenant root" for Studio, "Reading session" for Era Materna, "Coach" for Altyaa. Each project's own vocabulary surfaced in its own catalog.

Five. Stale-content flagging. Each catalog is dated. The agent reads the date during session start and surfaces a banner if the scan is more than thirty days old. The mechanism worked on first ship — I do not know yet how often I will see the staleness banner because I just ran the scans, but the plumbing is there.
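The staleness check in item five is simple enough to sketch. This assumes the catalog date is available as a `date`; the function name `staleness_banner` and the banner wording are hypothetical, and "more than thirty days" is read here as strictly greater than 30.

```python
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=30)


def staleness_banner(scanned_on: date, today: date) -> Optional[str]:
    """Return a banner string if the catalog is more than thirty days old, else None."""
    age = today - scanned_on
    if age > STALE_AFTER:
        return f"Sentinel catalog is {age.days} days old; consider rescanning."
    return None


print(staleness_banner(date(2026, 3, 28), date(2026, 3, 31)))  # fresh scan
print(staleness_banner(date(2026, 2, 1), date(2026, 3, 31)))   # old scan
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the check easy to exercise before any real catalog has had time to go stale.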

What the scan got wrong (besides the operator-vs-code-conventions issue)

Three real failures that I am noting publicly, because pretending the scanner was perfect on first pass would be dishonest.

What this implies if you are running scans across multiple projects

Three suggestions, each costly enough that I considered cutting them and was wrong each time.

One. Score your own scans on the surprise scale. The numbers tell you whether the scanner is working: 1 means the scanner is reproducing what you already know (low value); 4-5 means the scanner is finding patterns you did not see (high value). DOS at 4 was the highest-value scan of the weekend; the two AxReady scans at 3 came next; the scans at 1-2 were confirmations.

Two. Distinguish operator conventions from code conventions explicitly. The third surprise above cost me half a day of confused output. Do not let your scanner conflate "what the operator does" with "what the code enforces." They are different categories with different audiences.

Three. Run the scan, then read the catalog as if you have never seen the code. The squint-test discipline only works if you can put your own knowledge aside long enough to see what the scanner saw. If the catalog feels obvious, you are reading it through your own assumptions; if it feels like reading someone else's notes about your code, the scanner is doing its job.

What the scan does to the agent's behavior

In the next session in any of these eleven projects, the agent reads the project's CLAUDE.md and the .sentinel/ directory at session start, so it now knows the conventions. In the first post-scan session I ran in Donne (a project I had not opened in three weeks), the agent's first message back to me referenced two conventions correctly without my prompting. That is the information-radiator effect from the W8 essay, finally visible in production.

The W8 sketch

The design essay this scan retrospective tests.

The three-layer model

Where Sentinel sits as a Pack.

MCP boundary

Sentinel ships as an MCP server in this same week.

The natural next step

Council-driven drift resolution.

The version of Sentinel that shipped last week is not the version I will be running at the end of Q2. The mtime-weighting fix lands this week. The cross-project aggregation lands the week after. The drift-resolution Council workflow lands somewhere in May. The retrospective on the post-Q2 version of Sentinel ships at the end of June, after another quarter of running scans across the same eleven projects.

What this first weekend already proved: the W8 design held. The thresholds held. The information-radiator promise held. The three failures I named above are correctable in days, not quarters. That is what shipping the design before the code buys you — when the code lands, the failures are known unknowns you can fix on a Tuesday, not unknown unknowns that surface on a customer call.

I would rather have the receipts. I have them now.

The 27-week arc · A single body of work

Twenty-seven weeks. Two posts a week. Six months of writing while building.

Week	Tuesday evergreen	Friday dispatch
W01	2025-10-28 · The Agent Is the Product: Why Intelligence Replaces Interface	2025-10-31 · Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical
W02	2025-11-04 · Memory Replaces Lock-In: Designing the Substrate of Personal AI	2025-11-07 · The Week the Substrate Caught Up: Kimi K2 Thinking Lands
W03	2025-11-11 · Why I Just Left a Steady Practice to Build a Personal AI Operating System	2025-11-14 · The Week the Harness Ate the Model
W04	2025-11-18 · The Decomposition Discipline I Am Trying to Codify Inside DOS	2025-11-21 · Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered
W05	2025-11-25 · Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts	2025-11-28 · Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch
W06	2025-12-02 · Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function	2025-12-05 · The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion
W07	2025-12-09 · Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter	2025-12-12 · Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router
W08	2025-12-16 · Sentinel, Sketched: Convention-Driven Onboarding Before I Build It	2025-12-19 · The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint
W09	2025-12-23 · Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped	2025-12-26 · Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty
W10	2025-12-30 · Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It	2026-01-02 · The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture
W11	2026-01-06 · Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At	2026-01-09 · Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default
W12	2026-01-13 · Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture	2026-01-16 · The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate
W13	2026-01-20 · Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes	2026-01-23 · The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week
W14	2026-01-27 · The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs	2026-01-30 · Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow
W15	2026-02-03 · Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward	2026-02-06 · Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day
W16	2026-02-10 · TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists	2026-02-13 · The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier
W17	2026-02-17 · Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into	2026-02-20 · The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives
W18	2026-02-24 · Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward	2026-02-27 · The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds
W19	2026-03-03 · The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25	2026-03-06 · Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned
W20	2026-03-10 · Six Months of DOS: What Changed, What Endured, What Surprised	2026-03-13 · The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies
W21	2026-03-17 · MCP in Production: What Works When the Protocol Becomes the Boundary	2026-03-20 · Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer
W22	2026-03-24 · Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade	2026-03-27 · The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies
W23	2026-03-31 · Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects	2026-04-03 · The Agent Stack Swallows the IDE, the IPO, and the Courtroom
W24	2026-04-07 · Forking MemPalace: The 48-Hour Integration Retro	2026-04-10 · Project Glasswing: When the Frontier Bifurcates on Safety
W25	2026-04-14 · The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today	2026-04-17 · Routines Eat the Workflow: The Harness Becomes the Product
W26	2026-04-21 · The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates	2026-04-24 · The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture
W27	2026-04-28 · After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next	2026-05-01 · The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat