Durante

27 weeks · 54 posts · Written while building

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I'm learning while shipping DuranteOS. Every Friday, a dispatch from the week. Roughly 108,000 words and counting — for builders who'd rather watch the foundation get poured than read the press release.


Around 3,800 builders read this weekly.

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

Sentinel shipped in late March, fourteen weeks after I sketched it in W8. The first weekend after launch, I ran it across all eleven projects in DOS's project registry. This is the post-launch retrospective — what the convention catalog actually surfaced, where the scanner got it right on first pass, where it found patterns I had not seen in twenty-eight weeks of working in those repos, and where the scan was wrong in ways I did not expect. Sandi Metz on the squint test as the unit of pattern recognition. Michael Feathers on characterization tests as the unit of capture. Both applied to the moment a scanner reads a codebase as the domain expert.

I am writing this on a Tuesday at the end of March, twenty-nine weeks and roughly two hundred and eighty-five commits into building DuranteOS. Studio has been live for eighty-one days. Sentinel — the codebase-convention scanner I sketched in W8 — shipped in production form last Friday, fourteen weeks after the design essay went up. This past weekend I ran it across all eleven projects in DOS's project registry. The retrospective on what it found is this essay.

The W8 essay committed the design before the code. The promise was: the scan should distill 5-15 conventions per project, write them to .sentinel/ and to the project's CLAUDE.md, and act as an information radiator the agent reads at every session start. The non-goals were equally specific: do not enforce, do not catalog every pattern, do not freeze the conventions.

Three days of weekend scanning later, I can say which parts of the design held, which parts surprised me, and which parts were wrong. Writing the retrospective in public — the same week the scanner is fresh enough that I can still remember what each scan felt like — is the discipline that keeps the retro honest. The version I would write in three months will be cleaner; the version I am writing today is more useful because it includes the ugliness.

The retro in one sentence

The shape of the design held. The scanner found 6-14 conventions per project (the target was 5-15) at a confidence level that matched my own intuition about each codebase. Two of the three biggest surprises were patterns I had been too close to see. The third was a convention the scanner reported correctly but misread, and that one changed the production design.

The Sandi Metz angle: the squint test as scanner

Sandi Metz's squint test — the trick of half-closing your eyes and letting the shape of code reveal what the literal text obscures — is the closest analog I have for what Sentinel does mechanically. The scanner is not parsing semantics. It is detecting shape across many files, looking for what repeats often enough to be a convention.

Metz's four rules (classes no longer than a hundred lines, methods no longer than five, no more than four parameters, controllers that instantiate only one object) are the kind of convention a scanner can check without understanding what the code does. Sentinel works at that level. It reads the directory structure, the import patterns, the file-naming conventions, and the test-file pairings, and looks for shapes that repeat. What repeats in 80% or more of files becomes a candidate convention. What repeats in 30-80% is flagged as drift. What repeats in under 30% is ignored.
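The three bands reduce to a small classification step. Below is a minimal sketch in TypeScript. It assumes prevalence is measured as the share of files in which a pattern appears. The function names and the sample counts are hypothetical, not Sentinel's actual code.

```typescript
// Hypothetical banding step using the 80% / 30% cuts from the W8 design.
type Band = "convention" | "drift" | "ignored";

const CONVENTION_MIN = 0.8; // 80%+ of files: candidate convention
const DRIFT_MIN = 0.3; // 30-80%: flagged drift; under 30%: ignored

function classify(prevalence: number): Band {
  if (prevalence >= CONVENTION_MIN) return "convention";
  if (prevalence >= DRIFT_MIN) return "drift";
  return "ignored";
}

// Prevalence = files showing the pattern / files where it could appear.
function prevalence(filesWithPattern: number, totalFiles: number): number {
  return totalFiles === 0 ? 0 : filesWithPattern / totalFiles;
}

// Illustrative counts for a 48-file project (invented for the example).
const observed: Array<[string, number]> = [
  ["test file beside source file", prevalence(41, 48)], // ~85%
  ["default export per module", prevalence(20, 48)], // ~42%
  ["barrel index.ts per folder", prevalence(6, 48)], // ~13%
];

for (const [pattern, share] of observed) {
  console.log(`${classify(share)}: ${pattern} (${Math.round(share * 100)}%)`);
}
```

The bands deliberately use inclusive lower bounds, so a pattern at exactly 80% counts as a convention and one at exactly 30% counts as drift.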

Over the weekend's scans, the threshold cuts I committed to in the W8 essay held, with per-project variance inside two percentage points. The 80% threshold produced a clean list of 6-14 conventions per project. The 30-80% band produced 1-6 drift flags per project, and 2-4 for nine of the eleven. These are the patterns that look like they want to be conventions but have not yet reached canonical levels. The under-30% noise floor stayed quiet.

The lesson of Metz's squint test is the one I needed most: the scanner does not have to understand the code; it has to recognize the shape. If the shape repeats, it is a convention. If it does not, it is not one yet. The mechanical discipline beats interpretive cleverness every time.

The Feathers angle: characterization tests for conventions

Michael Feathers's characterization test from Working Effectively with Legacy Code is the other shape Sentinel borrows. A characterization test captures what code currently does, even if you suspect what it does is wrong. The point is not to test correctness; the point is to capture the existing behavior so future changes can be scored against it.

Sentinel does this for conventions. The scan captures what the codebase currently looks like — the conventions that emerge from actual practice, not the conventions any document might claim. A CONTRIBUTING.md that says "use service-layer naming for backend modules" is irrelevant if the actual codebase uses six different naming patterns. The Sentinel scan reads the code, not the documentation. What the code currently does is the floor; future changes either match it or don't.
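In characterization-test terms, the first catalog is the recorded behavior and later scans are scored against it. Below is a minimal sketch. It assumes a catalog is a map from pattern name to prevalence. `diffAgainstBaseline` and the pattern names in the example are hypothetical.

```typescript
// A catalog maps a pattern name to its prevalence (0..1) in one scan.
type Catalog = Record<string, number>;

interface Diff {
  held: string[]; // baseline conventions still at or above the threshold
  slipped: string[]; // baseline conventions that fell below it
  emerged: string[]; // patterns that newly crossed it
}

// Score a new scan against the recorded baseline, not against best practice.
function diffAgainstBaseline(baseline: Catalog, current: Catalog, threshold = 0.8): Diff {
  const diff: Diff = { held: [], slipped: [], emerged: [] };
  for (const [name, share] of Object.entries(baseline)) {
    if (share < threshold) continue; // only baseline conventions were "characterized"
    const now = current[name] ?? 0; // a pattern missing from the new scan counts as 0%
    (now >= threshold ? diff.held : diff.slipped).push(name);
  }
  for (const [name, share] of Object.entries(current)) {
    if (share >= threshold && (baseline[name] ?? 0) < threshold) diff.emerged.push(name);
  }
  return diff;
}

const baseline: Catalog = {
  "UUID primary keys": 0.95,
  "service-layer naming": 0.85,
  "schema validation at the API edge": 0.6,
};
const current: Catalog = {
  "UUID primary keys": 0.96,
  "service-layer naming": 0.7,
  "schema validation at the API edge": 0.82,
};
// expect: held = ["UUID primary keys"], slipped = ["service-layer naming"],
//         emerged = ["schema validation at the API edge"]
console.log(diffAgainstBaseline(baseline, current));
```

Nothing in the diff says whether a slipped convention was a good one. It only says that the code no longer does what it used to do.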

The framing helps me think about what the scanner should not do. It should not advise. It should not score against best practice. It should not refer to external rubrics. It should report: here is what your code currently does, in eight specific patterns, with their prevalence percentages. The agent then has a working theory of the codebase's actual conventions. Whether those conventions are good is a separate conversation — the kind that belongs in the Council pattern, not in a scanner.
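"Report, not grade" is easy to make concrete: each catalog entry carries a name and a prevalence and no score. Below is a minimal sketch that renders such a catalog as a dated section for a file like CLAUDE.md. The `Convention` shape, the heading format, and the sample percentages are assumptions for illustration.

```typescript
// Hypothetical catalog entry: what the code does and how often, nothing more.
interface Convention {
  name: string;
  prevalence: number; // share of relevant files showing the pattern, 0..1
}

function renderCatalog(project: string, scannedOn: string, entries: Convention[]): string {
  const lines = [`## Conventions: ${project} (scanned ${scannedOn})`, ""];
  // Most settled patterns first; no "good" or "bad" labels are attached.
  const sorted = [...entries].sort((a, b) => b.prevalence - a.prevalence);
  for (const c of sorted) {
    lines.push(`- ${c.name} (${Math.round(c.prevalence * 100)}% of files)`);
  }
  return lines.join("\n");
}

const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
console.log(
  renderCatalog("Studio", today, [
    { name: "Organization is the multi-tenant root", prevalence: 0.92 },
    { name: "UUID primary keys", prevalence: 0.97 },
    { name: "service-layer naming", prevalence: 0.88 },
  ]),
);
```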

What the scan actually found across eleven projects

The numbers from the weekend's run, projects ordered by code volume:

| Project | Conventions surfaced | Drift flags | Time to scan | Surprise (1-5) |
|---|---:|---:|---:|---:|
| Studio (Next.js + Prisma SaaS) | 14 | 4 | 67 sec | 2 |
| Altyaa (PT-BR SMB SaaS) | 12 | 3 | 51 sec | 1 |
| DOS itself (Claude Code config + skills) | 11 | 6 | 89 sec | 4 |
| dos-prisma-saas-kit (the kit fork) | 11 | 2 | 44 sec | 1 |
| Donne (CRM platform) | 10 | 3 | 38 sec | 2 |
| AdCore Turbo (Next.js + Prisma) | 9 | 2 | 32 sec | 1 |
| AxReady (multi-tenant ticketing) | 9 | 4 | 41 sec | 3 |
| Era Materna (pregnancy SaaS) | 8 | 3 | 29 sec | 2 |
| The Road to Next (course app) | 7 | 1 | 22 sec | 1 |
| Exordiom BDR | 7 | 2 | 25 sec | 2 |
| AXReady (sub-pack scan) | 6 | 2 | 18 sec | 3 |

Eleven projects. 104 conventions total. 32 drift flags. Average scan time 41 seconds. Total time across all eleven projects: just under eight minutes.
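The summary line is simple arithmetic over the table's columns. Re-adding the columns is a cheap way to check that the totals and the average agree with the per-project rows. The `Scan` type is only a convenience for this sketch.

```typescript
// Per-project rows from the weekend's table.
interface Scan {
  project: string;
  conventions: number;
  drift: number;
  seconds: number;
}

const scans: Scan[] = [
  { project: "Studio", conventions: 14, drift: 4, seconds: 67 },
  { project: "Altyaa", conventions: 12, drift: 3, seconds: 51 },
  { project: "DOS itself", conventions: 11, drift: 6, seconds: 89 },
  { project: "dos-prisma-saas-kit", conventions: 11, drift: 2, seconds: 44 },
  { project: "Donne", conventions: 10, drift: 3, seconds: 38 },
  { project: "AdCore Turbo", conventions: 9, drift: 2, seconds: 32 },
  { project: "AxReady", conventions: 9, drift: 4, seconds: 41 },
  { project: "Era Materna", conventions: 8, drift: 3, seconds: 29 },
  { project: "The Road to Next", conventions: 7, drift: 1, seconds: 22 },
  { project: "Exordiom BDR", conventions: 7, drift: 2, seconds: 25 },
  { project: "AXReady (sub-pack scan)", conventions: 6, drift: 2, seconds: 18 },
];

const sum = (pick: (s: Scan) => number): number => scans.reduce((acc, s) => acc + pick(s), 0);

const totalConventions = sum((s) => s.conventions); // 104
const totalDrift = sum((s) => s.drift); // 32
const totalSeconds = sum((s) => s.seconds); // 456
const averageSeconds = totalSeconds / scans.length; // 456 / 11 ≈ 41.45

// prints "11 scans, 104 conventions, 32 drift flags"
console.log(`${scans.length} scans, ${totalConventions} conventions, ${totalDrift} drift flags`);
// prints "average 41.5 s, total 7m 36s"
console.log(`average ${averageSeconds.toFixed(1)} s, total ${Math.floor(totalSeconds / 60)}m ${totalSeconds % 60}s`);
```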

The "surprise" column is what I want to write about for the rest of this essay. I scored each scan on a 1-5 scale of how much the convention catalog surprised me — 1 being "this is exactly what I would have written by hand," 5 being "the scanner found patterns I did not know were there."

The three biggest surprises

Three things the first scan surfaced that I did not expect

  1. DOS itself scored a 4. I have written DOS for twenty-nine weeks. I would have predicted the scan would tell me nothing new about my own codebase. It told me three things: (a) my hook-loader files cluster around a four-method shape (name, slot, phase, load) that I never explicitly designed but that emerged after the refactoring sequence I described in W17 — Sentinel codified an implicit interface that I had been treating as informal. (b) Skill files have a stable five-section structure I follow without thinking; the scanner named it. (c) My Council seat agents share a verbatim prologue I had been pasting between files; Sentinel flagged the duplication as a convention candidate. The discipline of the squint test surfaced patterns I was too close to see.
  2. AxReady scored a 3 because it had two competing conventions on the same axis. The multi-tenant ticketing app uses two different patterns for tenant scoping (one in apps/web, one in apps/admin). The scanner correctly flagged both as 50% drift, with neither at convention threshold. I had not noticed the split. The scan made it visible. This is exactly the Broken Windows signal from W13 — small inconsistencies that cascade if not surfaced — caught by the scanner before I tripped on it.
  3. The DOS scan also surfaced a pattern that turned out to be wrong. Sentinel reported "Bun-only execution; no Node fallback" as an 80%+ convention. That is true of my workflow. It is not true of the codebase: nothing in the repo enforces Bun, so a contributor who clones it and runs the wrong file with npm instead of bun would hit code paths that only work under Bun. The convention is what I do, not what the code enforces. Sentinel could not tell the difference: it read what was there, but could not distinguish "what the operator does" from "what the code requires." The scan was technically correct and operationally misleading.
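The AxReady split is a distinct shape, and it is worth detecting on purpose: one axis, two variants, neither reaching the convention threshold. Below is a minimal sketch. It assumes the scanner has already labeled each file with the variant it uses on that axis. `findSplit`, the file paths, and the variant labels are hypothetical.

```typescript
// For one axis (here: how tenant scoping is done), count which variant each
// file uses. A split is two or more variants in the drift band with no
// variant at the convention threshold.
function findSplit(
  variantByFile: Map<string, string>,
  conventionMin = 0.8,
  driftMin = 0.3,
): string[] | null {
  const total = variantByFile.size;
  if (total === 0) return null;

  const counts = new Map<string, number>();
  variantByFile.forEach((variant) => counts.set(variant, (counts.get(variant) ?? 0) + 1));

  const shares = Array.from(counts.entries()).map(([variant, n]) => ({ variant, share: n / total }));
  if (shares.some((s) => s.share >= conventionMin)) return null; // one variant already dominates

  const competing = shares.filter((s) => s.share >= driftMin).map((s) => s.variant);
  return competing.length >= 2 ? competing.sort() : null;
}

const tenantScoping = new Map<string, string>([
  ["apps/web/app/tickets/page.tsx", "scoped by middleware"],
  ["apps/web/app/users/page.tsx", "scoped by middleware"],
  ["apps/admin/app/tickets/page.tsx", "explicit tenantId filter"],
  ["apps/admin/app/users/page.tsx", "explicit tenantId filter"],
]);

console.log(findSplit(tenantScoping)); // both variants at 50%: reported as a split
```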

The third surprise is the one that changed Sentinel's production design. The scanner now distinguishes between operator conventions (patterns visible only when you watch the operator work) and code conventions (patterns visible to any contributor reading the code). The 80% threshold applies separately to each. The output catalog labels them. The change took half a day; the lesson cost more than that.
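The operator-versus-code split can be sketched as a label on every observation, with the threshold applied within each label rather than across both. Below is a minimal sketch. It assumes operator evidence comes from somewhere outside the repo (shell history or session logs, say). The types, field names, and counts are hypothetical.

```typescript
type Source = "code" | "operator";

interface Observation {
  pattern: string;
  source: Source; // "code": visible in the repo; "operator": visible only in how the operator works
  seen: number; // matches of the pattern within this source
  total: number; // chances to match within the same source
}

interface LabeledConvention {
  pattern: string;
  source: Source;
  prevalence: number;
}

// Each observation is scored only against its own source, so an operator
// habit cannot pass itself off as a code convention, or the reverse.
function labeledCatalog(observations: Observation[], threshold = 0.8): LabeledConvention[] {
  return observations
    .filter((o) => o.total > 0 && o.seen / o.total >= threshold)
    .map((o) => ({ pattern: o.pattern, source: o.source, prevalence: o.seen / o.total }));
}

const result = labeledCatalog([
  // Invented counts: nearly every run goes through bun...
  { pattern: "Bun-only execution", source: "operator", seen: 47, total: 50 },
  // ...but few files declare or guard that requirement.
  { pattern: "Bun-only execution", source: "code", seen: 3, total: 20 },
]);
console.log(result); // one entry, labeled "operator"
```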

What the scan got right that I expected

Five things the scan caught cleanly across most projects, in order of robustness.

One. Stack identification. Every project's stack — language, framework, ORM, build system, deployment target — was identified correctly. This is the table-stakes work; the scanner reads package.json, tsconfig.json, tailwind.config, and produces the one-paragraph stack summary the W8 essay specified. Eleven of eleven projects passed.
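Stack identification is the most mechanical part of the scan, which is why it is the most robust. Below is a minimal sketch of the package.json half. It assumes a hand-maintained lookup from well-known package names to stack labels. The `KNOWN` table and `detectStack` are hypothetical, and a real detector would also read tsconfig.json and the Tailwind config.

```typescript
// Illustrative lookup from package name to stack label; not exhaustive.
const KNOWN = new Map<string, string>([
  ["next", "framework: Next.js"],
  ["@prisma/client", "orm: Prisma"],
  ["prisma", "orm: Prisma"],
  ["tailwindcss", "styling: Tailwind CSS"],
  ["typescript", "language: TypeScript"],
]);

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

function detectStack(packageJsonText: string): string[] {
  const pkg = JSON.parse(packageJsonText) as PackageJson;
  const names = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies });
  const labels = names
    .map((name) => KNOWN.get(name))
    .filter((label): label is string => label !== undefined);
  // Deduplicate ("prisma" and "@prisma/client" both mean Prisma) and sort for stable output.
  return Array.from(new Set(labels)).sort();
}

const example = JSON.stringify({
  dependencies: { next: "14.2.0", "@prisma/client": "5.14.0" },
  devDependencies: { prisma: "5.14.0", typescript: "5.4.5", tailwindcss: "3.4.3" },
});
console.log(detectStack(example));
```

Using a `Map` rather than a plain object for the lookup avoids false hits on inherited keys such as `constructor`.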

Two. Convention extraction at 80%+ prevalence. Where a pattern was genuinely canonical, the scan named it. Tenant-root entity in multi-tenancy projects. UUID primary keys. Service-layer naming. Hook-pipeline shapes. Skill-file structures. The scanner was not creative — it found what was there.

Three. Drift flagging at 30-80%. Genuine drift was correctly identified as drift, not as convention. The scan did not over-promote half-formed patterns into the canonical list. The threshold cuts held.

Four. Per-project Ubiquitous Language extraction. The Eric Evans frame from W8 — distilling the project's actual vocabulary — produced specific, project-flavored output: "Organization is the multi-tenant root" for Studio, "Reading session" for Era Materna, "Coach" for Altyaa. Each project's own vocabulary surfaced in its own catalog.
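For a Prisma project, one cheap vocabulary signal is which model the other models point at. A tenant root such as Studio's Organization tends to be referenced from nearly everywhere. Below is a minimal sketch. It assumes the schema is available as text. `referenceCounts` is hypothetical, the regex ignores edge cases such as braces inside string defaults, and this is one possible heuristic, not necessarily how Sentinel extracts vocabulary.

```typescript
// Count, for each Prisma model, how many *other* models mention it by name.
function referenceCounts(schema: string): Map<string, number> {
  const bodies = new Map<string, string>(); // model name -> text between its braces
  const modelBlock = /model\s+(\w+)\s*\{([^}]*)\}/g;
  let m: RegExpExecArray | null;
  while ((m = modelBlock.exec(schema)) !== null) bodies.set(m[1], m[2]);

  const counts = new Map<string, number>();
  bodies.forEach((_body, name) => {
    const word = new RegExp(`\\b${name}\\b`); // case-sensitive: matches "Organization", not "organizationId"
    let n = 0;
    bodies.forEach((body, other) => {
      if (other !== name && word.test(body)) n++;
    });
    counts.set(name, n);
  });
  return counts;
}

const schema = `
model Organization {
  id       String    @id @default(uuid())
  users    User[]
  projects Project[]
}
model User {
  id             String       @id @default(uuid())
  organization   Organization @relation(fields: [organizationId], references: [id])
  organizationId String
}
model Project {
  id             String       @id @default(uuid())
  organization   Organization @relation(fields: [organizationId], references: [id])
  organizationId String
}`;

console.log(referenceCounts(schema)); // Organization is referenced by both other models
```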

Five. Stale-content flagging. Each catalog is dated. The agent reads the date during session start and surfaces a banner if the scan is more than thirty days old. The mechanism worked on first ship — I do not know yet how often I will see the staleness banner because I just ran the scans, but the plumbing is there.
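The staleness check is a date comparison at session start. Below is a minimal sketch. It assumes the catalog stores its scan date as a YYYY-MM-DD string. `stalenessBanner` and the banner wording are hypothetical.

```typescript
const STALE_AFTER_DAYS = 30; // "more than thirty days old" triggers the banner
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a banner string when the catalog is stale, or null when it is fresh.
function stalenessBanner(scannedOn: string, now: Date = new Date()): string | null {
  const scanned = new Date(`${scannedOn}T00:00:00Z`); // treat the stored date as UTC midnight
  if (isNaN(scanned.getTime())) {
    return `Sentinel catalog date "${scannedOn}" is unreadable; re-run the scan.`;
  }
  const ageDays = Math.floor((now.getTime() - scanned.getTime()) / MS_PER_DAY);
  return ageDays > STALE_AFTER_DAYS
    ? `Sentinel catalog is ${ageDays} days old; consider re-running the scan.`
    : null;
}

// prints "Sentinel catalog is 60 days old; consider re-running the scan."
console.log(stalenessBanner("2000-01-01", new Date("2000-03-01T00:00:00Z")));
```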

What the scan got wrong (besides the operator-vs-code-conventions issue)

Three real failures that I am noting publicly because pretending the scanner was perfect on first pass is dishonest.

What this implies if you are running scans across multiple projects

Three suggestions. Each is costly enough that I considered cutting it, and each time I was wrong.

One. Score your own scans on the surprise scale. The numbers tell you whether the scanner is working: a 1 means the scanner is reproducing what you already know (low value); a 4 or 5 means it is finding patterns you did not see (high value). DOS at 4 was the highest-value scan of the weekend, and the two AxReady scans at 3 sat in the middle. The rest, at 1-2, were confirmations.

Two. Distinguish operator conventions from code conventions explicitly. The third surprise above cost me half a day of confused output. Do not let your scanner conflate "what the operator does" with "what the code enforces." They are different categories with different audiences.

Three. Run the scan, then read the catalog as if you have never seen the code. The squint-test discipline only works if you can put your own knowledge aside long enough to see what the scanner saw. If the catalog feels obvious, you are reading it through your own assumptions; if it feels like reading someone else's notes about your code, the scanner is doing its job.

What the scan does to the agent's behavior

In the next session in any of these eleven projects, the agent reads the project's CLAUDE.md and the .sentinel/ directory at session start, so it now knows the conventions. In the first post-scan session I ran in Donne (a project I had not opened in three weeks), the agent's first message back to me referenced two conventions correctly without my prompting. That is the information-radiator effect from the W8 essay, finally visible in production.

The version of Sentinel that shipped last week is not the version I will be running at the end of Q2. The mtime-weighting fix lands this week. The cross-project aggregation lands the week after. The drift-resolution Council workflow lands somewhere in May. The retrospective on the post-Q2 version of Sentinel ships at the end of June, after another quarter of running scans across the same eleven projects.

What this first weekend already proved: the W8 design held. The thresholds held. The information-radiator promise held. The three failures I named above are correctable in days, not quarters. That is what shipping the design before the code buys you — when the code lands, the failures are known unknowns you can fix on a Tuesday, not unknown unknowns that surface on a customer call.

I would rather have the receipts. I have them now.
