27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Feb 24, 2026

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

Anthropic shipped Skills as an open standard in December. The model I want DOS to settle on adds two more layers above and around it — Packs as Bounded-Context bundles of related Skills, and Hooks as the lifecycle layer that fires without invocation. Twenty-four weeks into DOS, with Studio shipping forty-six days ago and the open Skills spec now table stakes across coding-agent surfaces, this is the three-layer architecture I want to commit to before the layering decisions get baked the wrong way.

I am writing this on a Tuesday in the last week of February, twenty-four weeks and roughly two hundred and sixty commits into building DuranteOS. Studio has been live for forty-six days. The Hexagonal migration of the gateway is most of the way done. The hook-pipeline refactoring sequence is about half done.

The layering question — what is the right shape of capability composition above the substrate? — has been forming in my head for two months and is now load-bearing for the next quarter of work. Skills shipped as Anthropic's open standard in December (the W8 dispatch covered the launch). The free-tier release last week made Skills the default capability primitive across Claude Code's user base. Whatever DOS does has to compose with that.

When I describe DOS to engineers familiar with Claude Code, the question that comes back most often is some variant of:

"Why do you need three different capability primitives? Doesn't Claude already have Skills?"

The answer is yes — and the three primitives I want DOS to commit to sit at different layers, with different lifetimes and different ownership semantics. Skills are what the operator invokes. Packs are how Skills are distributed and clustered. Hooks are how the system responds to lifecycle events without being invoked. The three are not competing abstractions; they are stacked.

This is the layered model I want to commit to. It is also the answer to a related question I get less often but more usefully: "if I were going to fork DOS, what would the smallest meaningful piece be?" The three layers each give a different answer, which is part of what I want this essay to make explicit before I lock the layering in code.

The three layers in one sentence each

Skills are operator-invokable workflows (you ask, they run). Packs are versioned, distributable bundles of related capabilities (you install, they ship). Hooks are lifecycle event handlers that fire automatically without invocation (you do not ask, they happen).

The Eric Evans angle: each pack is a Bounded Context

Eric Evans's Bounded Context — the unit of consistent meaning within Domain-Driven Design — is the right frame for understanding why packs should exist as a layer above skills.

A skill on its own (e.g., /memory search "Acme") does one thing. The skill's vocabulary is defined inside its own boundary; it knows what "search" means in the context of the memory layer. It does not need to coordinate with other unrelated skills.

A pack should be a cluster of skills that share a Bounded Context — a consistent vocabulary, a common set of tools, a coherent purpose. The memory pack should contain /memory search, /memory recall, /memory status, /memory mine, etc. — every skill that operates on the operator's knowledge graph. The Sentinel pack (when I ship it) should contain every skill that operates on codebase conventions. The Brand pack should contain every skill that operates on brand identity. Each pack has its own Ubiquitous Language, its own Aggregate boundaries, its own internal tooling.

Layer	Bounded Context size	Vocabulary scope	Distribution
Skill	Single workflow	Workflow-local terms	Embedded in pack
Pack	A coherent capability area	Pack-wide UL	Versioned bundle
Hook	Cross-cutting concern	System-wide events	Live install only

The pack-as-Bounded-Context lens explains decisions that look arbitrary from outside. Why should the memory layer and Sentinel be separate packs even though both deal with "knowledge about the codebase"? Because their vocabularies are different: the memory layer speaks about facts, triples, entities; Sentinel speaks about conventions, patterns, decisions. Forcing them into one pack would corrupt both vocabularies.

The Alistair Cockburn angle: methodology weight per layer

Cockburn's Crystal methodology family has a load-bearing principle: the right weight of process for any system depends on team size and criticality. A two-person prototype gets Crystal Clear; a forty-person life-critical system gets Crystal Magenta. The weight scales with the cost of failure.

The same scaling principle should shape the three DOS layers. Each layer has a different appropriate weight of ceremony.

What weight of ceremony each layer should earn

Same ceremony at every layer (the failure mode)

Skills get the same versioning, the same tests, the same documentation as packs
Packs get the same per-edit testing as hooks
Hooks get the same loose contract as skills
Result: heavy ceremony on the small layer (skills get over-engineered), light ceremony on the big layer (hooks get under-tested)

Weight scales with the cost of failure (the right move)

Skills — light ceremony. Edit, save, invoke immediately. No version. No formal tests (the eval suite covers them). Easy to write, easy to throw away.
Packs — moderate ceremony. Versioned. Tested via the pack's own internal test suite. Documented in the pack's SKILL.md. Distributed via four-copy installation or the eventual marketplace.
Hooks — heavy ceremony. Versioned. Tested with characterization tests (since they fire on every event, regressions are catastrophic). Reviewed before merge. The hook's effect on every code path is considered.

The methodology weight is the right one if asking "what is the worst that happens if this is wrong?" produces an answer that justifies the ceremony. A broken skill is a broken workflow — easy to roll back. A broken pack is a broken capability area — moderately bad. A broken hook fires on every session and corrupts every operator interaction until detected — bad enough to justify substantial ceremony.

What lives in each layer today

The current DOS distribution has a shape that does not yet match the model I am sketching. Today I have roughly twelve packs in various states of readiness — three that I would call production-grade, six that work but have rough edges, three that are scaffolds I have not yet filled in. About eighty workflows total across them, with the long tail being convenience scripts that may or may not earn their place once I rationalize the catalog. Six hooks shipping in production, four more that exist as TypeScript files but are not yet wired in.

The numbers are honest in a way the eventual numbers will not be: most of what I have right now is scaffolding for what the three-layer model is supposed to enable. The whole point of this essay is to commit to the model before the scaffolding crystallizes into the wrong shape.

A real example through all three layers

What should happen when an operator invokes a research skill — across all three layers

Hook layer fires first. SessionStart hook loads the operator's CLAUDE.md, the memory layer's recent context, active project state, and recent learnings. The agent now has a working theory of the operator's situation. Operator did not invoke this; it just happened.
Operator invokes the skill. /research extensive "What changed in Anthropic's API in the last 30 days?" The Skill tool routes to the Research pack's Extensive workflow.
Pack provides the capability. The Research pack contains the workflow definitions, the configured providers (Brave, Perplexity, Claude search), the prompt scaffolding, and the synthesis logic. The skill uses the pack's tools.
Skill orchestrates the work. The Extensive workflow spawns 4 parallel research subagents (Brave, Claude, Perplexity, Gemini), each with a different query variation. Synthesizes the results. Writes the output to MEMORY/RESEARCH.
Hook layer fires again. PostToolUse hook captures the research artifact for the memory layer. SessionEnd hook syncs everything to Studio. Operator does not invoke either; they just happen.

The skill is what the operator interacted with. The pack is what made the skill capable. The hooks are what made the system observant of its own behavior. All three are necessary; none of them are interchangeable.

This worked example runs end-to-end on my machine today. The Research pack is one of the three production-grade ones. Most of the other packs run a partial version of this flow — the hook layer is in place for all of them, but several of the packs do not yet have a coherent Bounded Context internally. That is exactly the gap the next quarter of work is supposed to close.

What each layer is for, more precisely

Where I am in the catalog rationalization

What I want the three-layer model to deliver, by end of Q2:

Metric	Today (rough)	Target end of Q2
Packs	~12 (3 production, 6 rough, 3 scaffold)	~20 (all production-grade)
Skills	~80 (long tail)	~150 (each justifies its place)
Hooks	6 shipping, 4 pending	10 (all characterization-tested)
Avg skill LOC	~180 (markdown + YAML)	unchanged — the layer is supposed to be light
Avg pack LOC	~1,800	unchanged — the discipline is the discipline
Avg hook LOC	~320	unchanged — heavy by design

Today's numbers are aspirational in places (I have not measured them with the rigor the table implies). The end-of-Q2 numbers are commitments. The discipline is to write down both, then let the gap drive the work.

The three things you can fork

The most useful framing for understanding the layers is to ask: what is the smallest meaningful piece of DOS?

There are three answers, one per layer.

One. A single skill. The smallest unit of useful capability — about 180 lines of markdown defining a workflow, plus any tools it imports from its host pack. A new operator can write a custom skill in 2-3 hours and run it inside their existing DOS install.

Two. A single pack. The smallest unit of distributable capability — the pack's whole directory tree, with its skills, tools, agents, and SKILL.md. A pack can be installed standalone into another operator's DOS without dragging in the rest of the ecosystem.

Three. The hook layer. The smallest unit of cross-cutting behavior. An operator could in principle take just the DOS hook layer (the SessionStart/UserPromptSubmit/PostToolUse/SessionEnd handlers) and apply them to a different agent stack to get the observability and memory capture without the skill catalog.

I have not yet seen anyone fork at any of the three layers, since DOS itself is not yet public. But the design supports it — and the three answers together are the working definition of "what DOS is, layered."

Why this layered model justifies itself

The three layers are not just architectural neatness. They reflect three different change rates. Skills change weekly (operator iteration). Packs change monthly (capability evolution). Hooks change quarterly (system-level adjustments). Layering by change rate is what keeps each layer's ceremony appropriate to how often the layer is touched. If I get the change-rate alignment wrong, the rest of the model breaks — heavy ceremony at high change rates produces friction; light ceremony at low change rates produces drift.

The Algorithm

What runs through every layer.

Sentinel — pack example

A pack that produces hook-readable output.

The memory layer — pack + hook

Capability that spans multiple layers.

Four Copies

The distribution mechanism for packs and hooks.

The three-layer model is the clearest summary I have for what DOS should be, structurally. Skills are how operators do work. Packs are how capabilities accumulate. Hooks are how the system stays observant of itself. None of the three could be removed without making DOS materially less useful; none of them is doing the job another layer is supposed to do.

If I had to pick the most underrated of the three, it would be hooks. The hook layer is invisible by design — it does its work without being asked — and exactly because it is invisible, it is the part operators rarely notice and the part I would most miss if it were gone. The most invisible work is sometimes the most important. That is true of hooks. It is true of operating systems generally.

The retrospective on this layered model ships at the end of Q2, when the catalog rationalization is supposed to be done and the numbers in the table above will either match the targets or tell me something I did not expect about which layer compounded faster than the others. Either result is more useful than continuing to operate on a catalog whose shape I have not committed to in writing.

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat