Pular para o conteúdo
DuranteDurante
ALL SYSTEMSGet Access

27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

Anthropic shipped Skills as an open standard in December. The model I want DOS to settle on adds two more layers above and around it — Packs as Bounded-Context bundles of related Skills, and Hooks as the lifecycle layer that fires without invocation. Twenty-four weeks into DOS, with Studio shipping forty-six days ago and the open Skills spec now table stakes across coding-agent surfaces, this is the three-layer architecture I want to commit to before the layering decisions get baked the wrong way.

I am writing this on a Tuesday in the last week of February, twenty-four weeks and roughly two hundred and sixty commits into building DuranteOS. Studio has been live for forty-six days. The Hexagonal migration of the gateway is most of the way done. The hook-pipeline refactoring sequence is about half done.

The layering question — what is the right shape of capability composition above the substrate? — has been forming in my head for two months and is now load-bearing for the next quarter of work. Skills shipped as Anthropic's open standard in December (the W8 dispatch covered the launch). The free-tier release last week made Skills the default capability primitive across Claude Code's user base. Whatever DOS does has to compose with that.

When I describe DOS to engineers familiar with Claude Code, the question that comes back most often is some variant of:

"Why do you need three different capability primitives? Doesn't Claude already have Skills?"

The answer is yes — and the three primitives I want DOS to commit to sit at different layers, with different lifetimes and different ownership semantics. Skills are what the operator invokes. Packs are how Skills are distributed and clustered. Hooks are how the system responds to lifecycle events without being invoked. The three are not competing abstractions; they are stacked.

This is the layered model I want to commit to. It is also the answer to a related question I get less often but more usefully: "if I were going to fork DOS, what would the smallest meaningful piece be?" The three layers each give a different answer, which is part of what I want this essay to make explicit before I lock the layering in code.

The three layers in one sentence each

Skills are operator-invokable workflows (you ask, they run). Packs are versioned, distributable bundles of related capabilities (you install, they ship). Hooks are lifecycle event handlers that fire automatically without invocation (you do not ask, they happen).

The Eric Evans angle: each pack is a Bounded Context

Eric Evans's Bounded Context — the unit of consistent meaning within Domain-Driven Design — is the right frame for understanding why packs should exist as a layer above skills.

A skill on its own (e.g., /memory search "Acme") does one thing. The skill's vocabulary is defined inside its own boundary; it knows what "search" means in the context of the memory layer. It does not need to coordinate with other unrelated skills.

A pack should be a cluster of skills that share a Bounded Context — a consistent vocabulary, a common set of tools, a coherent purpose. The memory pack should contain /memory search, /memory recall, /memory status, /memory mine, etc. — every skill that operates on the operator's knowledge graph. The Sentinel pack (when I ship it) should contain every skill that operates on codebase conventions. The Brand pack should contain every skill that operates on brand identity. Each pack has its own Ubiquitous Language, its own Aggregate boundaries, its own internal tooling.

LayerBounded Context sizeVocabulary scopeDistribution
SkillSingle workflowWorkflow-local termsEmbedded in pack
PackA coherent capability areaPack-wide ULVersioned bundle
HookCross-cutting concernSystem-wide eventsLive install only

The pack-as-Bounded-Context lens explains decisions that look arbitrary from outside. Why should the memory layer and Sentinel be separate packs even though both deal with "knowledge about the codebase"? Because their vocabularies are different: the memory layer speaks about facts, triples, entities; Sentinel speaks about conventions, patterns, decisions. Forcing them into one pack would corrupt both vocabularies.

The Alistair Cockburn angle: methodology weight per layer

Cockburn's Crystal methodology family has a load-bearing principle: the right weight of process for any system depends on team size and criticality. A two-person prototype gets Crystal Clear; a forty-person life-critical system gets Crystal Magenta. The weight scales with the cost of failure.

The same scaling principle should shape the three DOS layers. Each layer has a different appropriate weight of ceremony.

What weight of ceremony each layer should earn

Same ceremony at every layer (the failure mode)
  • Skills get the same versioning, the same tests, the same documentation as packs
  • Packs get the same per-edit testing as hooks
  • Hooks get the same loose contract as skills
  • Result: heavy ceremony on the small layer (skills get over-engineered), light ceremony on the big layer (hooks get under-tested)
Weight scales with the cost of failure (the right move)
  • Skills — light ceremony. Edit, save, invoke immediately. No version. No formal tests (the eval suite covers them). Easy to write, easy to throw away.
  • Packs — moderate ceremony. Versioned. Tested via the pack's own internal test suite. Documented in the pack's SKILL.md. Distributed via four-copy installation or the eventual marketplace.
  • Hooks — heavy ceremony. Versioned. Tested with characterization tests (since they fire on every event, regressions are catastrophic). Reviewed before merge. The hook's effect on every code path is considered.

The methodology weight is the right one if asking "what is the worst that happens if this is wrong?" produces an answer that justifies the ceremony. A broken skill is a broken workflow — easy to roll back. A broken pack is a broken capability area — moderately bad. A broken hook fires on every session and corrupts every operator interaction until detected — bad enough to justify substantial ceremony.

What lives in each layer today

The current DOS distribution has a shape that does not yet match the model I am sketching. Today I have roughly twelve packs in various states of readiness — three that I would call production-grade, six that work but have rough edges, three that are scaffolds I have not yet filled in. About eighty workflows total across them, with the long tail being convenience scripts that may or may not earn their place once I rationalize the catalog. Six hooks shipping in production, four more that exist as TypeScript files but are not yet wired in.

The numbers are honest in a way the eventual numbers will not be: most of what I have right now is scaffolding for what the three-layer model is supposed to enable. The whole point of this essay is to commit to the model before the scaffolding crystallizes into the wrong shape.

A real example through all three layers

What should happen when an operator invokes a research skill — across all three layers

  1. Hook layer fires first. SessionStart hook loads the operator's CLAUDE.md, the memory layer's recent context, active project state, and recent learnings. The agent now has a working theory of the operator's situation. Operator did not invoke this; it just happened.
  2. Operator invokes the skill. /research extensive "What changed in Anthropic's API in the last 30 days?" The Skill tool routes to the Research pack's Extensive workflow.
  3. Pack provides the capability. The Research pack contains the workflow definitions, the configured providers (Brave, Perplexity, Claude search), the prompt scaffolding, and the synthesis logic. The skill uses the pack's tools.
  4. Skill orchestrates the work. The Extensive workflow spawns 4 parallel research subagents (Brave, Claude, Perplexity, Gemini), each with a different query variation. Synthesizes the results. Writes the output to MEMORY/RESEARCH.
  5. Hook layer fires again. PostToolUse hook captures the research artifact for the memory layer. SessionEnd hook syncs everything to Studio. Operator does not invoke either; they just happen.

The skill is what the operator interacted with. The pack is what made the skill capable. The hooks are what made the system observant of its own behavior. All three are necessary; none of them are interchangeable.

This worked example runs end-to-end on my machine today. The Research pack is one of the three production-grade ones. Most of the other packs run a partial version of this flow — the hook layer is in place for all of them, but several of the packs do not yet have a coherent Bounded Context internally. That is exactly the gap the next quarter of work is supposed to close.

What each layer is for, more precisely

Where I am in the catalog rationalization

What I want the three-layer model to deliver, by end of Q2:

MetricToday (rough)Target end of Q2
Packs~12 (3 production, 6 rough, 3 scaffold)~20 (all production-grade)
Skills~80 (long tail)~150 (each justifies its place)
Hooks6 shipping, 4 pending10 (all characterization-tested)
Avg skill LOC~180 (markdown + YAML)unchanged — the layer is supposed to be light
Avg pack LOC~1,800unchanged — the discipline is the discipline
Avg hook LOC~320unchanged — heavy by design

Today's numbers are aspirational in places (I have not measured them with the rigor the table implies). The end-of-Q2 numbers are commitments. The discipline is to write down both, then let the gap drive the work.

The three things you can fork

The most useful framing for understanding the layers is to ask: what is the smallest meaningful piece of DOS?

There are three answers, one per layer.

One. A single skill. The smallest unit of useful capability — about 180 lines of markdown defining a workflow, plus any tools it imports from its host pack. A new operator can write a custom skill in 2-3 hours and run it inside their existing DOS install.

Two. A single pack. The smallest unit of distributable capability — the pack's whole directory tree, with its skills, tools, agents, and SKILL.md. A pack can be installed standalone into another operator's DOS without dragging in the rest of the ecosystem.

Three. The hook layer. The smallest unit of cross-cutting behavior. An operator could in principle take just the DOS hook layer (the SessionStart/UserPromptSubmit/PostToolUse/SessionEnd handlers) and apply them to a different agent stack to get the observability and memory capture without the skill catalog.

I have not yet seen anyone fork at any of the three layers, since DOS itself is not yet public. But the design supports it — and the three answers together are the working definition of "what DOS is, layered."

Why this layered model justifies itself

The three layers are not just architectural neatness. They reflect three different change rates. Skills change weekly (operator iteration). Packs change monthly (capability evolution). Hooks change quarterly (system-level adjustments). Layering by change rate is what keeps each layer's ceremony appropriate to how often the layer is touched. If I get the change-rate alignment wrong, the rest of the model breaks — heavy ceremony at high change rates produces friction; light ceremony at low change rates produces drift.

The three-layer model is the clearest summary I have for what DOS should be, structurally. Skills are how operators do work. Packs are how capabilities accumulate. Hooks are how the system stays observant of itself. None of the three could be removed without making DOS materially less useful; none of them is doing the job another layer is supposed to do.

If I had to pick the most underrated of the three, it would be hooks. The hook layer is invisible by design — it does its work without being asked — and exactly because it is invisible, it is the part operators rarely notice and the part I would most miss if it were gone. The most invisible work is sometimes the most important. That is true of hooks. It is true of operating systems generally.

The retrospective on this layered model ships at the end of Q2, when the catalog rationalization is supposed to be done and the numbers in the table above will either match the targets or tell me something I did not expect about which layer compounded faster than the others. Either result is more useful than continuing to operate on a catalog whose shape I have not committed to in writing.

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta