27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Apr 7, 2026

Forking MemPalace: The 48-Hour Integration Retro

MemPalace shipped open-source on April 5. By Tuesday morning I had forked it into DOS. This is the integration retrospective written within forty-eight hours of the upstream launch — what I kept from my December sketch, what I had to abandon, what surprised me about the upstream's design choices, and what the operator-facing behavior actually looks like once a real implementation runs in production. Eric Evans on whether Wings/Rooms/Drawers map to Bounded Contexts (mostly yes). Greg Young on whether verbatim append-only storage is event-sourcing in the strict sense (mostly yes, with one important caveat I did not anticipate).

I am writing this on a Tuesday morning, exactly forty-eight hours after MemPalace shipped open-source. By Sunday night I had forked it into DOS. By Monday afternoon I had wired the capture path through DOS's hooks and migrated the prior fourteen weeks of operator context I had been hand-keeping in MEMORY/ into the new structure. Tuesday morning is the right moment to write the integration retrospective — the lessons are fresh, the surprises are still surprising, and the version I would write in two weeks would have already smoothed over the texture I want preserved on the record.

The W9 essay from December sketched a from-scratch design for what I wanted DOS's memory layer to be. The editor's note at the top of W9 (added in May after the upstream landed) tells the reader I forked the actual project rather than building my own. This essay is the concrete fork retrospective — how the December sketch held up against a real implementation, what I had to revise, and what I am genuinely glad I waited for.

I am twenty-six weeks past the W9 essay. The forty-eight hours of integration represents roughly the same fraction of one week as the W9 essay was of a full quarter of design thinking. The compression is the lesson: when the right upstream lands, the integration is fast precisely because the design thinking was done in advance.

The fork in one sentence

The upstream's Wings/Rooms/Drawers structure is approximately my W9 channels-and-triples structure, executed better than I would have executed it. I kept the upstream's architecture wholesale and adapted only the surface that touches DOS-specific hooks. Forty-eight hours from clone to production-ready, including data migration.

The Eric Evans angle: Wings as Aggregates, Rooms as sub-aggregates

The upstream's three-tier structure — Wings at the top, Rooms inside Wings, Drawers at the leaves — maps almost directly onto Eric Evans's Bounded Context and Aggregate concepts.

A Wing is a Bounded Context. The Wing for "Acme" (a client) holds everything the operator knows about Acme: people, decisions, milestones, relationships. The Wing for the DOS project itself holds everything about DOS as a domain — design decisions, failure patterns, architectural commitments, the changelog. Each Wing has its own internal vocabulary; cross-Wing references are URN-based, exactly as the W9 sketch proposed for cross-channel references.

A Room is an Aggregate inside the Wing. The "Conversations with Marina" room inside the Acme Wing is one Aggregate. The "April-architecture-decisions" room inside the DOS Wing is another. Rooms have internal coherence; Wings have cross-Room reference rules.

A Drawer is the leaf-level artifact — the actual verbatim content. A drawer holds the original conversation, document, decision-record, or session log. Nothing is summarized. Nothing is paraphrased. The literal content is preserved for retrieval.

The mapping to my W9 channels was rough but the shape was right. My W9 sketch had six channels (intent, entity, decision, milestone, agent_diary, relationship). The upstream's structure does not have flat-list channels; it has hierarchical Wings-and-Rooms. The upstream is more Evans-aligned than my sketch was — the Wing-as-Bounded-Context is closer to Evans's intent than my channel-as-aggregate framing was.

Reading the upstream's mempalace/backends/base.py on Sunday night, I had the same feeling I had in the W21 MCP-in-production essay when the protocol matched the hexagonal port-and-adapter pattern almost exactly. The right abstraction had already been worked out. What I had to do was learn its vocabulary.

The Greg Young angle: verbatim storage as event sourcing

The upstream's most distinctive architectural commitment — verbatim storage — is event sourcing in the strict Greg Young sense, applied to operator memory.

Most memory layers I have looked at do summarization: take a long conversation, distill it into key facts, store the distillate. The upstream refuses this. The original content is stored verbatim. Retrieval surfaces the original text; the retrieval layer may rank-and-prioritize, but the storage layer holds the raw input. Nothing is paraphrased into a different shape.

That is exactly Young's framing applied one layer up. Young argues that the current state should be a projection over an append-only event stream, not a stored thing in itself. The upstream argues that the current memory should be a retrieval over an append-only verbatim corpus, not a summarized digest. Same pattern. Different domain.

My W9 sketch had moved partway in this direction with the superseded_by field on each triple — preserving history while computing current state. The upstream went further. It did not even paraphrase facts into triples. It kept the original text. The storage is the source-of-truth; the structure (Wings/Rooms/Drawers) is the index over the storage, not the storage itself.

The caveat I did not anticipate: this design has dramatic implications for retrieval cost. A semantic search over a corpus of verbatim conversations is more expensive at retrieval time than a triple-pattern lookup over a structured graph. The upstream's benchmark (96.6% R@5 raw on LongMemEval, zero API calls) addresses this — the retrieval is cheap because it is local — but it shifts the cost from write-time to read-time. My W9 sketch had implicitly assumed cheap-write / cheap-read; the upstream commits to expensive-but-local-read in exchange for zero loss. That tradeoff is exactly Greg Young's argument from CQRS Documents 2010, applied a layer up: the projection is computed on demand, the source is preserved entire.

The 48-hour integration timeline

Hour	Action
Sun Apr 5, 18:00	MemPalace upstream lands. I read the README, the SKILL parallels, the `mempalace/backends/base.py` interface
Sun 19:00	Decision: fork instead of finishing my own. The upstream's Wings/Rooms/Drawers is closer to my Evans-driven intent than my channels sketch was
Sun 20:00	`git clone`, `pip install -e .`, run the example commands
Sun 22:00	First Wing created for DOS itself. First Room: "Architectural decisions." First Drawer: the verbatim text of the four-copies-one-truth essay from W5
Mon Apr 6, 09:00	Migration script: convert the prior fourteen weeks of `MEMORY/WORK/` and `MEMORY/LEARNING/` directories into Drawers under DOS-Wing/Rooms
Mon 12:00	First Wing for Altyaa. Second Wing for Studio. Third for Donne (the CRM project)
Mon 14:00	Hook integration: PostToolUse / UserPromptSubmit / SessionEnd hooks now route capture to MemPalace via `mempalace mine --mode convos` per the upstream CLI surface
Mon 16:00	First end-to-end test: ask the agent a question that should retrieve from a Drawer that was migrated, not from anything in the current session. The agent finds the answer.
Mon 19:00	Eleven Wings populated, one per project in the DOS registry. Per-project Rooms inside each
Tue Apr 7, 06:00	First "wake up" run — the upstream's CLI command that loads the Wing's most-relevant Rooms into the agent's startup context. Banner cleaner than my W9 sketch ever produced
Tue 08:00	Writing this essay

Forty-eight hours from clone to production-ready integration with prior context migrated. The compression was possible because the upstream's design choices were close enough to my W9 sketch that I did not have to redesign anything. I had to learn its vocabulary and its CLI surface. That was almost the entire integration cost.

What I kept from the W9 sketch

Two pieces from my December design survived the integration intact.

One. The hook-driven capture pattern. My W9 sketch had PostToolUse and UserPromptSubmit hooks invoking a capture worker between turns. The upstream's CLI surface (mempalace mine) maps cleanly onto this — DOS hooks now invoke mempalace mine with the right --wing argument scoped to the active project. The pattern survived because the upstream's CLI was designed with exactly this kind of integration in mind.

Two. The cost discipline. My W9 callout said "cheap enough to leave running forever." The upstream's implementation honors this — local ChromaDB, zero API calls in the default configuration, all storage and retrieval on-machine. The pattern survived because the upstream's strongest commitment is local-first.

What I had to abandon from the W9 sketch

Three pieces from my December design did not survive.

Three pieces of the W9 sketch I had to drop

The flat six-channel structure. My W9 essay listed six channels (intent, entity, decision, milestone, agent_diary, relationship) as flat siblings. The upstream's hierarchical Wings/Rooms structure subsumes the same data more cleanly — channels become per-Wing Rooms with names that reflect the Wing's domain rather than a global taxonomy. "Decisions about Acme" lives inside the Acme Wing as a Room, not inside a global "decision" channel. This is more Evans-correct than my flat list was.
The triple schema. My W9 essay had a SQLite triple-store schema with subject/predicate/object columns. The upstream stores verbatim text plus metadata, with semantic search as the retrieval mechanism. The triple-store would have lost information; the verbatim store preserves it. I dropped the triple schema entirely.
The confidence-decay heuristic. My W9 essay had a confidence score that decayed linearly with age, hand-tuned per channel. The upstream does not have this — it ranks retrievals by semantic relevance, with recency as a sortable side-channel rather than a decaying score. The difference is structural: my heuristic was predicting what would still be true; the upstream is finding what was actually said. The upstream's approach turns out to be more honest. I dropped the decay heuristic entirely.

The third drop is the one that taught me the most. My confidence-decay heuristic was a tactic for handling temporal validity ("this fact may not be true anymore"). The upstream's approach is a different framing entirely — we do not store predictions; we store records. If the operator wants to know what is true now, the agent can read the most recent records. The verbatim corpus is the truth-set; the agent's job is to read it correctly, not to inherit a stale guess from a decaying confidence score.

I had been thinking about the problem wrong for four months. Reading the upstream's mempalace/backends/base.py interface fixed it on the first read.

What surprised me about the upstream's design

Three surprises, in order of how much they changed the integration.

One. The ChromaDB-by-default choice. I had assumed any open-source memory layer would default to a custom storage engine. The upstream chose ChromaDB, declared the interface in mempalace/backends/base.py as pluggable, and explicitly invited alternative backends. The choice is right — ChromaDB is mature, the interface is small, and the operator gets battle-tested vector search on day one. I had been planning to write my own; I would have spent six weeks on something the upstream solves with one external dependency.

Two. The "wake-up" command. The upstream's CLI includes mempalace wake-up, which loads the most-relevant Rooms for the current session's apparent topic. I had not designed anything analogous in W9 — my sketch had operator-driven kg_query calls but no automatic context loading. The wake-up command is the missing primitive. It is the exact equivalent of the Sentinel session-start banner from W23, applied to memory. I should have predicted this primitive existed; I did not.

Three. The benchmark transparency. The upstream publishes 96.6% R@5 raw on LongMemEval as a reproducible number, with the commands to reproduce in benchmarks/BENCHMARKS.md. I had not committed to any retrieval-quality target in my W9 sketch. The upstream's discipline of reproducible benchmarks per release is the kind of practice I am going to copy across DOS's other packs — Sentinel especially. The benchmark itself is the evidence the operator uses to trust the integration; without it, "this works well" is a marketing claim. With it, the operator can run the test themselves.

What the operator-facing behavior actually looks like

The first session after the integration was the moment I knew the fork was the right call.

I asked the DOS agent a question I had asked five weeks ago — a question whose answer was in MEMORY/WORK/20260301-some-architectural-decision/. The agent had not been in that session. The conversation had ended. Five weeks of context had passed.

The agent answered correctly. It cited the original Drawer. It surfaced two related Rooms I had not asked for but were relevant. The retrieval took roughly 200ms on my laptop with no API calls.

That is what the W9 essay was promising. It is what every memory-layer pitch promises. The difference between a pitch and a working implementation is the moment the agent retrieves something the operator forgot they had captured. That moment happened on Monday afternoon, forty-eight hours after the upstream landed.

Why the fork compresses my back-half schedule

Building the W9-sketch implementation from scratch would have taken six to ten weeks of focused work. The fork-and-integrate path took forty-eight hours. The five-to-eight weeks of saved time go directly into the back-half work — Sentinel hardening, Studio routing-by-sovereignty implementation, the founding-customer hunt I am still running. Forking the right upstream is the highest-leverage decision an indie founder makes in any quarter. The discipline of writing the sketch in advance is what made me ready to recognize the right upstream in the first place.

The W9 sketch

The from-scratch design this fork tested.

Sentinel — the other pack

Same Pack pattern, different domain.

Where MemPalace sits

Pack-as-Bounded-Context, hooks driving capture.

The substrate

Where MemPalace inference (when needed) gets metered.

The W9 essay was a sketch. The upstream's design is the implementation. The fact that they pointed in approximately the same direction is half luck and half the discipline of reading the right authors before sketching — Evans for Bounded Contexts, Young for event-sourced storage. The W9 sketch was wrong in the details and right in the shape. The upstream is right in both.

I am writing this with a freshly-populated MemPalace instance running underneath the very session that produced the essay. The agent has a working theory of DOS, of Studio, of Altyaa, of every project in the registry. The session-start banner is cleaner than I ever produced by hand. The compounding starts now.

Forty-eight hours. Eleven Wings. Hundreds of Drawers migrated. One promise I had been making to myself for four months, kept on a Sunday night and operational by Tuesday morning. Sometimes the right move is to wait for someone else to ship the right shape, recognize it when it lands, and have the design discipline to integrate it cleanly inside two days.

I will take that trade every time.

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat