Pular para o conteúdo
DuranteDurante
ALL SYSTEMSGet Access

27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Forking MemPalace: The 48-Hour Integration Retro

MemPalace shipped open-source on April 5. By Tuesday morning I had forked it into DOS. This is the integration retrospective written within forty-eight hours of the upstream launch — what I kept from my December sketch, what I had to abandon, what surprised me about the upstream's design choices, and what the operator-facing behavior actually looks like once a real implementation runs in production. Eric Evans on whether Wings/Rooms/Drawers map to Bounded Contexts (mostly yes). Greg Young on whether verbatim append-only storage is event-sourcing in the strict sense (mostly yes, with one important caveat I did not anticipate).

I am writing this on a Tuesday morning, exactly forty-eight hours after MemPalace shipped open-source. By Sunday night I had forked it into DOS. By Monday afternoon I had wired the capture path through DOS's hooks and migrated the prior fourteen weeks of operator context I had been hand-keeping in MEMORY/ into the new structure. Tuesday morning is the right moment to write the integration retrospective — the lessons are fresh, the surprises are still surprising, and the version I would write in two weeks would have already smoothed over the texture I want preserved on the record.

The W9 essay from December sketched a from-scratch design for what I wanted DOS's memory layer to be. The editor's note at the top of W9 (added in May after the upstream landed) tells the reader I forked the actual project rather than building my own. This essay is the concrete fork retrospective — how the December sketch held up against a real implementation, what I had to revise, and what I am genuinely glad I waited for.

I am twenty-six weeks past the W9 essay. The forty-eight hours of integration represents roughly the same fraction of one week as the W9 essay was of a full quarter of design thinking. The compression is the lesson: when the right upstream lands, the integration is fast precisely because the design thinking was done in advance.

The fork in one sentence

The upstream's Wings/Rooms/Drawers structure is approximately my W9 channels-and-triples structure, executed better than I would have executed it. I kept the upstream's architecture wholesale and adapted only the surface that touches DOS-specific hooks. Forty-eight hours from clone to production-ready, including data migration.

The Eric Evans angle: Wings as Aggregates, Rooms as sub-aggregates

The upstream's three-tier structure — Wings at the top, Rooms inside Wings, Drawers at the leaves — maps almost directly onto Eric Evans's Bounded Context and Aggregate concepts.

A Wing is a Bounded Context. The Wing for "Acme" (a client) holds everything the operator knows about Acme: people, decisions, milestones, relationships. The Wing for the DOS project itself holds everything about DOS as a domain — design decisions, failure patterns, architectural commitments, the changelog. Each Wing has its own internal vocabulary; cross-Wing references are URN-based, exactly as the W9 sketch proposed for cross-channel references.

A Room is an Aggregate inside the Wing. The "Conversations with Marina" room inside the Acme Wing is one Aggregate. The "April-architecture-decisions" room inside the DOS Wing is another. Rooms have internal coherence; Wings have cross-Room reference rules.

A Drawer is the leaf-level artifact — the actual verbatim content. A drawer holds the original conversation, document, decision-record, or session log. Nothing is summarized. Nothing is paraphrased. The literal content is preserved for retrieval.

The mapping to my W9 channels was rough but the shape was right. My W9 sketch had six channels (intent, entity, decision, milestone, agent_diary, relationship). The upstream's structure does not have flat-list channels; it has hierarchical Wings-and-Rooms. The upstream is more Evans-aligned than my sketch was — the Wing-as-Bounded-Context is closer to Evans's intent than my channel-as-aggregate framing was.

Reading the upstream's mempalace/backends/base.py on Sunday night, I had the same feeling I had in the W21 MCP-in-production essay when the protocol matched the hexagonal port-and-adapter pattern almost exactly. The right abstraction had already been worked out. What I had to do was learn its vocabulary.

The Greg Young angle: verbatim storage as event sourcing

The upstream's most distinctive architectural commitment — verbatim storage — is event sourcing in the strict Greg Young sense, applied to operator memory.

Most memory layers I have looked at do summarization: take a long conversation, distill it into key facts, store the distillate. The upstream refuses this. The original content is stored verbatim. Retrieval surfaces the original text; the retrieval layer may rank-and-prioritize, but the storage layer holds the raw input. Nothing is paraphrased into a different shape.

That is exactly Young's framing applied one layer up. Young argues that the current state should be a projection over an append-only event stream, not a stored thing in itself. The upstream argues that the current memory should be a retrieval over an append-only verbatim corpus, not a summarized digest. Same pattern. Different domain.

My W9 sketch had moved partway in this direction with the superseded_by field on each triple — preserving history while computing current state. The upstream went further. It did not even paraphrase facts into triples. It kept the original text. The storage is the source-of-truth; the structure (Wings/Rooms/Drawers) is the index over the storage, not the storage itself.

The caveat I did not anticipate: this design has dramatic implications for retrieval cost. A semantic search over a corpus of verbatim conversations is more expensive at retrieval time than a triple-pattern lookup over a structured graph. The upstream's benchmark (96.6% R@5 raw on LongMemEval, zero API calls) addresses this — the retrieval is cheap because it is local — but it shifts the cost from write-time to read-time. My W9 sketch had implicitly assumed cheap-write / cheap-read; the upstream commits to expensive-but-local-read in exchange for zero loss. That tradeoff is exactly Greg Young's argument from CQRS Documents 2010, applied a layer up: the projection is computed on demand, the source is preserved entire.

The 48-hour integration timeline

HourAction
Sun Apr 5, 18:00MemPalace upstream lands. I read the README, the SKILL parallels, the mempalace/backends/base.py interface
Sun 19:00Decision: fork instead of finishing my own. The upstream's Wings/Rooms/Drawers is closer to my Evans-driven intent than my channels sketch was
Sun 20:00git clone, pip install -e ., run the example commands
Sun 22:00First Wing created for DOS itself. First Room: "Architectural decisions." First Drawer: the verbatim text of the four-copies-one-truth essay from W5
Mon Apr 6, 09:00Migration script: convert the prior fourteen weeks of MEMORY/WORK/ and MEMORY/LEARNING/ directories into Drawers under DOS-Wing/Rooms
Mon 12:00First Wing for Altyaa. Second Wing for Studio. Third for Donne (the CRM project)
Mon 14:00Hook integration: PostToolUse / UserPromptSubmit / SessionEnd hooks now route capture to MemPalace via mempalace mine --mode convos per the upstream CLI surface
Mon 16:00First end-to-end test: ask the agent a question that should retrieve from a Drawer that was migrated, not from anything in the current session. The agent finds the answer.
Mon 19:00Eleven Wings populated, one per project in the DOS registry. Per-project Rooms inside each
Tue Apr 7, 06:00First "wake up" run — the upstream's CLI command that loads the Wing's most-relevant Rooms into the agent's startup context. Banner cleaner than my W9 sketch ever produced
Tue 08:00Writing this essay

Forty-eight hours from clone to production-ready integration with prior context migrated. The compression was possible because the upstream's design choices were close enough to my W9 sketch that I did not have to redesign anything. I had to learn its vocabulary and its CLI surface. That was almost the entire integration cost.

What I kept from the W9 sketch

Two pieces from my December design survived the integration intact.

One. The hook-driven capture pattern. My W9 sketch had PostToolUse and UserPromptSubmit hooks invoking a capture worker between turns. The upstream's CLI surface (mempalace mine) maps cleanly onto this — DOS hooks now invoke mempalace mine with the right --wing argument scoped to the active project. The pattern survived because the upstream's CLI was designed with exactly this kind of integration in mind.

Two. The cost discipline. My W9 callout said "cheap enough to leave running forever." The upstream's implementation honors this — local ChromaDB, zero API calls in the default configuration, all storage and retrieval on-machine. The pattern survived because the upstream's strongest commitment is local-first.

What I had to abandon from the W9 sketch

Three pieces from my December design did not survive.

Three pieces of the W9 sketch I had to drop

  1. The flat six-channel structure. My W9 essay listed six channels (intent, entity, decision, milestone, agent_diary, relationship) as flat siblings. The upstream's hierarchical Wings/Rooms structure subsumes the same data more cleanly — channels become per-Wing Rooms with names that reflect the Wing's domain rather than a global taxonomy. "Decisions about Acme" lives inside the Acme Wing as a Room, not inside a global "decision" channel. This is more Evans-correct than my flat list was.
  2. The triple schema. My W9 essay had a SQLite triple-store schema with subject/predicate/object columns. The upstream stores verbatim text plus metadata, with semantic search as the retrieval mechanism. The triple-store would have lost information; the verbatim store preserves it. I dropped the triple schema entirely.
  3. The confidence-decay heuristic. My W9 essay had a confidence score that decayed linearly with age, hand-tuned per channel. The upstream does not have this — it ranks retrievals by semantic relevance, with recency as a sortable side-channel rather than a decaying score. The difference is structural: my heuristic was predicting what would still be true; the upstream is finding what was actually said. The upstream's approach turns out to be more honest. I dropped the decay heuristic entirely.

The third drop is the one that taught me the most. My confidence-decay heuristic was a tactic for handling temporal validity ("this fact may not be true anymore"). The upstream's approach is a different framing entirely — we do not store predictions; we store records. If the operator wants to know what is true now, the agent can read the most recent records. The verbatim corpus is the truth-set; the agent's job is to read it correctly, not to inherit a stale guess from a decaying confidence score.

I had been thinking about the problem wrong for four months. Reading the upstream's mempalace/backends/base.py interface fixed it on the first read.

What surprised me about the upstream's design

Three surprises, in order of how much they changed the integration.

One. The ChromaDB-by-default choice. I had assumed any open-source memory layer would default to a custom storage engine. The upstream chose ChromaDB, declared the interface in mempalace/backends/base.py as pluggable, and explicitly invited alternative backends. The choice is right — ChromaDB is mature, the interface is small, and the operator gets battle-tested vector search on day one. I had been planning to write my own; I would have spent six weeks on something the upstream solves with one external dependency.

Two. The "wake-up" command. The upstream's CLI includes mempalace wake-up, which loads the most-relevant Rooms for the current session's apparent topic. I had not designed anything analogous in W9 — my sketch had operator-driven kg_query calls but no automatic context loading. The wake-up command is the missing primitive. It is the exact equivalent of the Sentinel session-start banner from W23, applied to memory. I should have predicted this primitive existed; I did not.

Three. The benchmark transparency. The upstream publishes 96.6% R@5 raw on LongMemEval as a reproducible number, with the commands to reproduce in benchmarks/BENCHMARKS.md. I had not committed to any retrieval-quality target in my W9 sketch. The upstream's discipline of reproducible benchmarks per release is the kind of practice I am going to copy across DOS's other packs — Sentinel especially. The benchmark itself is the evidence the operator uses to trust the integration; without it, "this works well" is a marketing claim. With it, the operator can run the test themselves.

What the operator-facing behavior actually looks like

The first session after the integration was the moment I knew the fork was the right call.

I asked the DOS agent a question I had asked five weeks ago — a question whose answer was in MEMORY/WORK/20260301-some-architectural-decision/. The agent had not been in that session. The conversation had ended. Five weeks of context had passed.

The agent answered correctly. It cited the original Drawer. It surfaced two related Rooms I had not asked for but were relevant. The retrieval took roughly 200ms on my laptop with no API calls.

That is what the W9 essay was promising. It is what every memory-layer pitch promises. The difference between a pitch and a working implementation is the moment the agent retrieves something the operator forgot they had captured. That moment happened on Monday afternoon, forty-eight hours after the upstream landed.

Why the fork compresses my back-half schedule

Building the W9-sketch implementation from scratch would have taken six to ten weeks of focused work. The fork-and-integrate path took forty-eight hours. The five-to-eight weeks of saved time go directly into the back-half work — Sentinel hardening, Studio routing-by-sovereignty implementation, the founding-customer hunt I am still running. Forking the right upstream is the highest-leverage decision an indie founder makes in any quarter. The discipline of writing the sketch in advance is what made me ready to recognize the right upstream in the first place.

The W9 essay was a sketch. The upstream's design is the implementation. The fact that they pointed in approximately the same direction is half luck and half the discipline of reading the right authors before sketching — Evans for Bounded Contexts, Young for event-sourced storage. The W9 sketch was wrong in the details and right in the shape. The upstream is right in both.

I am writing this with a freshly-populated MemPalace instance running underneath the very session that produced the essay. The agent has a working theory of DOS, of Studio, of Altyaa, of every project in the registry. The session-start banner is cleaner than I ever produced by hand. The compounding starts now.

Forty-eight hours. Eleven Wings. Hundreds of Drawers migrated. One promise I had been making to myself for four months, kept on a Sunday night and operational by Tuesday morning. Sometimes the right move is to wait for someone else to ship the right shape, recognize it when it lands, and have the design discipline to integrate it cleanly inside two days.

I will take that trade every time.

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta