27 weeks · 54 pieces · Written during the build

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I learned while shipping DuranteOS. Every Friday, a bulletin on the week. About 108 thousand words and counting, for builders who would rather watch the foundation being laid than read the press release.


Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

I have not built the memory layer yet. What I have, fifteen weeks into DOS, is the smell — every session opens with the agent reading the same files it read yesterday because nothing accumulates between turns. This essay is the architecture I was designing in December 2025, before the upstream project I would eventually fork even existed. Eric Evans on partitioning the graph into Aggregates. Greg Young on treating every fact as an event. Both applied to the moment an operator's accumulated context has nowhere to go.

Editor's note (added May 2026)

This essay was written in December 2025 as a from-scratch sketch of the personal-AI memory layer I wanted DOS to have. At the time, the upstream project I would eventually fork did not yet exist: MemPalace, a local-first verbatim-storage memory system with a Wings / Rooms / Drawers structure and a raw R@5 score of 96.6% on LongMemEval. MemPalace shipped open-source on April 5, 2026. When I read its design, the primitives matched what I had been sketching closely enough that I forked it instead of finishing my own implementation. I am the operator and integrator of MemPalace inside DOS. I am not the original author, and credit for the system itself belongs to the MemPalace project. The essay below stays as-is because it is the December 2025 design intuition that informed why the fork later made sense.

I am writing this on a Monday a few days before Christmas, fifteen weeks and roughly two hundred and ten commits into building DuranteOS. The memory layer does not exist. The SQLite database does not exist. The capture hooks do not exist. The bridge tools do not exist. The graph that should be holding the operator's accumulated context across sessions is not there.

What I have is the smell that tells me I need to build it. Every Monday morning I open a session, ask the agent to do something I asked it to do three weeks ago, and watch it read the same five files it read three weeks ago because nothing it learned three weeks ago survived the session boundary.

If memory is the moat — the thesis I argued in Memory Replaces Lock-In — then the structure that holds the memory matters. The strategy claim was that an operator's accumulated context is the durable asset. The implementation claim, the one this essay is about, is that this context has to live somewhere queryable, traversable, and cheap enough that I stop noticing it is there.

I am writing the design essay before the code on purpose. I have made this mistake too many times in my career: I let the smell go on too long, then I build the wrong shape of the cure under deadline pressure, then I live with the wrong shape for three years. I want to commit to the shape now, while I still have the patience to be careful about its boundaries.

What I want the memory layer to be, in one sentence

A SQLite database at ~/.claude/MEMORY/knowledge_graph.sqlite3. Six channels. Append-only triples with confidence and timestamps. Hook-driven capture. Bridge-tool queries. No server, no network, no daemons. Cheap enough to leave running forever.

The Eric Evans angle: Aggregates are the unit of consistency

Eric Evans's Aggregate concept — refined later by Vaughn Vernon's Effective Aggregate Design — argues that the unit of transactional consistency is a small cluster of related entities, not the whole domain graph. You should be able to load an aggregate, modify it, and save it without traversing the rest of the world. The temptation in any graph design is to treat the whole graph as the consistency boundary. The discipline Evans imposes is the opposite: pick the smallest thing that has to be internally consistent, and let the rest of the graph reach it through references.

I want the memory layer to borrow this directly. Not one global blob. Partitioned into channels, each of which is an Evans-style Aggregate — a small consistent cluster that can be queried and updated independently.

| Channel | What I want it to capture | Aggregate boundary |
|---|---|---|
| intent | Operator's stated goals, in-flight requests, asks | One aggregate per active intent; sealed when intent completes |
| entity | People, companies, projects, products | One aggregate per entity; cross-references by URN |
| decision | Architectural and product decisions with rationale | One aggregate per decision; immutable once recorded |
| milestone | Time-stamped events worth remembering | One aggregate per milestone; never modified |
| agent_diary | Reflective notes from agent runs | One aggregate per session |
| relationship | Notes on people the operator knows | One aggregate per person; updated on each interaction |

The channel partitioning is what should keep queries fast. A "what do I know about Acme?" query touches the entity channel and maybe the decision channel — not the entire graph. The aggregate boundary is the implicit "stop here" signal for the traversal.

The piece I do not yet know how to design well is the cross-channel reference. If a relationship triple says "Marina is CTO at Acme," and an entity triple says "Acme is a client," a query about Marina has to hop from relationship into entity. Evans warns against letting cross-aggregate references turn into hidden joins that re-create the global blob. The right shape is probably URN-based references with explicit hop limits in the query language. I am writing this guess down so I can audit it once the system runs.
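The hop-limit guess can be made concrete with a small sketch. This is illustrative Python rather than the TypeScript the worker is planned in, and everything in it is an assumption: the in-memory tuple format, the function name `neighbors_within`, the second person (`urn:person:paulo`), and the simplification that any value starting with `urn:` is a cross-aggregate reference.

```python
from collections import deque

# Hypothetical in-memory triples: (channel, subject, predicate, object).
TRIPLES = [
    ("entity", "urn:client:acme", "is_a", "client"),
    ("relationship", "urn:person:marina", "role_at", "urn:client:acme"),
    ("relationship", "urn:person:marina", "reports_to", "urn:person:paulo"),
    ("entity", "urn:person:paulo", "is_a", "person"),
]

def is_urn(value: str) -> bool:
    """Assumed convention: cross-aggregate references start with 'urn:'."""
    return value.startswith("urn:")

def neighbors_within(seed: str, max_hops: int, triples=TRIPLES) -> dict[str, int]:
    """Breadth-first walk over URN references, stopping at max_hops.

    Returns each reachable URN mapped to its hop distance from the seed.
    References are followed in both directions (subject to object and back).
    """
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distance[current] >= max_hops:
            continue  # explicit hop limit: do not expand past the boundary
        for _channel, subject, _predicate, obj in triples:
            if subject == current and is_urn(obj):
                nxt = obj
            elif obj == current and is_urn(subject):
                nxt = subject
            else:
                continue
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return distance

one_hop = neighbors_within("urn:client:acme", max_hops=1)
two_hops = neighbors_within("urn:client:acme", max_hops=2)
```

With a limit of one hop the walk reaches Marina and stops; raising it to two also pulls in whoever Marina references. The limit is what keeps a cross-channel query from quietly becoming a scan of the whole graph.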

The Greg Young angle: every triple is an event

In a naive knowledge graph, triples are facts. "Acme is a client." "Marina is the CTO at Acme." If a fact changes, you update or replace the triple. The history is gone.

Greg Young's central move with CQRS and Event Sourcing is to refuse that update. The current state is not a thing you store; it is a thing you compute by left-folding the event stream. The store is append-only. The shape on disk is what happened, not what is true now.

I want the memory layer to be that, applied to the operator's context. Every triple is also an event — it has a timestamp, a confidence score, and a source (the session ID that produced it). Triples are append-only. When a fact changes, the new triple supersedes the old one without deleting it. The current state is a left-fold over the active triples.
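A minimal sketch of that left-fold, in Python for self-containment. The `Triple` tuple, the `apply_event` and `current_state` names, and the sample facts are assumptions made for illustration; the only idea taken from the text is that the log is append-only and the newest triple per (subject, predicate) wins.

```python
from functools import reduce
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str
    confidence: float
    source: str       # session ID that produced the triple
    created_at: str   # ISO-8601 timestamp, so string order is time order

def apply_event(state: dict, event: Triple) -> dict:
    """Fold step: the newest triple for a (subject, predicate) key wins."""
    new_state = dict(state)
    new_state[(event.subject, event.predicate)] = event
    return new_state

def current_state(log: list[Triple]) -> dict:
    """Left-fold the append-only log, oldest event first."""
    ordered = sorted(log, key=lambda t: t.created_at)
    return reduce(apply_event, ordered, {})

# Hypothetical log: the CTO fact changes, and nothing is deleted.
log = [
    Triple("urn:client:acme", "cto", "urn:person:marina", 0.9, "session-1", "2025-12-01T10:00:00"),
    Triple("urn:client:acme", "is_a", "client", 1.0, "session-1", "2025-12-01T10:05:00"),
    Triple("urn:client:acme", "cto", "urn:person:paulo", 0.8, "session-7", "2026-09-01T09:00:00"),
]
state = current_state(log)
```

The log still holds three events after the fold; only the projection has collapsed to two current facts.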

This matters because almost all operator-relevant facts have temporal validity. "Marina is the CTO at Acme" might be true today and not true in nine months. The question "who runs engineering at Acme right now?" requires both the current fact AND the timestamp, because the agent has to decide whether to confirm with the operator before acting on it. A 90-day-old fact gets a re-verification prompt. A 7-day-old fact is treated as fresh. A 400-day-old fact is treated as a guess.
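The age rule can be sketched as a small classifier. The text gives three sample ages (7, 90, and 400 days) but not the cutoffs between them, so the 30-day and 365-day boundaries below are assumptions, as are the function and label names.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoffs: the essay gives sample ages, not exact boundaries.
FRESH_MAX = timedelta(days=30)
REVERIFY_MAX = timedelta(days=365)

def freshness(created_at: datetime, now: datetime) -> str:
    """Classify a fact by age so the agent knows how much to trust it."""
    age = now - created_at
    if age <= FRESH_MAX:
        return "fresh"      # act on it without asking
    if age <= REVERIFY_MAX:
        return "reverify"   # confirm with the operator before acting
    return "guess"          # treat as a hint, not a fact

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
labels = [freshness(now - timedelta(days=d), now) for d in (7, 90, 400)]
```

This only works because the timestamp travels with the fact, which is the argument for an event store over a plain fact store.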

I cannot get that behavior from a fact store. I can only get it from an event store with a current-state projection on top.

[Diagram: the capture path on the left and the query path on the right, both touching the same SQLite file.]

The capture path on the left should run synchronously after every tool call. The query path on the right should run on demand when the agent invokes the kg_query bridge tool. Both touch the same SQLite file with no coordination of my own: in WAL mode, SQLite lets readers proceed while a single writer appends, so the query path should not block on capture. Writes themselves still serialize. I have not yet had to defend this against a real workload. I will discover what breaks when I run it.
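A minimal sketch of the two-connection setup, using Python's standard `sqlite3` module against a temporary file. The `open_graph` helper, the throwaway `notes` table, and the `synchronous=NORMAL` setting are assumptions for illustration, not part of the design.

```python
import sqlite3
import tempfile
from pathlib import Path

def open_graph(path: Path) -> sqlite3.Connection:
    """Open the graph file and switch it to write-ahead logging."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5s on a locked file
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode != "wal":
        raise RuntimeError(f"expected WAL, got {mode!r}")
    conn.execute("PRAGMA synchronous=NORMAL")  # assumed durability trade-off
    return conn

# Temporary path standing in for ~/.claude/MEMORY/knowledge_graph.sqlite3.
db_path = Path(tempfile.mkdtemp()) / "knowledge_graph.sqlite3"
writer = open_graph(db_path)   # capture path
reader = open_graph(db_path)   # query path

writer.execute("CREATE TABLE notes (body TEXT)")
writer.execute("INSERT INTO notes VALUES ('committed')")
writer.commit()

# An uncommitted write is in flight; the reader still sees the last snapshot.
writer.execute("INSERT INTO notes VALUES ('pending')")
during_write = reader.execute("SELECT body FROM notes").fetchall()

writer.commit()
after_commit = reader.execute("SELECT body FROM notes").fetchall()
```

The reader returns the last committed snapshot while the writer's transaction is open, and sees the new row once it commits.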

The triple schema I want to start with

The schema should be small enough to print on a single screen. If it is not, I have over-modeled.

CREATE TABLE triples (
  id          TEXT PRIMARY KEY NOT NULL, -- SQLite permits NULL in a non-INTEGER primary key unless NOT NULL is stated
  channel     TEXT NOT NULL,            -- intent | entity | decision | milestone | agent_diary | relationship
  subject     TEXT NOT NULL,            -- the URN of the thing being described
  predicate   TEXT NOT NULL,            -- the relationship or property
  object      TEXT NOT NULL,            -- the value or referent
  confidence  REAL NOT NULL DEFAULT 1.0,
  source      TEXT NOT NULL,            -- session_id that produced this triple
  created_at  TEXT NOT NULL DEFAULT (datetime('now')),
  superseded_by TEXT,                   -- another triple ID that replaces this one
  metadata    TEXT                      -- JSON blob for channel-specific extensions
);

CREATE INDEX idx_subject ON triples(subject, channel);
CREATE INDEX idx_predicate ON triples(predicate);
CREATE INDEX idx_created ON triples(created_at DESC);
CREATE INDEX idx_active ON triples(subject, predicate) WHERE superseded_by IS NULL;

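That is the entire on-disk shape I want to commit to. Ten columns and four indexes: six columns the writer must always supply, two with defaults (confidence and created_at), and two that may be NULL (superseded_by and metadata). The superseded_by column is the event-sourcing concession. You can always reconstruct any historical state by filtering on created_at <= some_date and keeping the latest triple per (subject, predicate), but the common-case query (current state) uses the partial index and stays fast.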
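A sketch of the two reads that paragraph describes, run against an in-memory copy of the schema. The sample rows, the `state_as_of` helper, and the choice to scope "latest" to (channel, subject, predicate) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
      id TEXT PRIMARY KEY NOT NULL, channel TEXT NOT NULL, subject TEXT NOT NULL,
      predicate TEXT NOT NULL, object TEXT NOT NULL,
      confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      superseded_by TEXT, metadata TEXT
    )
""")
# Hypothetical history: t1 was later superseded by t2.
rows = [
    ("t1", "entity", "urn:client:acme", "cto", "urn:person:marina", "s1", "2025-12-01 10:00:00", "t2"),
    ("t2", "entity", "urn:client:acme", "cto", "urn:person:paulo", "s7", "2026-09-01 09:00:00", None),
]
conn.executemany(
    "INSERT INTO triples (id, channel, subject, predicate, object, source, created_at, superseded_by)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Historical read: latest triple per key among those that existed at as_of.
AS_OF_SQL = """
    SELECT t.subject, t.predicate, t.object
    FROM triples AS t
    WHERE t.created_at <= :as_of
      AND NOT EXISTS (
        SELECT 1 FROM triples AS newer
        WHERE newer.channel = t.channel
          AND newer.subject = t.subject
          AND newer.predicate = t.predicate
          AND newer.created_at <= :as_of
          AND newer.created_at > t.created_at
      )
"""

def state_as_of(as_of: str) -> list[tuple]:
    return conn.execute(AS_OF_SQL, {"as_of": as_of}).fetchall()

# Common-case read: current state, which is what the partial index serves.
current = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ? AND superseded_by IS NULL",
    ("urn:client:acme", "cto"),
).fetchall()
```

The historical query ignores superseded_by entirely and relies on timestamps, so it stays correct even for dates that fall between two versions of a fact.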

The metadata JSON blob is the part I am most suspicious of. Greg Young is firm that event payloads should be flat and explicit. A JSON blob is the place future-me starts hiding context-specific schema drift. I am putting it in because I do not yet know which channels need richer payloads, and I do not want to lock the schema before I have a workload. I am also putting a note in this essay to revisit it after three months of usage and either delete the blob or promote its keys to columns.

What I think a query should look like

The bridge tool should expose three primitives:

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| kg_query | (entity, depth=1, channel=*) → triple[] | no | | Return all active triples about an entity, optionally traversing N hops through cross-references. |
| kg_add | (channel, subject, predicate, object, confidence?) → triple_id | no | | Append a triple. Auto-supersedes any conflicting active triple by (subject, predicate) within the same channel. |
| kg_explore | (seed_entity, max_depth=3) → graph | no | | Return a connected subgraph centered on an entity. Used by the operator to browse, not by the agent to query. |
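A sketch of how kg_add's auto-supersede rule might look against the schema, using Python's `sqlite3` and an in-memory database. The explicit `source` parameter (the schema requires it, the signature above omits it), the UUID-based IDs, and the sample facts are assumptions.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
      id TEXT PRIMARY KEY NOT NULL, channel TEXT NOT NULL, subject TEXT NOT NULL,
      predicate TEXT NOT NULL, object TEXT NOT NULL,
      confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      superseded_by TEXT, metadata TEXT
    )
""")

def kg_add(channel: str, subject: str, predicate: str, obj: str,
           source: str, confidence: float = 1.0) -> str:
    """Append a triple and retire any active triple with the same key."""
    new_id = str(uuid.uuid4())
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO triples (id, channel, subject, predicate, object, confidence, source)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (new_id, channel, subject, predicate, obj, confidence, source),
        )
        # The only mutation on an old row: point it at its replacement.
        conn.execute(
            "UPDATE triples SET superseded_by = ?"
            " WHERE channel = ? AND subject = ? AND predicate = ?"
            " AND superseded_by IS NULL AND id != ?",
            (new_id, channel, subject, predicate, new_id),
        )
    return new_id

first_id = kg_add("entity", "urn:client:acme", "cto", "urn:person:marina", source="s1")
second_id = kg_add("entity", "urn:client:acme", "cto", "urn:person:paulo", source="s2")
active = conn.execute(
    "SELECT id, object FROM triples WHERE superseded_by IS NULL"
).fetchall()
```

One consequence worth noticing: keying the conflict on (subject, predicate) means a predicate can hold only one active value per subject, so multi-valued facts would need distinct predicates or a different conflict rule.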

A query I imagine the agent running constantly once this exists:

kg_query("urn:client:acme", depth=2, channel=*)
→ [
    { channel: "entity",       subject: "urn:client:acme",  predicate: "is_a",       object: "client" },
    { channel: "entity",       subject: "urn:client:acme",  predicate: "located_in", object: "São Paulo" },
    { channel: "entity",       subject: "urn:client:acme",  predicate: "uses",       object: "DOS" },
    { channel: "milestone",    subject: "urn:client:acme",  predicate: "first_call", object: "2026-04-12" },
    { channel: "relationship", subject: "urn:person:marina", predicate: "role_at",   object: "urn:client:acme/CTO" }
  ]

Five triples returned, three channels touched (entity, milestone, relationship), one cross-reference followed (Acme → Marina). I do not yet know how long this query will take against a real graph. My bet is single-digit milliseconds for a graph under 10,000 triples, with the partial index doing most of the work. If it is slower, the indexes are wrong. If it is faster than I expect, I will have over-engineered.
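One way to check the "indexes are wrong" hypothesis before measuring anything is to ask SQLite for its query plan. The sketch below assumes a depth-1 kg_query reduces to a single SELECT on subject; that query text and the `plan` helper are assumptions, and which index the planner picks may vary by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (
      id TEXT PRIMARY KEY NOT NULL, channel TEXT NOT NULL, subject TEXT NOT NULL,
      predicate TEXT NOT NULL, object TEXT NOT NULL,
      confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      superseded_by TEXT, metadata TEXT
    );
    CREATE INDEX idx_subject ON triples(subject, channel);
    CREATE INDEX idx_active ON triples(subject, predicate) WHERE superseded_by IS NULL;
""")

# Assumed shape of a depth-1 lookup for active triples about one entity.
QUERY = (
    "SELECT channel, predicate, object FROM triples"
    " WHERE subject = ? AND superseded_by IS NULL"
)

def plan(sql: str, params: tuple) -> list[str]:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column

details = plan(QUERY, ("urn:client:acme",))
uses_index = any("INDEX" in detail for detail in details)
```

If the detail text reports a full table scan instead of an index search, the query or the index definitions need another look before any timing is worth taking.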

The capture path is the part I am most worried about

The capture path is the part I expect to redesign at least three times. Below is my best guess at the three versions I will walk through, in order. I am writing it down so I can compare against what actually happens.

  1. V1: explicit operator commands. I type /remember "Marina is CTO at Acme". This will work for two weeks until I forget to do it 80% of the time. Capture rate will be low. Recall accuracy will be high (everything captured will be correct).
  2. V2: agent-initiated capture after every response. The agent's response format includes a MEMORY: wrote=... line; if non-empty, those triples get captured. This will capture more but the agent will over-capture: every minor fact becomes a triple, the graph grows faster than it stays useful.
  3. V3: hook-driven capture with channel routing. PostToolUse and UserPromptSubmit hooks invoke a capture worker that runs between turns. The worker reads the recent conversation, classifies any new facts by channel, and writes only triples with confidence above a per-channel threshold. This is what I want to land on. Capture rate target: ~85%. Graph grows steadily but not explosively.

V3 is what I am committing to in advance. The capture worker should be a small TypeScript module — under 400 lines — that runs an inexpensive classification call on each fired hook. The classification prompt should know about the six channels and refuse to write outside them. Cost discipline: the per-session inference cost has to stay under a few centavos, or I will turn it off.
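The gate at the end of the V3 worker can be sketched independently of the classifier. The Python below is illustrative only (the worker itself is planned in TypeScript): the `Candidate` type, the `select_for_capture` name, and every threshold value are assumptions, and the classification call that would produce the candidates is not shown.

```python
from dataclasses import dataclass

CHANNELS = {"intent", "entity", "decision", "milestone", "agent_diary", "relationship"}

# Hypothetical per-channel confidence thresholds; the essay does not give values.
THRESHOLDS = {
    "intent": 0.6, "entity": 0.8, "decision": 0.9,
    "milestone": 0.8, "agent_diary": 0.5, "relationship": 0.8,
}

@dataclass(frozen=True)
class Candidate:
    """A fact proposed by the classifier, not yet written to the graph."""
    channel: str
    subject: str
    predicate: str
    object: str
    confidence: float

def select_for_capture(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates in a known channel and above its threshold."""
    kept = []
    for c in candidates:
        if c.channel not in CHANNELS:
            continue  # refuse to write outside the six channels
        if c.confidence < THRESHOLDS[c.channel]:
            continue  # too uncertain for this channel
        kept.append(c)
    return kept

proposed = [
    Candidate("entity", "urn:client:acme", "is_a", "client", 0.95),
    Candidate("decision", "urn:decision:storage", "chose", "sqlite", 0.70),
    Candidate("gossip", "urn:person:marina", "likes", "coffee", 0.99),
]
kept = select_for_capture(proposed)
```

Only the first candidate survives: the second falls below its channel's threshold and the third names a channel that does not exist. Keeping the gate as a pure function makes it cheap to test without paying for a classification call.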

What I expect to get wrong, named in advance

Three open problems, named honestly because pretending the system is finished before I have shipped it is dishonest.

What this implies if you build something similar

Two suggestions, both load-bearing, both written to the version of me that will read this in three months and want to skip them.

One: choose a partition before you write the schema. A flat triple store is simple and fast for small graphs and unworkable past about 10,000 triples without an index strategy. Partitioning by channel — Evans's Aggregate idea, applied at the data layer — gives you a query boundary that scales with the operator's vocabulary, not with the graph's size. The cut between channels is also the place you discover the operator's actual mental model, which is the part Evans cares about most.

Two: capture has to be agent-initiated, not operator-initiated. Operators forget. Agents do not, when properly prompted. Hook-driven capture with channel routing is the only pattern I can think of that achieves a high capture rate without manual operator overhead. Anything that requires me to type a /remember command will not be there in the moments I most need it.

The cost discipline I am committing to in advance

Whatever the capture worker costs in inference per session, it has to be small enough that I never think about turning it off. If a day of sessions costs more than a coffee, the system has lost the cost argument. Cheap-by-default is a feature; expensive-but-occasionally-useful is a maintenance burden disguised as a feature.

The graph should be invisible most of the time. That is the point. Operators do not want to manage their memory layer; they want it to be there when they ask. The system should exist to be quietly correct often enough that I stop noticing it — and start noticing only when it is missing.

I have not yet run any version of this as the always-on memory layer. The graph does not yet exist. By the end of Q1 I want to be able to say I have run it for a quarter, lost zero triples, re-verified ageing facts, and queried it tens of thousands of times. By next Christmas I want to be able to say it has been the always-on memory layer for a year. (As the editor's note at the top records: by April 2026 the right shape of this layer turned out to be MemPalace, and the fork made sense the moment I read the upstream design.)

The unglamorous compounding is what I am betting on being the actual product.
