27 weeks · 54 pieces · Written during the build

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I learned while shipping DuranteOS. Every Friday, a bulletin on the week. About 108 thousand words and counting, for builders who would rather watch the foundation being poured than read the press release.

Start here

If you are thinking about writing while you build

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

If you are considering the founding cohort

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

If you want the architectural spine

The Decomposition Discipline I Am Trying to Codify Inside DOS

About 3,800 builders read this every week.

Dec 23, 2025

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

I have not built the memory layer yet. What I have, fifteen weeks into DOS, is the smell — every session opens with the agent reading the same files it read yesterday because nothing accumulates between turns. This essay is the architecture I was designing in December 2025, before the upstream project I would eventually fork even existed. Eric Evans on partitioning the graph into Aggregates. Greg Young on treating every fact as an event. Both applied to the moment an operator's accumulated context has nowhere to go.

Editor's note (added May 2026)

This essay was written in December 2025 as a from-scratch sketch of the personal-AI memory layer I wanted DOS to have. At the time, the upstream project I would eventually fork — MemPalace, a local-first verbatim-storage memory system with a Wings / Rooms / Drawers structure and a 96.6% R@5 raw on LongMemEval — did not yet exist. MemPalace shipped open-source on April 5, 2026. When I read its design, the primitives matched what I had been sketching closely enough that I forked it instead of finishing my own implementation. I am the operator and integrator of MemPalace inside DOS — I am not the original author, and credit for the system itself belongs to the MemPalace project. The essay below stays as-is because it is the December 2025 design intuition that informed why the fork later made sense.

I am writing this on a Monday a few days before Christmas, fifteen weeks and roughly two hundred and ten commits into building DuranteOS. The memory layer does not exist. The SQLite database does not exist. The capture hooks do not exist. The bridge tools do not exist. The graph that should be holding the operator's accumulated context across sessions is not there.

What I have is the smell that tells me I need to build it. Every Monday morning I open a session, ask the agent to do something I asked it to do three weeks ago, and watch it read the same five files it read three weeks ago because nothing it learned three weeks ago survived the session boundary.

If memory is the moat — the thesis I argued in Memory Replaces Lock-In — then the structure that holds the memory matters. The strategy claim was that an operator's accumulated context is the durable asset. The implementation claim, the one this essay is about, is that this context has to live somewhere queryable, traversable, and cheap enough that I stop noticing it is there.

I am writing the design essay before the code on purpose. I have made this mistake too many times in my career: I let the smell go on too long, then I build the wrong shape of the cure under deadline pressure, then I live with the wrong shape for three years. I want to commit to the shape now, while I still have the patience to be careful about its boundaries.

What I want the memory layer to be, in one sentence

A SQLite database at ~/.claude/MEMORY/knowledge_graph.sqlite3. Six channels. Append-only triples with confidence and timestamps. Hook-driven capture. Bridge-tool queries. No server, no network, no daemons. Cheap enough to leave running forever.

The Eric Evans angle: Aggregates are the unit of consistency

Eric Evans's Aggregate concept — refined later by Vaughn Vernon's Effective Aggregate Design — argues that the unit of transactional consistency is a small cluster of related entities, not the whole domain graph. You should be able to load an aggregate, modify it, and save it without traversing the rest of the world. The temptation in any graph design is to treat the whole graph as the consistency boundary. The discipline Evans imposes is the opposite: pick the smallest thing that has to be internally consistent, and let the rest of the graph reach it through references.

I want the memory layer to borrow this directly. Not one global blob. Partitioned into channels, each of which is an Evans-style Aggregate — a small consistent cluster that can be queried and updated independently.

Channel	What I want it to capture	Aggregate boundary
`intent`	Operator's stated goals, in-flight requests, asks	One aggregate per active intent; sealed when intent completes
`entity`	People, companies, projects, products	One aggregate per entity; cross-references by URN
`decision`	Architectural and product decisions with rationale	One aggregate per decision; immutable once recorded
`milestone`	Time-stamped events worth remembering	One aggregate per milestone; never modified
`agent_diary`	Reflective notes from agent runs	One aggregate per session
`relationship`	Notes on people the operator knows	One aggregate per person; updated on each interaction

The channel partitioning is what should keep queries fast. A "what do I know about Acme?" query touches the entity channel and maybe the decision channel — not the entire graph. The aggregate boundary is the implicit "stop here" signal for the traversal.

The piece I do not yet know how to design well is the cross-channel reference. If a relationship triple says "Marina is CTO at Acme," and an entity triple says "Acme is a client," a query about Marina has to hop from relationship into entity. Evans warns against letting cross-aggregate references turn into hidden joins that re-create the global blob. The right shape is probably URN-based references with explicit hop limits in the query language. I am writing this guess down so I can audit it once the system runs.
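One way the hop-limit guess could look, as a minimal sketch rather than a design commitment. It assumes triples are plain dicts held in memory, that any object starting with `urn:` counts as a cross-reference, and that a trailing `/role` segment is dropped to find the referenced entity. The `traverse` and `referenced_urn` helpers, the `owned_by` predicate, and the `holdco` entity are all hypothetical. The sketch only follows references forward, from subject to object.

```python
from collections import deque

# Hypothetical in-memory triples; in the essay these would live in SQLite.
TRIPLES = [
    {"channel": "relationship", "subject": "urn:person:marina",
     "predicate": "role_at", "object": "urn:client:acme/CTO"},
    {"channel": "entity", "subject": "urn:client:acme",
     "predicate": "is_a", "object": "client"},
    {"channel": "entity", "subject": "urn:client:acme",
     "predicate": "owned_by", "object": "urn:company:holdco"},
    {"channel": "entity", "subject": "urn:company:holdco",
     "predicate": "is_a", "object": "holding company"},
]


def referenced_urn(value):
    """Return the entity URN a value points at, or None for plain literals."""
    if not value.startswith("urn:"):
        return None
    return value.split("/", 1)[0]  # assumed convention: drop a trailing /role


def traverse(triples, seed, max_hops):
    """Breadth-first walk from seed that follows at most max_hops references."""
    seen = {seed}
    queue = deque([(seed, 0)])
    found = []
    while queue:
        urn, hops = queue.popleft()
        for t in triples:
            if t["subject"] != urn:
                continue
            found.append(t)
            target = referenced_urn(t["object"])
            # The explicit limit is what keeps a query from walking the whole graph.
            if target and target not in seen and hops < max_hops:
                seen.add(target)
                queue.append((target, hops + 1))
    return found


one_hop = traverse(TRIPLES, "urn:person:marina", max_hops=1)
two_hops = traverse(TRIPLES, "urn:person:marina", max_hops=2)
```

With `max_hops=1` the walk reaches Acme but stops before the holding company. Raising the limit to 2 pulls in the extra triple, which is the audit point: every additional hop is an explicit choice by the caller, not a hidden join.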

The Greg Young angle: every triple is an event

In a naive knowledge graph, triples are facts. "Acme is a client." "Marina is the CTO at Acme." If a fact changes, you update or replace the triple. The history is gone.

Greg Young's central move with CQRS and Event Sourcing is to refuse that update. The current state is not a thing you store; it is a thing you compute by left-folding the event stream. The store is append-only. The shape on disk is what happened, not what is true now.

I want the memory layer to be that, applied to the operator's context. Every triple is also an event — it has a timestamp, a confidence score, and a source (the session ID that produced it). Triples are append-only. When a fact changes, the new triple supersedes the old one without deleting it. The current state is a left-fold over the active triples.
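The left-fold idea can be sketched in a few lines. This assumes events are plain dicts sorted by an ISO-8601 `created_at` string and that the newest event per (subject, predicate) wins. The sample log, including the second CTO name, is made up for illustration.

```python
from functools import reduce

# Hypothetical append-only event log; nothing here is ever updated or deleted.
EVENTS = [
    {"subject": "urn:client:acme", "predicate": "cto", "object": "Marina",
     "confidence": 0.9, "source": "session-001", "created_at": "2025-12-01T10:00:00"},
    {"subject": "urn:client:acme", "predicate": "is_a", "object": "client",
     "confidence": 1.0, "source": "session-001", "created_at": "2025-12-01T10:05:00"},
    {"subject": "urn:client:acme", "predicate": "cto", "object": "Rafael",
     "confidence": 0.8, "source": "session-014", "created_at": "2026-02-10T09:00:00"},
]


def apply_event(state, event):
    """Fold step: the newest event for a (subject, predicate) key wins."""
    key = (event["subject"], event["predicate"])
    return {**state, key: event}


def current_state(events):
    """Left-fold the log in timestamp order to compute what is true now."""
    ordered = sorted(events, key=lambda e: e["created_at"])
    return reduce(apply_event, ordered, {})


state = current_state(EVENTS)
```

The log still holds all three events after the fold. The projection holds two keys, and the superseded fact stays available for any question about the past.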

This matters because almost all operator-relevant facts have temporal validity. "Marina is the CTO at Acme" might be true today and not true in nine months. The question "who runs engineering at Acme right now?" requires both the current fact AND the timestamp, because the agent has to decide whether to confirm with the operator before acting on it. A 90-day-old fact gets a re-verification prompt. A 7-day-old fact is treated as fresh. A 400-day-old fact is treated as a guess.

I cannot get that behavior from a fact store. I can only get it from an event store with a current-state projection on top.
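The age-based treatment could be sketched as a small classifier. The essay gives only three sample ages (7, 90, and 400 days), so the cutoffs below are assumptions chosen to agree with those samples, and the `freshness` function and its labels are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoffs; the exact boundaries are illustrative guesses.
FRESH_MAX_DAYS = 30
REVERIFY_MAX_DAYS = 365


def freshness(created_at, now):
    """Classify a fact by age: 'fresh', 'reverify', or 'guess'."""
    age_days = (now - created_at).days
    if age_days <= FRESH_MAX_DAYS:
        return "fresh"
    if age_days <= REVERIFY_MAX_DAYS:
        return "reverify"
    return "guess"


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
labels = [freshness(now - timedelta(days=d), now) for d in (7, 90, 400)]
```

The classifier needs the triple's timestamp as an input, which is exactly what a store that overwrites facts in place would have thrown away.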

[Diagram: the capture path on the left and the query path on the right, both touching the same SQLite file]

The capture path on the left should run synchronously after every tool call. The query path on the right should run on demand when the agent invokes the kg_query bridge tool. Both touch the same SQLite file with no coordination — SQLite's WAL mode handles the concurrency. I have not yet had to defend this against a real workload. I will discover what breaks when I run it.
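A small experiment that shows the behavior this bet relies on, using Python's built-in `sqlite3` module. It is a sketch under assumptions: the database lives in a temporary directory rather than under `~/.claude`, the `notes` table is a stand-in for the real schema, and one connection plays the capture path while a second plays the query path.

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "knowledge_graph.sqlite3")

writer = sqlite3.connect(db_path)
# The pragma returns the journal mode that is actually in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE notes (body TEXT NOT NULL)")
writer.commit()

# A second connection stands in for the query path.
reader = sqlite3.connect(db_path)

# Start a write on the capture side and leave it uncommitted.
writer.execute("INSERT INTO notes (body) VALUES ('captured between turns')")

# In WAL mode the reader is not blocked; it sees the last committed snapshot.
rows_during_write = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]

writer.commit()
rows_after_commit = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]

writer.close()
reader.close()
```

This covers one writer and one reader only. WAL lets readers proceed alongside a single writer, but two simultaneous writers still take turns, so a capture worker that fires from several hooks at once is the case I would test first.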

The triple schema I want to start with

The schema should be small enough to print on a single screen. If it is not, I have over-modeled.

CREATE TABLE triples (
  id          TEXT PRIMARY KEY,
  channel     TEXT NOT NULL,            -- intent | entity | decision | milestone | agent_diary | relationship
  subject     TEXT NOT NULL,            -- the URN of the thing being described
  predicate   TEXT NOT NULL,            -- the relationship or property
  object      TEXT NOT NULL,            -- the value or referent
  confidence  REAL NOT NULL DEFAULT 1.0,
  source      TEXT NOT NULL,            -- session_id that produced this triple
  created_at  TEXT NOT NULL DEFAULT (datetime('now')),
  superseded_by TEXT,                   -- another triple ID that replaces this one
  metadata    TEXT                      -- JSON blob for channel-specific extensions
);

CREATE INDEX idx_subject ON triples(subject, channel);
CREATE INDEX idx_predicate ON triples(predicate);
CREATE INDEX idx_created ON triples(created_at DESC);
CREATE INDEX idx_active ON triples(subject, predicate) WHERE superseded_by IS NULL;

That is the entire on-disk shape I want to commit to. Six columns the caller must supply, two with defaults, two optional, four indexes. The superseded_by column is the event-sourcing concession: you can always reconstruct any historical state by filtering on created_at <= some_date and taking the newest triple per (subject, predicate), but the common-case query (current state) uses the partial index and stays fast.
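The two read patterns can be tried against the schema with Python's built-in `sqlite3` module. This is a sketch under assumptions: the database is in memory, only the partial index is created, the two rows are made-up data, and `as_of` is a hypothetical helper name.

```python
import sqlite3

SCHEMA = """
CREATE TABLE triples (
  id TEXT PRIMARY KEY, channel TEXT NOT NULL, subject TEXT NOT NULL,
  predicate TEXT NOT NULL, object TEXT NOT NULL,
  confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  superseded_by TEXT, metadata TEXT
);
CREATE INDEX idx_active ON triples(subject, predicate) WHERE superseded_by IS NULL;
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Two made-up facts about the same (subject, predicate); t1 was superseded by t2.
rows = [
    ("t1", "entity", "urn:client:acme", "status", "prospect", "s-01", "2025-12-01 09:00:00", "t2"),
    ("t2", "entity", "urn:client:acme", "status", "client", "s-09", "2026-02-01 09:00:00", None),
]
db.executemany(
    "INSERT INTO triples (id, channel, subject, predicate, object, source, created_at, superseded_by)"
    " VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Common case: current state, filtered the same way as the partial index.
current = db.execute(
    "SELECT object FROM triples"
    " WHERE subject = ? AND predicate = ? AND superseded_by IS NULL",
    ("urn:client:acme", "status"),
).fetchall()


def as_of(conn, subject, predicate, when):
    """Historical state: the newest triple created at or before `when`."""
    row = conn.execute(
        "SELECT object FROM triples"
        " WHERE subject = ? AND predicate = ? AND created_at <= ?"
        " ORDER BY created_at DESC LIMIT 1",
        (subject, predicate, when),
    ).fetchone()
    return row[0] if row else None


then = as_of(db, "urn:client:acme", "status", "2026-01-01 00:00:00")
later = as_of(db, "urn:client:acme", "status", "2026-03-01 00:00:00")
```

The historical query ignores `superseded_by` entirely and relies on timestamps alone. That works because the timestamps are text in a single sortable format, a convention the capture path would have to keep.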

The metadata JSON blob is the part I am most suspicious of. Greg Young is firm that event payloads should be flat and explicit. A JSON blob is the place future-me starts hiding context-specific schema drift. I am putting it in because I do not yet know which channels need richer payloads, and I do not want to lock the schema before I have a workload. I am also putting a note in this essay to revisit it after three months of usage and either delete the blob or promote its keys to columns.

What I think a query should look like

The bridge tool should expose three primitives:

Tool	Signature	Description
kg_query	(entity, depth=1, channel=*) → triple[]	Return all active triples about an entity, optionally traversing N hops through cross-references.
kg_add	(channel, subject, predicate, object, confidence?) → triple_id	Append a triple. Auto-supersedes any conflicting active triple by (subject, predicate) within the same channel.
kg_explore	(seed_entity, max_depth=3) → graph	Return a connected subgraph centered on an entity. Used by the operator to browse, not by the agent to query.
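The auto-supersede rule described for kg_add could be sketched like this with Python's built-in `sqlite3` module. The extra `conn` and `source` parameters, the use of `uuid4` for ids, and the second role value are assumptions for illustration; the table definition repeats the essay's schema without its indexes.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE triples (
      id TEXT PRIMARY KEY, channel TEXT NOT NULL, subject TEXT NOT NULL,
      predicate TEXT NOT NULL, object TEXT NOT NULL,
      confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      superseded_by TEXT, metadata TEXT
    )
    """
)


def kg_add(conn, channel, subject, predicate, obj, source, confidence=1.0):
    """Append a triple and mark any active conflict in the same channel as superseded."""
    new_id = uuid.uuid4().hex
    with conn:  # one transaction: both statements commit together or not at all
        conn.execute(
            "INSERT INTO triples (id, channel, subject, predicate, object, confidence, source)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (new_id, channel, subject, predicate, obj, confidence, source),
        )
        conn.execute(
            "UPDATE triples SET superseded_by = ?"
            " WHERE channel = ? AND subject = ? AND predicate = ?"
            " AND superseded_by IS NULL AND id != ?",
            (new_id, channel, subject, predicate, new_id),
        )
    return new_id


first = kg_add(db, "relationship", "urn:person:marina", "role_at", "urn:client:acme/CTO", "s-01")
second = kg_add(db, "relationship", "urn:person:marina", "role_at", "urn:client:acme/CEO", "s-02")

active = db.execute("SELECT id, object FROM triples WHERE superseded_by IS NULL").fetchall()
total = db.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
```

Both rows remain on disk and only one is active. Note that this rule treats every predicate as single-valued, so a predicate that can legitimately hold several objects at once would need a different conflict key.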

A query I imagine the agent running constantly once this exists:

kg_query("urn:client:acme", depth=2, channel=*)
→ [
    { channel: "entity",       subject: "urn:client:acme",  predicate: "is_a",       object: "client" },
    { channel: "entity",       subject: "urn:client:acme",  predicate: "located_in", object: "São Paulo" },
    { channel: "entity",       subject: "urn:client:acme",  predicate: "uses",       object: "DOS" },
    { channel: "milestone",    subject: "urn:client:acme",  predicate: "first_call", object: "2026-04-12" },
    { channel: "relationship", subject: "urn:person:marina", predicate: "role_at",   object: "urn:client:acme/CTO" }
  ]

Five triples returned, three channels touched (entity, milestone, relationship), one cross-reference followed (Acme → Marina). I do not yet know how long this query will take against a real graph. My bet is single-digit milliseconds for a graph under 10,000 triples, with the partial index doing most of the work. If it is slower, the indexes are wrong. If it is faster than I expect, I will have over-engineered.
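Before measuring latency, a cheaper check is whether the planner reaches for an index at all. The sketch below asks SQLite for its query plan through Python's built-in `sqlite3` module. It assumes an empty in-memory table and a simplified version of the current-state lookup; the plan text varies across SQLite versions, and which of the two indexes gets chosen is up to the planner.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE triples (
      id TEXT PRIMARY KEY, channel TEXT NOT NULL, subject TEXT NOT NULL,
      predicate TEXT NOT NULL, object TEXT NOT NULL,
      confidence REAL NOT NULL DEFAULT 1.0, source TEXT NOT NULL,
      created_at TEXT NOT NULL DEFAULT (datetime('now')),
      superseded_by TEXT, metadata TEXT
    );
    CREATE INDEX idx_subject ON triples(subject, channel);
    CREATE INDEX idx_active ON triples(subject, predicate) WHERE superseded_by IS NULL;
    """
)

plan = db.execute(
    "EXPLAIN QUERY PLAN"
    " SELECT * FROM triples WHERE subject = ? AND superseded_by IS NULL",
    ("urn:client:acme",),
).fetchall()

# The last column of each plan row is a human-readable description.
details = [row[-1] for row in plan]
uses_index = any("USING INDEX" in d for d in details)
full_scan = any(d.startswith("SCAN") for d in details)
```

A plan row that starts with SCAN would be the early warning that the indexes are wrong, well before any timing run on a real graph.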

The capture path is the part I am most worried about

The capture path is the part I expect to redesign at least three times. Here is my best-guess evolution, written down so I can compare it against what actually happens:

V1: explicit operator commands. I type /remember "Marina is CTO at Acme". This will work for two weeks until I forget to do it 80% of the time. Capture rate will be low. Recall accuracy will be high (everything captured will be correct).
V2: agent-initiated capture after every response. The agent's response format includes a MEMORY: wrote=... line; if non-empty, those triples get captured. This will capture more but the agent will over-capture: every minor fact becomes a triple, the graph grows faster than it stays useful.
V3: hook-driven capture with channel routing. PostToolUse and UserPromptSubmit hooks invoke a capture worker that runs between turns. The worker reads the recent conversation, classifies any new facts by channel, and writes only triples with confidence above a per-channel threshold. This is what I want to land on. Capture rate target: ~85%. Graph grows steadily but not explosively.

V3 is what I am committing to in advance. The capture worker should be a small TypeScript module — under 400 lines — that runs an inexpensive classification call on each fired hook. The classification prompt should know about the six channels and refuse to write outside them. Cost discipline: the per-session inference cost has to stay under a few centavos, or I will turn it off.
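The write gate at the end of the V3 worker could be sketched as below. It is written in Python for illustration even though the worker itself is planned as TypeScript. The threshold numbers, the `select_for_capture` name, and the sample classifier output (including the invalid `gossip` channel) are all assumptions; the essay does not specify them.

```python
CHANNELS = {"intent", "entity", "decision", "milestone", "agent_diary", "relationship"}

# Assumed per-channel thresholds; the essay does not give concrete numbers.
THRESHOLDS = {
    "intent": 0.6, "entity": 0.8, "decision": 0.9,
    "milestone": 0.8, "agent_diary": 0.5, "relationship": 0.8,
}


def select_for_capture(candidates, thresholds=THRESHOLDS):
    """Split classifier output into triples to write and triples to drop."""
    accepted, rejected = [], []
    for c in candidates:
        channel = c.get("channel")
        if channel not in CHANNELS:
            rejected.append((c, "unknown channel"))  # refuse to write outside the six
        elif c.get("confidence", 0.0) < thresholds[channel]:
            rejected.append((c, "below threshold"))
        else:
            accepted.append(c)
    return accepted, rejected


# Hypothetical classifier output for one fired hook.
candidates = [
    {"channel": "relationship", "subject": "urn:person:marina",
     "predicate": "role_at", "object": "urn:client:acme/CTO", "confidence": 0.92},
    {"channel": "entity", "subject": "urn:client:acme",
     "predicate": "likes", "object": "coffee", "confidence": 0.40},
    {"channel": "gossip", "subject": "urn:person:marina",
     "predicate": "mood", "object": "tired", "confidence": 0.99},
]
accepted, rejected = select_for_capture(candidates)
```

Keeping the rejected list with a reason attached is a deliberate choice in this sketch: it gives a cheap way to see whether the thresholds are dropping facts that later turn out to matter.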

What I expect to get wrong, named in advance

Three open problems, named honestly because pretending the system is finished before I have shipped it is dishonest.

What this implies if you build something similar

Two suggestions, both load-bearing, both written to the version of me that will read this in three months and want to skip them.

One: choose a partition before you write the schema. A flat triple store is simple and fast for small graphs and unworkable past about 10,000 triples without an index strategy. Partitioning by channel — Evans's Aggregate idea, applied at the data layer — gives you a query boundary that scales with the operator's vocabulary, not with the graph's size. The cut between channels is also the place you discover the operator's actual mental model, which is the part Evans cares about most.

Two: capture has to be agent-initiated, not operator-initiated. Operators forget. Agents do not, when properly prompted. Hook-driven capture with channel routing is the only pattern I can think of that achieves a high capture rate without manual operator overhead. Anything that requires me to type a /remember command will not be there in the moments I most need it.

The cost discipline I am committing to in advance

Whatever the capture worker costs in inference per session, it has to be small enough that I never think about turning it off. If a day of sessions costs more than a coffee, the system has lost the cost argument. Cheap-by-default is a feature; expensive-but-occasionally-useful is a maintenance burden disguised as a feature.

The strategy: why this layer is the moat.

What will run against it: the Algorithm should query the memory layer before every plan.

The other knowledge surface: what Sentinel will be to codebases, the memory layer will be to operators.

The substrate: where the capture-worker inference will be metered.

The graph should be invisible most of the time. That is the point. Operators do not want to manage their memory layer; they want it to be there when they ask. The system should exist to be quietly correct often enough that I stop noticing it — and start noticing only when it is missing.

I have not yet run any version of this as the always-on memory layer. The graph does not yet exist. By the end of Q1 I want to be able to say I have run it for a quarter, lost zero triples, re-verified ageing facts, and queried it tens of thousands of times. By next Christmas I want to be able to say it has been the always-on memory layer for a year. (As the editor's note at the top records: by April 2026 the right shape of this layer turned out to be MemPalace, and the fork made sense the moment I read the upstream design.)

The unglamorous compounding is what I am betting on being the actual product.


The 27-week arc · A body of work

Twenty-seven weeks. Two pieces per week. Six months of writing during the build.

Week	Tuesday essay	Friday bulletin
W01	2025-10-28 · The Agent Is the Product: Why Intelligence Replaces Interface	2025-10-31 · Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical
W02	2025-11-04 · Memory Replaces Lock-In: Designing the Substrate of Personal AI	2025-11-07 · The Week the Substrate Caught Up: Kimi K2 Thinking Lands
W03	2025-11-11 · Why I Just Left a Steady Practice to Build a Personal AI Operating System	2025-11-14 · The Week the Harness Ate the Model
W04	2025-11-18 · The Decomposition Discipline I Am Trying to Codify Inside DOS	2025-11-21 · Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered
W05	2025-11-25 · Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts	2025-11-28 · Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch
W06	2025-12-02 · Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function	2025-12-05 · The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion
W07	2025-12-09 · Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter	2025-12-12 · Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router
W08	2025-12-16 · Sentinel, Sketched: Convention-Driven Onboarding Before I Build It	2025-12-19 · The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint
W09	2025-12-23 · Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped	2025-12-26 · Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty
W10	2025-12-30 · Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It	2026-01-02 · The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture
W11	2026-01-06 · Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At	2026-01-09 · Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default
W12	2026-01-13 · Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture	2026-01-16 · The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate
W13	2026-01-20 · Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes	2026-01-23 · The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week
W14	2026-01-27 · The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs	2026-01-30 · Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow
W15	2026-02-03 · Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward	2026-02-06 · Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day
W16	2026-02-10 · TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists	2026-02-13 · The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier
W17	2026-02-17 · Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into	2026-02-20 · The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives
W18	2026-02-24 · Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward	2026-02-27 · The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds
W19	2026-03-03 · The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25	2026-03-06 · Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned
W20	2026-03-10 · Six Months of DOS: What Changed, What Endured, What Surprised	2026-03-13 · The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies
W21	2026-03-17 · MCP in Production: What Works When the Protocol Becomes the Boundary	2026-03-20 · Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer
W22	2026-03-24 · Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade	2026-03-27 · The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies
W23	2026-03-31 · Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects	2026-04-03 · The Agent Stack Swallows the IDE, the IPO, and the Courtroom
W24	2026-04-07 · Forking MemPalace: The 48-Hour Integration Retro	2026-04-10 · Project Glasswing: When the Frontier Bifurcates on Safety
W25	2026-04-14 · The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today	2026-04-17 · Routines Eat the Workflow: The Harness Becomes the Product
W26	2026-04-21 · The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates	2026-04-24 · The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture
W27	2026-04-28 · After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next	2026-05-01 · The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat