27 weeks · 54 pieces · Written during the build

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I have learned while shipping DuranteOS. Every Friday, a bulletin on the week. About 108,000 words and counting, for builders who would rather watch the foundation being poured than read the press release.

Start here

If you are thinking about writing while you build

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

If you are considering the founding cohort

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

If you want the architectural spine

The Decomposition Discipline I Am Trying to Codify Inside DOS

About 3,800 builders read this every week.

Nov 7, 2025

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

On Thursday Nov 6, Moonshot AI shipped Kimi K2 Thinking — a 1-trillion-parameter open-weights model that beats GPT-5 and Claude Sonnet 4.5 on agentic benchmarks, trained for $4.6M, with weights live on Hugging Face by the same evening. The week before the frontier labs' November fireworks, the open ecosystem made a structural move. Here is what that means if you are betting on a personal AI OS that has to outlive whatever model ships next.

The biggest story in AI this week landed on a Thursday morning and most people did not notice.

On Thursday November 6, 2025, Moonshot AI released Kimi K2 Thinking — a 1-trillion-parameter open-weights "thinking-agent" model that beats GPT-5 and Claude Sonnet 4.5 on the most agentic benchmarks the field cares about (CNBC, VentureBeat, Hugging Face model card). The numbers Moonshot reports:

Benchmark	Kimi K2 Thinking	GPT-5	Claude Sonnet 4.5
BrowseComp (web-agent)	60.2%	54.9%	24.1%
Humanity's Last Exam	44.9%	41.7%	—
SWE-Bench Verified	71.3%	—	—

Reported training cost: $4.6 million. Weights are on Hugging Face, the API is live on Moonshot's platform, and AWS Bedrock listed it within days.

For an indie founder building a personal AI operating system on top of Claude Code — which is where I sit, eight weeks into a private DOS experiment — Kimi K2 Thinking is the week's seismic story. Not because anyone is going to immediately rebuild on it. Because it is the first credible open-weights model that natively interleaves thought and tool calls, which is exactly the substrate property a personal-AI-OS harness depends on. The frontier-capability tax just dropped, visibly, in public.
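To make "interleaves thought and tool calls" concrete, here is a minimal sketch of the loop a harness would run around such a model. Everything in it is an illustrative assumption: `ToolCall`, `Final`, `run_agent`, and `scripted_model` are hypothetical names, and the scripted model is a stand-in, not Kimi K2 Thinking or any real provider API.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    thought: str   # reasoning the model emits before asking for a tool
    name: str      # which tool it wants
    args: dict     # keyword arguments for that tool


@dataclass
class Final:
    thought: str   # reasoning emitted before the final answer
    answer: str


Step = Union[ToolCall, Final]


def run_agent(model: Callable[[list], Step], tools: dict, task: str,
              max_steps: int = 8) -> str:
    """Alternate model steps and tool executions until a final answer."""
    transcript = [("user", task)]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(("thought", step.thought))
        if isinstance(step, Final):
            return step.answer
        # The harness, not the model, executes the tool and feeds the result back.
        result = tools[step.name](**step.args)
        transcript.append(("tool", f"{step.name} -> {result}"))
    raise RuntimeError("step budget exhausted")


def scripted_model(transcript: list) -> Step:
    """Hypothetical stand-in: calls a tool once, then answers from its result."""
    tool_results = [text for role, text in transcript if role == "tool"]
    if not tool_results:
        return ToolCall("I need the sum first.", "add", {"a": 2, "b": 3})
    return Final("The tool returned the sum.", f"Result: {tool_results[-1]}")


TOOLS = {"add": lambda a, b: a + b}
```

Calling `run_agent(scripted_model, TOOLS, "add 2 and 3")` walks through one thought, one tool call, a second thought, and a final answer. The point of the sketch is only the shape of the loop: the model's reasoning and its tool requests share one transcript, and the harness owns execution.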

The week's signal in one sentence

The substrate-replacement bet just got a free option on a 10x cost reduction. If you are building a layer on top of frontier models, you should already be asking what your system does when the substrate underneath it is open-weights and locally runnable.

What else moved this week

Three other items mattered, in descending order of strategic weight:

Apps in ChatGPT — Peloton + Tripadvisor go live (Thu Nov 6). OpenAI's Apps SDK keeps eating the consumer surface (OpenAI on X, Tom's Guide). ChatGPT is now an OS-shaped distribution channel where third-party apps run inside the chat surface, not around it.
Ai2 launches OlmoEarth (Tue Nov 4). An open multimodal foundation model and platform pretrained on ~10 TB of satellite + sensor data (BusinessWire, Ai2 blog). Domain-specific open foundation models are becoming a category, not a one-off.
OpenAI claims 1M business customers (Wed Nov 5). Self-reported as the "fastest-growing business platform in history" (Scalac recap). The number is OpenAI's own, but the velocity claim sets the competitive backdrop for every other story this week.

This was a platform-layer week, not a frontier-model week. The big proprietary launches everyone was expecting (Grok 4.1, Gemini 3, Opus 4.5, GPT-5.1) all came after Nov 6. What shipped Mon–Thu was the substrate: an open-weights thinking-agent model that closes the gap to GPT-5, a consumer agent surface that is becoming a de facto OS, and an open domain-foundation-model template.

Two things this changes for anyone building on top of frontier models

If you are reading this from inside an indie-AI startup or a personal experiment like mine, the week clarified two things.

What changed

The pre-Nov-6 assumption	The post-Nov-6 assumption
"Open-weights are 12-18 months behind frontier"	Open-weights are 4-8 weeks behind on agentic benchmarks; the gap will keep closing
"Substrate-portability is a nice-to-have for someday"	Substrate-portability is a now-feature; if your architecture cannot swap the model in a weekend, you have a strategic hole
"Anthropic / OpenAI / Google will own the agentic-tool-use story for years"	Anthropic's tool-use API is good but no longer a moat: the open-weights tier just shipped the same primitive
"Distribution is ours to win; model providers are platforms, not products"	Distribution is contested: ChatGPT Apps SDK proves the model providers will pull third-party experiences into the chat surface

I am particularly interested in the second row. For DOS — which is the personal AI OS I have been sketching for months but have barely started building — the architectural commitment from day one needs to be: the substrate is replaceable. The agent has to behave the same way whether it is running on Claude Sonnet 4.5, Kimi K2 Thinking, or a model that has not been released yet. That is a much stricter design discipline than "build on Claude and figure out the rest later."

I am committing to it explicitly because Kimi K2 Thinking proved this week that the assumption it depends on — open-weights catch up faster than we expect — is now an observable fact, not a hopeful prediction.
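One way to picture that discipline is a narrow port that the agent depends on, with one adapter per model behind it. The sketch below is an assumption-laden illustration, not DOS code: `Substrate`, `EchoSubstrate`, `Agent`, and `check_contract` are hypothetical names, and the echo adapter stands in for a real provider client.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class Substrate(Protocol):
    """The only surface the agent is allowed to see (hypothetical port)."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoSubstrate:
    """Stand-in adapter; a real one would wrap a provider SDK or a local model."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class Agent:
    substrate: Substrate  # injected, so swapping models is a one-line change

    def ask(self, question: str) -> str:
        # Normalisation lives in the agent, not in any adapter.
        return self.substrate.complete(question.strip())


def check_contract(substrates: Iterable[Substrate]) -> dict:
    """Run the same behavioural check against every adapter."""
    results = {}
    for substrate in substrates:
        reply = Agent(substrate).ask("  ping  ")
        results[substrate.name] = isinstance(reply, str) and reply.endswith("ping")
    return results
```

Running `check_contract([EchoSubstrate("model-a"), EchoSubstrate("model-b")])` applies one check to both adapters. In a real harness the check would be a behavioural test suite rather than a string comparison, but the design choice is the same: the contract is written once, against the port, and every new model has to pass it before the agent routes to it.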

The Apps-in-ChatGPT counterpoint

The other story worth sitting with is the OpenAI Apps SDK going live with Peloton and Tripadvisor as launch partners. This is the inverse of the Kimi K2 story.

Where Kimi argues "the substrate is yours, build whatever you want on it," the Apps SDK argues "the chat window is ours, build inside it on our terms."

Both are real. Both will have customers. The interesting question is: if you are building a personal AI OS, which of these worlds do you want to live in?

The two paths the week made visible

Substrate-portable, operator-owned. Build a harness that runs on whatever model is best/cheapest this week. Operator owns identity, memory, and capability composition. Vendor-neutral, infrastructure-light, but distribution is hard. This is the DOS bet.
App-inside-the-megachat. Build inside ChatGPT (or whoever wins the consumer agent surface). Distribution is solved on day one. The vendor controls everything else — pricing, model, surface area, deprecation calendar. This is the Peloton/Tripadvisor bet.

I think both will produce real businesses. I think only one of them produces the kind of business where the operator's accumulated context belongs to the operator instead of to the platform vendor — and that distinction is the entire reason DOS exists as a thesis worth testing.

What I am watching for next week

Three things, ordered by how likely I think they are:

The honest position

I am writing this from the position of someone who has not yet built the system this commentary is about. DOS is eight weeks old. There is no memory layer, no substrate-routing harness, no API gateway, no production deployment. There is a thesis and a notebook and a small private repo.

This week's news pressure-tested the thesis. Substrate-portability is no longer a hopeful design choice; it is a forced one. Open-weights at frontier-adjacent quality means I cannot architect DOS around any single provider, even Anthropic, without leaving real value on the table.

That changes my next four weeks of work. The first version of the memory layer (which I am writing about in detail this Monday) needs to be designed against a substrate I treat as plural from day one, not retrofitted later when a second provider becomes interesting.

— Lucas

Sources verified the week of Nov 3-6, 2025: CNBC: Kimi K2 Thinking release · VentureBeat: K2 Thinking benchmarks · Hugging Face model card · OpenAI Apps SDK launch (Nov 6) · Tom's Guide: Peloton + Tripadvisor in ChatGPT · BusinessWire: Ai2 OlmoEarth (Nov 4) · Ai2 OlmoEarth blog · Scalac: Last month in AI · Simon Willison: Datasette 1.0a20

The 27-week arc · A body of work

Twenty-seven weeks. Two pieces per week. Six months of writing during the build.

Week	Date	Tuesday essay	Date	Friday bulletin
W01	2025-10-28	The Agent Is the Product: Why Intelligence Replaces Interface	2025-10-31	Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical
W02	2025-11-04	Memory Replaces Lock-In: Designing the Substrate of Personal AI	2025-11-07	The Week the Substrate Caught Up: Kimi K2 Thinking Lands
W03	2025-11-11	Why I Just Left a Steady Practice to Build a Personal AI Operating System	2025-11-14	The Week the Harness Ate the Model
W04	2025-11-18	The Decomposition Discipline I Am Trying to Codify Inside DOS	2025-11-21	Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered
W05	2025-11-25	Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts	2025-11-28	Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch
W06	2025-12-02	Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function	2025-12-05	The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion
W07	2025-12-09	Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter	2025-12-12	Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router
W08	2025-12-16	Sentinel, Sketched: Convention-Driven Onboarding Before I Build It	2025-12-19	The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint
W09	2025-12-23	Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped	2025-12-26	Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty
W10	2025-12-30	Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It	2026-01-02	The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture
W11	2026-01-06	Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At	2026-01-09	Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default
W12	2026-01-13	Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture	2026-01-16	The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate
W13	2026-01-20	Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes	2026-01-23	The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week
W14	2026-01-27	The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs	2026-01-30	Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow
W15	2026-02-03	Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward	2026-02-06	Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day
W16	2026-02-10	TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists	2026-02-13	The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier
W17	2026-02-17	Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into	2026-02-20	The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives
W18	2026-02-24	Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward	2026-02-27	The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds
W19	2026-03-03	The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25	2026-03-06	Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned
W20	2026-03-10	Six Months of DOS: What Changed, What Endured, What Surprised	2026-03-13	The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies
W21	2026-03-17	MCP in Production: What Works When the Protocol Becomes the Boundary	2026-03-20	Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer
W22	2026-03-24	Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade	2026-03-27	The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies
W23	2026-03-31	Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects	2026-04-03	The Agent Stack Swallows the IDE, the IPO, and the Courtroom
W24	2026-04-07	Forking MemPalace: The 48-Hour Integration Retro	2026-04-10	Project Glasswing: When the Frontier Bifurcates on Safety
W25	2026-04-14	The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today	2026-04-17	Routines Eat the Workflow: The Harness Becomes the Product
W26	2026-04-21	The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates	2026-04-24	The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture
W27	2026-04-28	After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next	2026-05-01	The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat