27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Dec 12, 2025

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

On Thursday OpenAI shipped GPT-5.2 — Anthropic's Opus 4.5 (Nov 24) and Google's Gemini 3 Pro (Nov 18) make three frontier flagships in 23 days. The same week, the Linux Foundation formally took stewardship of MCP, AGENTS.md, and goose under the new Agentic AI Foundation. Mistral shipped Devstral 2 + a Claude-Code-style CLI. Accenture put 30,000 trained pros behind Claude Code. The substrate is now plural and the protocols are now neutral. Here is what changes for an indie founder.

In twenty-three days, three frontier labs shipped flagship models. That is not an opinion piece — that is the calendar.

November 18: Google ships Gemini 3 Pro (and Antigravity)
November 24: Anthropic ships Claude Opus 4.5 (with a 67% input-token price cut)
December 11 (today): OpenAI ships GPT-5.2 — Instant, Thinking, and Pro tiers — claiming 55.6% on SWE-Bench Pro (new SOTA), 80% on SWE-bench Verified, hallucinations down 30% versus 5.1, with a new "xhigh" reasoning effort tier and response compaction for extended-context agents (OpenAI announcement, Simon Willison's analysis)

For an indie founder running a personal AI OS on Claude Code, the load-bearing fact is not the benchmark. It is the cadence. Three frontier flagships in 23 days means there will not be a single winner of 2026. There will be a rotation, and the architecture that survives is the one that routes between the rotation, not the one that hardcodes any single vendor's SDK.

That is the framing for everything else that happened this week.

The week's signal in one sentence

Stop picking a winner. Start picking a router. The lab that is SOTA today will not be SOTA in three weeks, and the architecture that depends on it will not survive the next launch.

The protocols just became neutral

The other big story landed Tuesday and got less coverage than it should have.

On Tuesday December 9, the Linux Foundation announced the formation of the Agentic AI Foundation (AAIF) (source). Founding platinum members: AWS, Anthropic, Google, Microsoft, OpenAI, Block, Bloomberg, Cloudflare. Initial protocols moved under foundation governance: MCP, AGENTS.md, and Block's goose agent runtime.

This is the agentic stack's "Apache moment." Three protocols that personal AI OS architectures already depend on (MCP for capability composition, AGENTS.md for project-level instructions, goose for an open agent runtime) just stopped being any single vendor's roadmap item and became neutral standards.

For DOS specifically — twelve weeks into being built around MCP, AGENTS.md, and Claude Code-shaped conventions — the AAIF news is the most important hedge of the year. Three months ago I was committing to Anthropic-shaped patterns with the implicit risk that "Anthropic could re-license MCP." That risk is now gone.

The other three things that mattered

Mistral ships Devstral 2 + Mistral Vibe CLI — Tue Dec 9 (source). Open-weights coding model (123B, MIT-modified license, 256K context, 72.2% SWE-bench Verified, free API for now), plus a Claude-Code-style terminal CLI. The open-source camp now has a credible vibe-coding daily driver — and an MIT-modified license that lets indie founders embed it without negotiation.
Accenture × Anthropic launch the Anthropic Business Group with ~30,000 trained pros — Tue Dec 9 (source). Claude Code becomes the enterprise SDLC default for one of the world's largest consultancies. The indie tooling stack is being industrialized for Fortune 500 deployments in real time, which means the protocols and patterns are about to harden much faster than they would have organically.
Simon Willison: "Useful patterns for building HTML tools" — Wed Dec 10 (source). The canonical senior-engineer essay of the week: a working catalog of design patterns from someone who has shipped 150 disposable Claude-built HTML tools. Required reading for anyone betting on personal-AI-OS workflow code — the patterns are concrete, the tools are inspectable, and the operating model (disposable, single-file, LLM-built) is exactly what a personal AI OS makes possible.

The pattern: substrate stabilizes while frontier models churn

Read the week as two parallel motions:

What happened in parallel this week

Above the substrate (the frontier-model layer)

OpenAI, Google, and Anthropic all ship flagship models in 23 days
SOTA crowns trade hands week-to-week
Benchmark numbers stop being decisive — labs trade them like merchandise
The right substrate for any specific task is now a per-task question

Below the substrate (the protocols and pattern layer)

MCP, AGENTS.md, goose move under Linux Foundation governance
Mistral ships open-weights coding model + CLI under MIT-modified license
Accenture industrializes Claude Code patterns for enterprise SDLC
Senior practitioners publish concrete reusable design patterns

The two motions are connected. The frontier-model churn is what forces the protocol-layer stabilization. If three labs are going to keep trading SOTA every three weeks, then the architecture that survives must depend on the protocols, not the models. AAIF formalizing MCP this week is the market saying out loud what the architecture has been forced to assume since at least October.

For an indie founder, both motions point in the same direction:

What an indie founder building on top of frontier models should do this week

Audit your code for vendor-specific assumptions. Anywhere you have hardcoded Anthropic's SDK, OpenAI's response shape, or Google's auth flow — that is a future migration. The migration is not a hypothetical anymore. Three flagship launches in 23 days means the question is not if you will swap providers, it is how often and how cheaply.
Commit to MCP for capability composition. The protocol is now under neutral governance. The vendors competing on the substrate are the same vendors stewarding the protocol. The agency cost just dropped to near zero.
Build the routing layer before you need it. A credit-metered gateway that abstracts provider behind one Bearer token (which is what I described in detail this Monday) is no longer a Phase 2 feature. It is the architectural primitive that makes the rest of the stack survive a 23-day frontier-model rotation.

The Mistral counterweight

The other under-covered story this week is Mistral's Devstral 2 launch and the Vibe CLI that ships with it.

Open-weights, MIT-modified license, 123B parameters, 256K context, 72.2% on SWE-bench Verified, free API for now. Plus a Claude-Code-style CLI for the terminal. That is, structurally, the open-source counter-positioning to Claude Code at a fraction of the inference cost.

The interesting pattern: open-weights labs are now shipping not just models but the harness that runs the models. DeepSeek shipped V3.2-Speciale last week free until Dec 15. Mistral ships Devstral 2 + Vibe CLI free this week. The open-source side is building the same vertical the proprietary labs are building, just with the cost structure inverted.

For DOS, this clarifies a planning question I had been deferring. Should the gateway route to open-weights providers from day one, or add them in Phase 2? After this week, the answer is unambiguous: day one. The cost asymmetry is too large to defer.

What I am watching for next week

What this changes for DOS

For the personal AI OS I am twelve weeks into building, this week made one architectural commitment unavoidable. The credit-metered gateway I designed in detail on Monday is no longer a "build in Q1" item. It is a "build in the next four weeks" item, because the substrate-portability story is now market-validated, not just architecturally desirable.

I am moving the gateway forward by another three weeks. By the end of January it should be in code. By mid-February it should be routing across at least three providers (Claude, Gemini, and one open-weights option — probably Devstral 2 or DeepSeek-V3.2-Speciale depending on which one has stickier early-adoption signals).

That is the kind of small re-prioritization that, accumulated weekly, eventually produces an architecture that survives the next frontier-model launch — and the one after that, and the one after that.

— Lucas

Sources verified the week of Dec 8-11, 2025: GPT-5.2 announcement (Dec 11) · Simon Willison on GPT-5.2 (Dec 11) · Linux Foundation announces AAIF (Dec 9) · Mistral Devstral 2 + Vibe CLI (Dec 9) · Accenture × Anthropic Business Group (Dec 9) · Simon Willison: Useful patterns for HTML tools (Dec 10)

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat