27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Feb 20, 2026

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

On Tuesday Anthropic shipped Claude Sonnet 4.6 with roughly 70% coding preference vs 4.5 — and put MCP connectors, Skills, and compaction on the free tier alongside it. Thursday brought Google's Gemini 3.1 Pro at ARC-AGI-2 77.1%. Friday brought Anthropic's Claude Code Security in limited preview, with Opus 4.6 finding 500+ previously undetected bugs in production OSS. xAI's Grok 4.20 Beta landed mid-week with weekly weight refreshes and four-agent collaboration. The week was not about raw IQ — it was about agentic primitives moving from paywalled to free, vertical reasoners eating their first SAST category, and the indie tooling moat narrowing in real time.

I expected this week to be a relative breather after the bifurcation week. Anthropic just closed $30B at $380B, the open-weights tier had its biggest week of the year, and the calendar suggested the labs would coast on capacity for a few days.

They did not coast. The labs raced on a different axis instead. Sonnet 4.6 landed Tuesday with 70% coding preference vs 4.5 — and shipped MCP connectors, Skills, and compaction onto the free tier in the same release. Google answered Thursday with Gemini 3.1 Pro at ARC-AGI-2 77.1%, more than double the prior generation. Friday brought Claude Code Security in limited research preview, a reasoning-over-codebase tool that found five hundred-plus previously undetected bugs in production open source by reading the code instead of pattern-matching it. xAI shipped Grok 4.20 Beta with weekly weight refreshes and four-agent collaboration mid-week.

Five separate releases. None of them about raw IQ. All of them about agentic primitives — the connectors, the Skills, the security reasoning, the multi-agent orchestration — moving from paywalled or limited to default and free. The week's signal is the moat shape: the labs are commoditizing the primitives indie tooling has been charging for since Q4.

I am writing this on a Friday twenty-three weeks into building DOS, six weeks into the hook-pipeline refactoring sequence I described Tuesday, and the way I think about Studio's positioning shifts again — for the fourth week in a row.

The week's signal in one sentence

The agentic primitives indie tooling has been monetizing for two quarters just moved to the free tier of the substrate. The pitch shifts from "we give you the agent capability" to "we give you the operator layer the substrate doesn't ship," and any indie product whose moat was a paywall over those primitives needs to repitch this month.

The hook: Sonnet 4.6 and the free-tier expansion

The single most consequential thing of the five-day window landed Tuesday morning, in two parts that compound when read together.

On Tue Feb 17, Anthropic shipped Claude Sonnet 4.6 (source). Roughly 70% coding preference vs Sonnet 4.5 in side-by-side evaluations, 59% vs Opus 4.5, identical $3-input / $15-output pricing, 1M-token context in beta. Default for both free and paid plans.

The same day, Anthropic landed MCP connectors, Skills, and compaction on the Claude Code free tier (source). Connectors that were paywalled until this week now ship to free users by default. Skills — the open-spec capability composition format Anthropic launched in December — now run on the free tier. Conversation compaction, previously a paid-tier optimization, runs on free too.

Read the two announcements together. Sonnet 4.6 is the model improvement; the free-tier feature drop is the strategic move. Anthropic just removed the paywall on the agentic primitives that indie tooling has been monetizing since the Skills directory launched in December.

For an indie founder, the implication is sharp. Three months ago, "we give you Claude with MCP connectors and Skills" was a credible product wedge. As of Tuesday, that capability is the default of the free tier. The wedge is gone. What remains, what indie tooling can still credibly sell, is the operator layer the substrate does not ship — context engineering, memory, convention catalogs, the failure-pattern discipline, the cross-vendor routing infrastructure. That is the work above the substrate that the substrate cannot directly commoditize.

This is the strategic refinement Studio's pitch needs this month. Not "we give you Claude." Not "we wrap MCP for you." But: "we give you the operator loop Claude does not ship, on top of whichever substrate you route to."

Code Security and the SAST canary

The other Anthropic release this week is the canary for every vertical-AI indie still selling.

On Fri Feb 20, Anthropic launched Claude Code Security in limited research preview (source). The pitch: an agent that reasons over code the way a human researcher does, found 500+ previously undetected bugs in production open source using Opus 4.6, with expedited access for OSS maintainers. The product category, in traditional vocabulary, is SAST — Static Application Security Testing. Snyk's category. Veracode's category. Checkmarx's category. The category that has been a paid SaaS line for fifteen years.

Anthropic just ate it from above. Not by pattern-matching better. By reasoning over the codebase with a frontier model. The model already exists. The product wrapping is the limited preview. The category is now substrate-commoditizable.

For an indie founder building any vertical AI tool whose moat is "a rules engine over text," Code Security is the canary. Anthropic-flavored reasoners are going to ship into your vertical inside ninety days. The defensible work is not the rules engine; it is the layer above the reasoner — the workflow, the evidence trail, the accountability framework, the human-in-the-loop UX, the integration with the customer's existing process. Build for that layer. Assume the reasoner is now substrate.

This is the same lesson the Anthropic Cowork launch in mid-January was teaching for knowledge work. Code Security is the same play in security. The pattern is going to repeat across legal, sales, customer support, finance — every vertical where reasoning beats pattern-matching, the substrate is going to ship a "Claude Code for X" preview, and the indie tooling that survives is the one that pre-positioned above the reasoner instead of next to it.

The Google and xAI motions

Two more items rounded out the week, each with their own signal.

Thu Feb 19 — Google Gemini 3.1 Pro ARC-AGI-2 score 77.1% (source). More than 2x Gemini 3 Pro on the same benchmark. Available via the Gemini API, AI Studio, the Gemini CLI, Antigravity, Android Studio, and Vertex AI. The interesting detail is the surface coverage — Gemini ships across six different developer touchpoints simultaneously. Google is competing on distribution surface area, not benchmarks alone.
Tue Feb 17 — xAI ships Grok 4.20 Beta (source). Rapid-learning architecture with weekly weight refreshes from real-world feedback, 256k+ context, four-agent collaboration mode. Free at grok.com or paid SuperGrok. The four-agent collaboration is the structural notable — every credible coding-agent surface (Claude Code Agent Teams, Cursor, now Grok 4.20) ships multi-agent orchestration as a default by Q1 2026.

Read the two together. Google competing on surface area. xAI competing on cadence (weekly weight refreshes is a meaningful structural commitment). Both narrowing the indie moat in their own direction. Multi-agent orchestration, which was indie-wrapper territory in October, is now a model-default feature on three of the four major coding surfaces.

The pattern: primitives commoditize, layer above is the moat

The five-day window in one frame

The 'agentic primitives are the moat' frame (closed this week)

MCP connectors as paywalled differentiation
Skills as paid-tier feature
Multi-agent orchestration as indie wrapper territory
SAST-as-rules-engine as defensible vertical
Compaction / context management as paid optimization

The 'layer above the primitives is the moat' frame (this week's evidence)

MCP connectors free on Claude Code
Skills free on Claude Code
Multi-agent orchestration default on Claude Code, Cursor, Grok 4.20
SAST-as-reasoning being absorbed into Claude Code Security
Compaction free; context engineering and memory are now the indie territory

If you wanted evidence for the year's prevailing narrative — "the substrate is plural, the protocols are public goods, the workflow layer is the moat, the rules layer is being codified, the hyperscalers are vertically integrating, the flagship is agentic-coding-shaped, the substrate has bifurcated" — this week added the eighth refinement: agentic primitives are commoditizing into the free tier, and the moat moves up to context, memory, and operator-loop layers above. Indie tooling that priced on the primitives needs to reprice on the layer above by end of Q1.

Two angles for an indie founder

What an indie founder building on the substrate should do this week

Repitch your product above the substrate primitives. If your wedge has been "we wrap MCP for you" or "we ship Skills for your team," that wedge closed Tuesday. The free tier now ships those capabilities by default. The defensible repitch: "we give you the operator loop the substrate doesn't ship — context engineering, memory, conventions, failure pipelines, multi-vendor routing." That is the work above the primitives, and it is exactly what the labs cannot commoditize because it is per-operator, not per-primitive. Studio's pitch is being rewritten this weekend.
Pre-position above any reasoner the labs will eventually ship. Code Security is the canary. Every vertical AI tool whose moat is a rules engine should assume an Anthropic-flavored reasoner ships into that vertical inside ninety days. Build for the layer above the reasoner: workflow, evidence, accountability, human-in-the-loop UX, integration with the customer's existing process. The reasoner is becoming substrate. The wrapping is becoming the product.
Assume multi-agent is table stakes by Q2. Three coding surfaces — Claude Code Agent Teams, Cursor, Grok 4.20 — now ship multi-agent orchestration by default. By Q2 expect Aider, Cline, Antigravity, and Vibe CLI to follow. If your product still differentiates on "we orchestrate sub-agents," reposition before the differentiation evaporates. The defensible orchestration is over your specific domain context, not orchestration in the abstract.

What this changes for DOS

Two design decisions hardened this week, both of them coming out of the primitives-commoditize framing rather than any single news item.

One. Studio's pitch is being rewritten this weekend. The "credit-metered gateway over multiple substrates" technical pitch holds. The strategic pitch — "we give you Claude with MCP and Skills" — does not, after Tuesday. The replacement: "we give you the operator loop Claude does not ship, with multi-vendor routing underneath, on top of whichever substrate you choose." Same product. Sharper story. The story matters because it determines what alternatives the operator evaluates Studio against — and after this week, those alternatives include Claude Code's free tier, which is no longer a strawman.

Two. The Sentinel pack — sketched in Week 8 — moves up the build queue. Sentinel's core proposition is convention catalogs the agent reads at session start. That is exactly the operator-loop work above the primitives that this week's framing makes load-bearing. The free-tier expansion is the moment Sentinel stops being "a nice convention tool" and becomes "the differentiator that justifies Studio's existence above Claude Code free." Shipping it as a public Skill before the end of February is now a competitive necessity, not a roadmap nicety.

That is the kind of forcing function the primitives-commoditize week is structurally designed to produce. Studio's strategic pitch shifts in response, and the work that was "next quarter" becomes "this month."

What I am watching for next week

The thread that runs through the week

Agentic primitives are commoditizing into the free tier. SAST is the first vertical absorbed by reasoning-over-codebase. Multi-agent orchestration is now default across three coding surfaces. The labs raced on primitives, not on raw IQ. The indie moat moves above the primitives — to context, memory, conventions, operator-loop infrastructure that the substrate does not ship.

For an indie founder, the playbook reduces. Repitch above the primitives. Pre-position above any reasoner the labs will eventually ship. Treat multi-agent as table stakes. Bet the moat on the operator loop on top, because that is the only piece of the stack the labs cannot vertically commoditize, since it is per-operator and not per-primitive.

That is what Studio is becoming. The Sonnet window made the strategic pitch sharper, and the Sentinel pack moves up the queue accordingly.

— Lucas

Sources verified the week of Feb 16-20, 2026: Anthropic Claude Sonnet 4.6 (Feb 17) · Claude Code release notes — free-tier MCP/Skills (Feb 17) · Google Gemini 3.1 Pro (Feb 19) · Anthropic Claude Code Security (Feb 20) · xAI Grok 4.20 Beta (Feb 17)

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat