27 semanas · 54 textos · Escritos durante a construção

Notas de campo de um SO de IA pessoal em voo

Toda terça-feira, um ensaio perene sobre o que aprendi enquanto envio o DuranteOS. Toda sexta, um boletim da semana. Cerca de 108 mil palavras e contando — para construtores que preferem ver a fundação ser lançada a ler o press release.

Comece por aqui

Se você está pensando em escrever enquanto constrói

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

Se você está considerando o grupo fundador

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

Se você quer a espinha arquitetônica

The Decomposition Discipline I Am Trying to Codify Inside DOS

Assinar · Ensaio de terça

Cerca de 3.800 construtores leem isso toda semana.

Dec 9, 2025

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

I do not have a Studio yet. I do not have a credit ledger yet. I do not have twelve providers behind a Bearer token yet. What I have is a clear picture of the gateway I want to build in the next 4-8 weeks, and the conviction that committing to the architecture now — before there is anything to retrofit — is cheaper than committing to it later. Here is the design, with Fowler on the routing layer and Greg Young on the ledger.

I am writing this post about a system I have not yet built.

What I have, on December 8, 2025, twelve weeks into DOS: a private experiment running on my own machine, calling Anthropic's API directly. No gateway. No credit ledger. No multi-provider routing. One operator, one substrate, one billing relationship. It is the simplest possible shape, and it works because the operator and the customer are the same person.

What I can already see coming, in the next 4-8 weeks: I will need to call more than one provider. The substrate is plural — Opus 4.5, Gemini 3 Pro, GPT-5.x, the open-weights tier — and routing-by-task is becoming an architectural requirement, not a future feature. The moment I have a second provider, I have a per-provider auth surface, a per-provider rate limit, a per-provider pricing schema, and a per-provider failure mode. The moment I have an operator who is not me, I need spend caps, an audit trail, and a billing reconciliation story.

This post is the gateway I want to build before any of that lands. It is the second piece of the substrate-portability bet I made publicly last week. Writing the design before writing the code is what gives me a chance of getting the seams right on first draft.

The gateway in one sentence

One HTTP service that routes calls to N AI providers behind a single Bearer token, debiting an organization's credit pool atomically per request, with spend caps enforced before the call goes out and refunds applied if the call fails after debit.

The Fowler angle: this should be the API Gateway pattern, plus an Anti-Corruption Layer

Martin Fowler's Patterns of Enterprise Application Architecture (and later the microservices.io catalog) names two patterns that should do most of the work here.

The first is API Gateway — a single edge service that the client talks to, which fans out to multiple backend services. The classic justifications: client doesn't need to know which backend handles which call, the gateway can enforce cross-cutting concerns once (auth, rate limiting, logging) instead of replicating them in each backend, and the backends can evolve their interfaces without breaking clients.

The second is Anti-Corruption Layer — the gateway translates between the client's coherent vocabulary and the backends' inconsistent ones. OpenAI's chat completions API, Anthropic's Messages API, and Google Gemini's generateContent endpoint do roughly the same thing with three completely different schemas. The gateway should expose one schema and translate internally.

Here is the layer map I want to commit to:

Concern	Pattern	Where it should live
Routing to provider	API Gateway	`app/api/v1/{service}/{provider}/route.ts`
Translating provider quirks	Anti-Corruption Layer	`lib/providers/{provider}/adapter.ts`
Auth	Edge middleware	`middleware.ts` (Bearer token)
Rate limiting	Edge middleware	`lib/rate-limit/per-org.ts`
Metering / billing	Domain service	`lib/credits/ledger.ts`
Logging / audit	Cross-cutting	Structured logger + sync to durable store

The bundling — five concerns, one service — is deliberate. It is also, I am aware, the kind of decision a senior platform engineer would push back on. The standard advice is "split each concern into its own service." That advice is correct in the abstract and wrong in the concrete for a one-operator system that needs to ship in weeks, not quarters.

What I am committing to: bundle now, with each concern in its own named layer that has its own tests, so the eventual split is a deployment change, not a rewrite. Premature splitting is more expensive than late splitting. The seams have to be pre-cut; they do not have to be pre-split.

The route shape I want

The URL pattern should be deliberately the same for every provider:

POST /api/v1/{service}/{provider}/{operation}
Authorization: Bearer dos_org_xxxx
Idempotency-Key: <uuid>

service is one of chat, image, audio, video, embedding, tts, stt, research. provider is whatever provider I am routing to (anthropic, openai, google, replicate, elevenlabs, mistral, xai, perplexity, groq, deepseek, meta, cohere — twelve is the rough target inside six months). operation is the verb the provider uses (messages, completions, generate, etc.) — preserved from the upstream API so the request body can pass through with minimal translation.

This is the principle Fowler calls "honoring the underlying interface where you can." Do not invent a new schema for things that are already standard. Translate where providers diverge from the standard. Add cross-cutting concerns (idempotency, metering, auth) at the gateway layer.

The Greg Young angle: the credit ledger should be event-sourced

The ledger is the part I expect will take me longest to get right, and it is also the part where I want to lean hardest on Greg Young's work on event sourcing and CQRS.

A credit ledger looks like a balance. The naive implementation is a single column on the organizations table: credits_remaining: decimal. You debit it on each request. Done.

That implementation cannot survive contact with reality. Concurrent requests race the column. Failed provider calls leave you with debited-but-uncompensated state. Refunds require either a write to "amount" or a write to a separate "refunds" table, neither of which gives you a clean transaction history. Reconciliation against payment-processor invoices becomes archaeological.

The event-sourced ledger I want replaces "balance as column" with "balance as left-fold over events." Every credit movement should be an immutable event with a type, an amount, a request reference, and a timestamp. The current balance is a projection.

Rendering diagram…

Three event types: reserve, commit, release. The semantics:

Reserve — written before the provider call. Decrements available credits. Increments held credits. Atomic.
Commit — written after a successful provider call. Decrements held, increments spent. Net: credits gone.
Release — written after a failed provider call. Decrements held, increments available. Net: credits restored.

The available balance at any moment is sum(committed) - sum(spent) - sum(holds where status=open). Or more usefully: read the projection table, which is updated by an event handler.

Naive ledger vs. the event-sourced ledger I want

Naive — single decimal column

Concurrent requests race the column (read-modify-write hazard)
Failed provider call leaves debited state, no clean refund
No transaction history; reconciling with payment processor is archaeology
Spend caps require an additional column and additional ad-hoc logic
Auditability is approximate — you can know current state but not the history that produced it

Event-sourced — reserve/commit/release events (the plan)

Reserve writes are atomic upserts on a held-credits row; concurrency safe
Failed provider call triggers a release event; balance restored cleanly
Full transaction history is the source of truth; current state is a projection
Spend caps are queries against committed events in a window
Auditability is exact — you can replay the ledger from any point in time

What spend caps should actually enforce

Spend caps are a small feature with a surprisingly big surface area. The implementation I want:

The pre-request spend-cap check (planned)

Operator sends a request to /api/v1/chat/anthropic/messages with their Bearer token.
Gateway resolves the org from the token and loads the org's spend caps (daily, weekly, monthly, per-provider, per-member).
Gateway estimates the cost of the request before calling the provider — token count of the input, model price, expected output tokens (using a heuristic per model).
Gateway queries the ledger for the current spent-amount in each cap's window.
If current_spent + estimated_cost > cap, the gateway returns 402 Payment Required with which cap was hit.
If all caps pass, the gateway writes the reserve event for estimated_cost and proceeds with the provider call.
After the provider returns, the actual cost is computed from real token usage; the commit event records the true amount, and any over-reservation is released back to available.

The estimation step is approximate. I expect to over-reserve by ~10% on average; the release-back makes the over-reservation invisible to the operator's available balance over time. The benefit is that runaway requests cannot exceed the operator's spend cap by more than the per-request reservation buffer.

I am not building this until I have at least two operators using the system. That is somewhere between four and twelve weeks out. But I am committing to the design now because I do not want to retrofit it after the first reconciliation discrepancy.

Per-org credit pools and partner tiers

Two more dimensions I want to commit to in advance.

First, organizations should have credit pools, not individuals. An operator who is a member of two organizations should see the credit balance of whichever org their current API key was scoped to. This makes credit costs accountable to the right party (the org, not the individual).

Second, partners (eventually — DOS resellers / consultancies) should get markup tiers. A partner adds a percentage to the per-token cost of provider calls made by their downstream clients; the markup should be recorded as a separate line in the ledger so partner attribution and provider cost are independently auditable.

Both of these are months out. I am writing them down now because the schema decisions I am about to make for the events table need to leave room for them. Adding a column to an event-sourced table later is cheap; adding a concept the table cannot represent is expensive.

Name	Type	Required	Default	Description
event_type	reserve \| commit \| release \| adjustment	yes	—	The four legal event kinds. Adjustment is for manual corrections (refunds, comp credits) — rare, audited.
amount_credits	decimal(18,6)	yes	—	Always positive. Direction is implied by event_type. Six decimal places to handle small per-token costs.
org_id	uuid	yes	—	The organization whose pool this affects.
request_ref	uuid	yes	—	The provider request this event corresponds to. Reserve and commit/release share the same ref.
idempotency_key	string	yes	—	Caller-provided. Reserves with the same idempotency_key are deduped within a 24h window.
created_at	timestamptz	yes	now()	Append-only. Never updated. Indexed for projection rebuild.
actor	member_id \| system	yes	system	For audit. Member-initiated requests vs. system-initiated reconciliations.

What I want to commit to, before I have anything to commit it against

Three commitments, in order of how much pushback I expect on them:

One: bundle the cross-cutting concerns until you cannot. Routing, auth, metering, billing, and logging in one service is the right shape for a one-or-two-operator system. They will eventually warrant separate services. They do not warrant separate services yet, and bundling them at the gateway lets the operator-facing API stay simple. Premature splitting is more expensive than late splitting.

Two: never let a single decimal column be your ledger. The event-sourced ledger feels like overkill until your first reconciliation discrepancy. Then it feels like the only sane choice. I want to build it correctly the first time because I cannot afford to find out the hard way at the same time as I am explaining a billing discrepancy to the first paying operator.

Three: spend caps must be enforceable pre-request, not just observable post-request. Post-request spend caps (the kind cloud providers expose: "we'll alert you when you exceed $X") are useful for analytics. Pre-request caps are the only thing that prevents an unattended runaway from emptying the operator's pool. Build both. Default to the pre-request one as the binding constraint.

Idempotency keys all the way down

Every reserve event should require a caller-supplied idempotency key. The same key within 24h returns the same response without making a second provider call. This protects the operator against retries, network drops, and concurrent duplicate requests. The cost is one column on the events table and a unique index. The benefit is that "did the call go through?" stops being a question that requires reconciliation.

Why the gateway should exist

The thesis the substrate serves.

Substrate is replaceable

The provider behind the gateway can change.

Mechanized invariants

Same anticipatory-design discipline, different system.

The gateway is one of those pieces of infrastructure that is invisible when it works and fatal when it does not. I am writing this post before I have written a line of it because I would rather commit to the architecture in public than retrofit it after the first paying operator.

I will report back when the first version of this lands — somewhere in January or February if the schedule holds. The post will tell you what survived contact with reality.

— Lucas

Was this page helpful?

O arco de 27 semanas · Um corpo de trabalho

Vinte e sete semanas. Dois textos por semana. Seis meses de escrita durante a construção.

Semana

Ensaio de terça

Boletim de sexta

W01

2025-10-28

The Agent Is the Product: Why Intelligence Replaces Interface

2025-10-31

Wrapper Wars: The Week Cursor, Windsurf, and MiniMax All Went Vertical

W02

2025-11-04

Memory Replaces Lock-In: Designing the Substrate of Personal AI

2025-11-07

The Week the Substrate Caught Up: Kimi K2 Thinking Lands

W03

2025-11-11

Why I Just Left a Steady Practice to Build a Personal AI Operating System

2025-11-14

The Week the Harness Ate the Model

W04

2025-11-18

The Decomposition Discipline I Am Trying to Codify Inside DOS

2025-11-21

Antigravity, Gemini 3, Grok 4.1: The Week the Frontier Re-Rendered

W05

2025-11-25

Four Copies, One Source of Truth: The Sync Pattern I Want to Commit to Before It Hurts

2025-11-28

Opus 4.5 Lands on Thanksgiving Week: Anthropic's Quiet Counter-Punch

W06

2025-12-02

Plugin Architecture for Hooks: The Pattern I Want Before the Hook Becomes a God-Function

2025-12-05

The Runtime Is the Moat Now: Anthropic Buys Bun, Claude Code Hits a Billion

W07

2025-12-09

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

2025-12-12

Three Frontier Models in 23 Days: Stop Picking a Winner, Start Picking a Router

W08

2025-12-16

Sentinel, Sketched: Convention-Driven Onboarding Before I Build It

2025-12-19

The Spec Goes Public, the Substrate Goes to War: Skills Opens While Codex, Gemini Flash, and Nemotron Sprint

W09

2025-12-23

Memory, Sketched: The Knowledge Graph I Was Designing Before MemPalace Shipped

2025-12-26

Christmas Wasn't Quiet: Open Weights Caught Up, Nvidia Bought Inference, Frontier Labs Bought Loyalty

W10

2025-12-30

Council, Sketched: The Three-Round Debate Protocol I Want to Formalize Before I Build It

2026-01-02

The Practitioners' Year-End: While the West Took PTO, DeepSeek Shipped Architecture

W11

2026-01-06

Altyaa's Wedge: The Brazilian-Portuguese SMB Bet Studio Is Pointed At

2026-01-09

Default Claude: Microsoft Flips the Switch as the Substrate Distributes by Default

W12

2026-01-13

Builder's Compass: Two Years In, ~3,800 Subscribers, and What I've Learned Teaching Architecture

2026-01-16

The Agent Surface Splits in Two: Anthropic Goes Vertical While China Goes Substrate

W13

2026-01-20

Failure Patterns, Sketched: The Bookkeeping Discipline I Want for the Agent's Mistakes

2026-01-23

The Rules Layer Solidifies: Constitution, Hardware, and Export Controls Land in One Week

W14

2026-01-27

The One Reference Customer Strategy: GTM for a Personal AI OS, Sketched Before the Customer Signs

2026-01-30

Vertical Integration Eats Horizontal AI: Maia, Meta's Capex, and Anthropic-ServiceNow

W15

2026-02-03

Hexagonal in Practice: The Ports and Adapters I'm Pulling Studio Toward

2026-02-06

Coding Becomes the Flagship: Opus 4.6 and GPT-5.3-Codex Land the Same Day

W16

2026-02-10

TDD for AI Agents, Sketched: The Translation I Want to Commit to Before the Eval Suite Exists

2026-02-13

The Bifurcation Hardens: Anthropic's $380B Meets China's Open-Weights Frontier

W17

2026-02-17

Refactoring the Hook Pipeline: The Fowler Walkthrough I'm Three Refactorings Into

2026-02-20

The Sonnet Window: Free-Tier Skills, Code Security, and the Race for Agentic Primitives

W18

2026-02-24

Skills, Packs, and Hooks: The Three-Layer Model I'm Pulling DOS Toward

2026-02-27

The Moat Above the Model: Distillation, Mobile Coding, and MCP Battlegrounds

W19

2026-03-03

The Knowledge Portfolio for an Indie Founder Building in Public, Audited at Week 25

2026-03-06

Procurement Is the New Benchmark: OpenAI Wins the DoW While Anthropic Gets Banned

W20

2026-03-10

Six Months of DOS: What Changed, What Endured, What Surprised

2026-03-13

The Counter-Sovereignty Move: Anthropic Sues, Then Diversifies

W21

2026-03-17

MCP in Production: What Works When the Protocol Becomes the Boundary

2026-03-20

Cursor Becomes a Model Lab: Composer 2 and the Unbundling at the Agent Layer

W22

2026-03-24

Routing by Sovereignty Class: The Architecture That Survives the Procurement Decade

2026-03-27

The Injunction the Pentagon Won't Honor: Anthropic Wins, the State Defies

W23

2026-03-31

Sentinel's First Scan: What the Convention Catalog Found Across Eleven Projects

2026-04-03

The Agent Stack Swallows the IDE, the IPO, and the Courtroom

W24

2026-04-07

Forking MemPalace: The 48-Hour Integration Retro

2026-04-10

Project Glasswing: When the Frontier Bifurcates on Safety

W25

2026-04-14

The 90-Day Open-Weights Bakeoff: What Actually Routes Where in DOS Today

2026-04-17

Routines Eat the Workflow: The Harness Becomes the Product

W26

2026-04-21

The Substrate Portability Pact: What DOS Refuses to Couple to, Even as the Harness Consolidates

2026-04-24

The Substrate Signs a Ten-Year Lease: Gigawatts, Governance, and the Closing Window for Indie Posture

W27

2026-04-28

After Twenty-Seven Weeks: What the Series Was For, and What I'm Building Next

2026-05-01

The Week Exclusivity Died: Substrate Goes Plural, Rules Layer Becomes the Moat