Durante

27 weeks · 54 pieces · Written during the build

Field notes from a personal AI OS in flight

Every Tuesday, an evergreen essay on what I learned while shipping DuranteOS. Every Friday, a bulletin on the week. About 108,000 words and counting, for builders who would rather watch the foundation being poured than read the press release.

Tuesday essay

About 3,800 builders read this every week.

Credit-Metered API Gateways: The Pattern I Want Before I Have Anything to Meter

I do not have a Studio yet. I do not have a credit ledger yet. I do not have twelve providers behind a Bearer token yet. What I have is a clear picture of the gateway I want to build in the next 4-8 weeks, and the conviction that committing to the architecture now — before there is anything to retrofit — is cheaper than committing to it later. Here is the design, with Fowler on the routing layer and Greg Young on the ledger.

I am writing this post about a system I have not yet built.

What I have, on December 8, 2025, twelve weeks into DOS: a private experiment running on my own machine, calling Anthropic's API directly. No gateway. No credit ledger. No multi-provider routing. One operator, one substrate, one billing relationship. It is the simplest possible shape, and it works because the operator and the customer are the same person.

What I can already see coming, in the next 4-8 weeks: I will need to call more than one provider. The substrate is plural — Opus 4.5, Gemini 3 Pro, GPT-5.x, the open-weights tier — and routing-by-task is becoming an architectural requirement, not a future feature. The moment I have a second provider, I have a per-provider auth surface, a per-provider rate limit, a per-provider pricing schema, and a per-provider failure mode. The moment I have an operator who is not me, I need spend caps, an audit trail, and a billing reconciliation story.

This post is the gateway I want to build before any of that lands. It is the second piece of the substrate-portability bet I made publicly last week. Writing the design before writing the code is what gives me a chance of getting the seams right on first draft.

The gateway in one sentence

One HTTP service that routes calls to N AI providers behind a single Bearer token, debiting an organization's credit pool atomically per request, with spend caps enforced before the call goes out and refunds applied if the call fails after debit.

The Fowler angle: this should be the API Gateway pattern, plus an Anti-Corruption Layer

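Two well-established patterns should do most of the work here. API Gateway is catalogued in Chris Richardson's microservices.io and is a close cousin of the Gateway pattern in Martin Fowler's Patterns of Enterprise Application Architecture; Anti-Corruption Layer comes from Eric Evans's Domain-Driven Design.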

The first is API Gateway: a single edge service that the client talks to, which fans out to multiple backend services. The classic justifications: the client does not need to know which backend handles which call; the gateway can enforce cross-cutting concerns (auth, rate limiting, logging) once instead of replicating them in each backend; and the backends can evolve their interfaces without breaking clients.

The second is Anti-Corruption Layer — the gateway translates between the client's coherent vocabulary and the backends' inconsistent ones. OpenAI's chat completions API, Anthropic's Messages API, and Google Gemini's generateContent endpoint do roughly the same thing with three completely different schemas. The gateway should expose one schema and translate internally.
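One place the translation layer earns its keep is token usage, which metering needs after every call. A minimal sketch, assuming the adapter only reads the usage fields: Anthropic's Messages API reports `usage.input_tokens` / `output_tokens`, OpenAI's chat completions report `usage.prompt_tokens` / `completion_tokens`, and Gemini's `generateContent` reports `usageMetadata.promptTokenCount` / `candidatesTokenCount`. `NormalizedUsage` and `normalizeUsage` are hypothetical names for what the per-provider adapters might expose.

```typescript
// Hypothetical provider-neutral usage record that the ledger meters against.
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
}

type Provider = "anthropic" | "openai" | "google";

// Only the usage fields of each upstream response body are modeled here.
interface AnthropicBody { usage: { input_tokens: number; output_tokens: number } }
interface OpenAIBody { usage: { prompt_tokens: number; completion_tokens: number } }
interface GeminiBody { usageMetadata: { promptTokenCount: number; candidatesTokenCount?: number } }

// Anti-Corruption Layer in miniature: three vocabularies in, one out.
function normalizeUsage(provider: Provider, body: unknown): NormalizedUsage {
  switch (provider) {
    case "anthropic": {
      const b = body as AnthropicBody;
      return { inputTokens: b.usage.input_tokens, outputTokens: b.usage.output_tokens };
    }
    case "openai": {
      const b = body as OpenAIBody;
      return { inputTokens: b.usage.prompt_tokens, outputTokens: b.usage.completion_tokens };
    }
    case "google": {
      const b = body as GeminiBody;
      // Treated as optional in this sketch; a missing count is metered as 0.
      return {
        inputTokens: b.usageMetadata.promptTokenCount,
        outputTokens: b.usageMetadata.candidatesTokenCount ?? 0,
      };
    }
  }
}
```

Everything downstream (commit amounts, spend caps) then speaks only `NormalizedUsage`, so adding a provider means adding a case here, not touching the ledger.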

Here is the layer map I want to commit to:

| Concern | Pattern | Where it should live |
|---|---|---|
| Routing to provider | API Gateway | app/api/v1/{service}/{provider}/route.ts |
| Translating provider quirks | Anti-Corruption Layer | lib/providers/{provider}/adapter.ts |
| Auth | Edge middleware | middleware.ts (Bearer token) |
| Rate limiting | Edge middleware | lib/rate-limit/per-org.ts |
| Metering / billing | Domain service | lib/credits/ledger.ts |
| Logging / audit | Cross-cutting | Structured logger + sync to durable store |

The bundling — five concerns, one service — is deliberate. It is also, I am aware, the kind of decision a senior platform engineer would push back on. The standard advice is "split each concern into its own service." That advice is correct in the abstract and wrong in the concrete for a one-operator system that needs to ship in weeks, not quarters.

What I am committing to: bundle now, with each concern in its own named layer that has its own tests, so the eventual split is a deployment change, not a rewrite. Premature splitting is more expensive than late splitting. The seams have to be pre-cut; they do not have to be pre-split.
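A minimal sketch of what "pre-cut, not pre-split" could look like, assuming each layer from the table is an interface that the single route handler composes. Every name here (`Auth`, `RateLimiter`, `Ledger`, `ProviderAdapter`, `handle`) is hypothetical, and logging is omitted for brevity. Moving the ledger into its own service later would mean swapping its implementation for a network client, with `handle` unchanged.

```typescript
// Each concern from the layer map sits behind its own interface.
interface Auth { resolveOrg(bearerToken: string): string | null }
interface RateLimiter { allow(orgId: string): boolean }
interface Ledger {
  reserve(orgId: string, requestRef: string, amount: number): boolean;
  settle(requestRef: string, actualAmount: number | null): void; // null = provider call failed
}
interface ProviderAdapter { call(body: unknown): { ok: boolean; cost: number } }

interface GatewayResult { status: number; body: unknown }

// The bundled service: one handler, several seams. Splitting swaps implementations, not this code.
function handle(
  deps: { auth: Auth; limiter: RateLimiter; ledger: Ledger; provider: ProviderAdapter },
  token: string,
  requestRef: string,
  estimatedCost: number,
  body: unknown,
): GatewayResult {
  const orgId = deps.auth.resolveOrg(token);
  if (orgId === null) return { status: 401, body: "invalid token" };
  if (!deps.limiter.allow(orgId)) return { status: 429, body: "rate limited" };
  if (!deps.ledger.reserve(orgId, requestRef, estimatedCost)) {
    return { status: 402, body: "insufficient credits" };
  }
  const res = deps.provider.call(body);
  // Commit on success, release on failure: the ledger decides which events to write.
  deps.ledger.settle(requestRef, res.ok ? res.cost : null);
  return res.ok ? { status: 200, body: res } : { status: 502, body: "provider error" };
}
```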

The route shape I want

The URL pattern should be deliberately the same for every provider:

POST /api/v1/{service}/{provider}/{operation}
Authorization: Bearer dos_org_xxxx
Idempotency-Key: <uuid>

service is one of chat, image, audio, video, embedding, tts, stt, research. provider is whatever provider I am routing to (anthropic, openai, google, replicate, elevenlabs, mistral, xai, perplexity, groq, deepseek, meta, cohere — twelve is the rough target inside six months). operation is the verb the provider uses (messages, completions, generate, etc.) — preserved from the upstream API so the request body can pass through with minimal translation.
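A minimal sketch of parsing that path, assuming the gateway validates `service` against the fixed list above and treats `provider` and `operation` as opaque slugs that the adapter registry checks later. `parseGatewayPath`, `GatewayRoute`, and the slug rule are hypothetical.

```typescript
const SERVICES = ["chat", "image", "audio", "video", "embedding", "tts", "stt", "research"] as const;
type Service = (typeof SERVICES)[number];

interface GatewayRoute { service: Service; provider: string; operation: string }

function isService(s: string): s is Service {
  return (SERVICES as readonly string[]).indexOf(s) !== -1;
}

// Parses /api/v1/{service}/{provider}/{operation}; returns null for anything else.
function parseGatewayPath(pathname: string): GatewayRoute | null {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts.length !== 5 || parts[0] !== "api" || parts[1] !== "v1") return null;
  const [, , service, provider, operation] = parts;
  if (!isService(service)) return null;
  // Assumed slug rule: lowercase letters, digits, hyphens.
  const slug = /^[a-z0-9-]+$/;
  if (!slug.test(provider) || !slug.test(operation)) return null;
  return { service, provider, operation };
}
```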

This is the principle I take from Fowler here: honor the underlying interface where you can. Do not invent a new schema for things that are already standard. Translate where providers diverge from the standard. Add cross-cutting concerns (idempotency, metering, auth) at the gateway layer.

The Greg Young angle: the credit ledger should be event-sourced

The ledger is the part I expect will take me longest to get right, and it is also the part where I want to lean hardest on Greg Young's work on event sourcing and CQRS.

A credit ledger looks like a balance. The naive implementation is a single column on the organizations table: credits_remaining: decimal. You debit it on each request. Done.

That implementation cannot survive contact with reality. Concurrent requests race the column. Failed provider calls leave you with debited-but-uncompensated state. Refunds require either another write to the balance column or a write to a separate "refunds" table, and neither gives you a clean transaction history. Reconciliation against payment-processor invoices becomes archaeological.
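The read-modify-write hazard needs no database to reproduce. A minimal sketch, assuming two gateway workers that each read the balance, wait on a provider call, and write back; the interleaving is written out step by step so the lost update is deterministic. `readBalance` and `writeBalance` are illustrative stand-ins for a SELECT and an UPDATE.

```typescript
// Naive ledger: one mutable number standing in for credits_remaining.
const org = { creditsRemaining: 100 };

function readBalance(): number {
  return org.creditsRemaining;
}
function writeBalance(value: number): void {
  org.creditsRemaining = value;
}

// Two concurrent requests costing 30 credits each, interleaved by hand.
const seenByA = readBalance(); // A reads 100
const seenByB = readBalance(); // B reads 100 before A has written
writeBalance(seenByA - 30);    // A writes 70
writeBalance(seenByB - 30);    // B also writes 70: A's debit is silently lost

console.log(org.creditsRemaining); // 70
```

Sixty credits of work, thirty credits debited, and nothing in the row records that a second debit ever happened.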

The event-sourced ledger I want replaces "balance as column" with "balance as left-fold over events." Every credit movement should be an immutable event with a type, an amount, a request reference, and a timestamp. The current balance is a projection.


Three event types: reserve, commit, release. The semantics:

  • Reserve — written before the provider call. Decrements available credits. Increments held credits. Atomic.
  • Commit — written after a successful provider call. Decrements held, increments spent. Net: credits gone.
  • Release — written after a failed provider call. Decrements held, increments available. Net: credits restored.

The available balance at any moment is everything credited to the pool, minus sum(committed), minus sum(holds still open, that is, reserves not yet matched by a commit or release). Or, more usefully: read the projection table, which an event handler keeps up to date.
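A minimal sketch of "balance as left-fold over events," assuming amounts are stored as integer micro-credits (the six decimal places of `decimal(18,6)` scaled by 10^6 to avoid floating-point drift) and assuming `adjustment` events only ever add credits, as the refund and comp-credit examples in the schema suggest. `LedgerEvent`, `Projection`, `applyEvent`, and `project` are hypothetical names.

```typescript
type LedgerEventType = "reserve" | "commit" | "release" | "adjustment";

interface LedgerEvent {
  type: LedgerEventType;
  amountMicros: number; // always positive; the event type carries the direction
  requestRef: string;
}

interface Projection {
  available: number;
  held: number;
  spent: number;
}

// One step of the fold: reserve/commit/release semantics, plus adjustment as a credit.
function applyEvent(p: Projection, e: LedgerEvent): Projection {
  switch (e.type) {
    case "reserve": // available -> held
      return { ...p, available: p.available - e.amountMicros, held: p.held + e.amountMicros };
    case "commit": // held -> spent
      return { ...p, held: p.held - e.amountMicros, spent: p.spent + e.amountMicros };
    case "release": // held -> available
      return { ...p, held: p.held - e.amountMicros, available: p.available + e.amountMicros };
    case "adjustment": // assumption: adjustments only add credits
      return { ...p, available: p.available + e.amountMicros };
  }
}

// The balance is recomputed from the append-only log, never stored as truth.
function project(events: LedgerEvent[]): Projection {
  return events.reduce(applyEvent, { available: 0, held: 0, spent: 0 });
}
```

With a 10-credit comp credit, one request reserved at 1.1 that commits 1.0 and releases 0.1, and a second request still holding 0.5, the fold yields 8.5 available, 0.5 held, 1.0 spent: credited minus committed minus open holds.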

Naive ledger vs. the event-sourced ledger I want

Naive — single decimal column
  • Concurrent requests race the column (read-modify-write hazard)
  • Failed provider call leaves debited state, no clean refund
  • No transaction history; reconciling with payment processor is archaeology
  • Spend caps require an additional column and additional ad-hoc logic
  • Auditability is approximate — you can know current state but not the history that produced it
Event-sourced — reserve/commit/release events (the plan)
  • Reserve writes are atomic upserts on a held-credits row; concurrency safe
  • Failed provider call triggers a release event; balance restored cleanly
  • Full transaction history is the source of truth; current state is a projection
  • Spend caps are queries against committed events in a window
  • Auditability is exact — you can replay the ledger from any point in time

What spend caps should actually enforce

Spend caps are a small feature with a surprisingly big surface area. The implementation I want:

The pre-request spend-cap check (planned)

  1. Operator sends a request to /api/v1/chat/anthropic/messages with their Bearer token.
  2. Gateway resolves the org from the token and loads the org's spend caps (daily, weekly, monthly, per-provider, per-member).
  3. Gateway estimates the cost of the request before calling the provider — token count of the input, model price, expected output tokens (using a heuristic per model).
  4. Gateway queries the ledger for the current spent-amount in each cap's window.
  5. If current_spent + estimated_cost > cap, the gateway returns 402 Payment Required, naming the cap that was hit.
  6. If all caps pass, the gateway writes the reserve event for estimated_cost and proceeds with the provider call.
  7. After the provider returns, the actual cost is computed from real token usage; the commit event records the true amount, and any over-reservation is released back to available.
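Steps 3 through 6 can be sketched as a pure function, assuming per-token prices in integer micro-credits, a per-model expected-output heuristic supplied by the caller, and caps whose windowed spend has already been queried from committed events. `ModelPricing`, `SpendCap`, `estimateCostMicros`, and `checkSpendCaps` are hypothetical, as are the prices in the usage below.

```typescript
// Hypothetical per-model pricing in micro-credits per token.
interface ModelPricing {
  inputMicrosPerToken: number;
  outputMicrosPerToken: number;
  expectedOutputTokens: number; // per-model heuristic (step 3)
}

interface SpendCap {
  name: string;                // e.g. "daily", "monthly", "provider:anthropic"
  limitMicros: number;
  spentInWindowMicros: number; // committed spend in this cap's window (step 4)
}

type CapDecision =
  | { ok: true; reserveMicros: number }
  | { ok: false; status: 402; cap: string };

function estimateCostMicros(inputTokens: number, pricing: ModelPricing): number {
  return (
    inputTokens * pricing.inputMicrosPerToken +
    pricing.expectedOutputTokens * pricing.outputMicrosPerToken
  );
}

// Step 5: refuse before the provider call if any cap would be exceeded.
function checkSpendCaps(inputTokens: number, pricing: ModelPricing, caps: SpendCap[]): CapDecision {
  const estimate = estimateCostMicros(inputTokens, pricing);
  for (const cap of caps) {
    if (cap.spentInWindowMicros + estimate > cap.limitMicros) {
      return { ok: false, status: 402, cap: cap.name };
    }
  }
  // Step 6: the caller writes a reserve event for this amount, then calls the provider.
  return { ok: true, reserveMicros: estimate };
}
```

Following the text, caps are checked against committed spend only. Counting open holds as well would stop several in-flight requests from jointly slipping past a cap.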

The estimation step is approximate. I expect to over-reserve by ~10% on average; the release-back makes the over-reservation invisible to the operator's available balance over time. The benefit is that runaway requests cannot exceed the operator's spend cap by more than the per-request reservation buffer.
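Step 7 and the release of any over-reservation reduce to a small settlement rule. A minimal sketch, assuming integer micro-credits and the reserve/commit/release semantics above; `settle` is a hypothetical name. It also covers a case the text does not spell out, where the real cost exceeds the estimate: one option is to reserve the shortfall before committing, so held credits never go negative.

```typescript
type SettlementEvent =
  | { type: "reserve"; amountMicros: number }
  | { type: "commit"; amountMicros: number }
  | { type: "release"; amountMicros: number };

// Turns (reserved estimate, real outcome) into the events appended for one request.
// actualMicros === null means the provider call failed after the reserve.
function settle(reservedMicros: number, actualMicros: number | null): SettlementEvent[] {
  if (actualMicros === null) {
    return [{ type: "release", amountMicros: reservedMicros }]; // full refund of the hold
  }
  if (actualMicros <= reservedMicros) {
    const events: SettlementEvent[] = [{ type: "commit", amountMicros: actualMicros }];
    const excess = reservedMicros - actualMicros;
    if (excess > 0) events.push({ type: "release", amountMicros: excess }); // over-reservation back
    return events;
  }
  // Under-reservation: hold the shortfall first, then commit the true amount.
  return [
    { type: "reserve", amountMicros: actualMicros - reservedMicros },
    { type: "commit", amountMicros: actualMicros },
  ];
}
```

At the roughly 10% over-reservation the text expects, a request reserved at 1.1 credits that actually costs 1.0 commits 1.0 and releases 0.1, so its hold nets to zero.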

I am not building this until I have at least two operators using the system. That is somewhere between four and twelve weeks out. But I am committing to the design now because I do not want to retrofit it after the first reconciliation discrepancy.

Per-org credit pools and partner tiers

Two more dimensions I want to commit to in advance.

First, organizations should have credit pools, not individuals. An operator who is a member of two organizations should see the credit balance of whichever org their current API key was scoped to. This makes credit costs accountable to the right party (the org, not the individual).
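A minimal sketch of key-scoped pools, assuming every API key is minted for exactly one organization and records the member who holds it only for audit. `ApiKey` and `poolFor` are hypothetical, and the token strings below are placeholders in the `dos_org_` format.

```typescript
// Hypothetical API-key record: each key belongs to exactly one organization.
interface ApiKey {
  token: string;    // value presented in the Authorization: Bearer header
  orgId: string;    // the pool this key draws from
  memberId: string; // who holds the key, kept for the audit trail
}

// Credits are looked up by the key's org, never by the member.
function poolFor(
  keys: Map<string, ApiKey>,
  bearerToken: string,
  pools: Map<string, number>,
): number | undefined {
  const key = keys.get(bearerToken);
  if (key === undefined) return undefined;
  return pools.get(key.orgId);
}
```

The same member holding keys for two orgs sees two different balances, one per key.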

Second, partners (eventually — DOS resellers / consultancies) should get markup tiers. A partner adds a percentage to the per-token cost of provider calls made by their downstream clients; the markup should be recorded as a separate line in the ledger so partner attribution and provider cost are independently auditable.
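A minimal sketch of the two-line split, assuming the markup is expressed in basis points and rounded down to whole micro-credits. The `LedgerLine` shape, its `kind` and `partnerId` fields, and `priceWithMarkup` are hypothetical; none of them is in the events schema yet.

```typescript
interface LedgerLine {
  kind: "provider_cost" | "partner_markup";
  amountMicros: number;
  orgId: string;      // downstream client org whose pool is debited
  partnerId?: string; // set only on markup lines, for partner attribution
}

// Splits one request's charge into provider cost and partner markup, recorded separately.
function priceWithMarkup(
  providerCostMicros: number,
  markupBps: number, // basis points: 1500 = 15%
  orgId: string,
  partnerId: string,
): LedgerLine[] {
  const markupMicros = Math.floor((providerCostMicros * markupBps) / 10_000); // assumed: round down
  const lines: LedgerLine[] = [{ kind: "provider_cost", amountMicros: providerCostMicros, orgId }];
  if (markupMicros > 0) {
    lines.push({ kind: "partner_markup", amountMicros: markupMicros, orgId, partnerId });
  }
  return lines;
}
```

Summing `provider_cost` lines reconciles against provider invoices; summing `partner_markup` lines by `partnerId` gives partner attribution. At 15% on 1.0 credit of provider cost, the client pays 1.15 across two lines.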

Both of these are months out. I am writing them down now because the schema decisions I am about to make for the events table need to leave room for them. Adding a column to an event-sourced table later is cheap; adding a concept the table cannot represent is expensive.

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| event_type | reserve, commit, release, or adjustment | yes | | The four legal event kinds. Adjustment is for manual corrections (refunds, comp credits); rare, audited. |
| amount_credits | decimal(18,6) | yes | | Always positive. Direction is implied by event_type. Six decimal places to handle small per-token costs. |
| org_id | uuid | yes | | The organization whose pool this affects. |
| request_ref | uuid | yes | | The provider request this event corresponds to. Reserve and commit/release share the same ref. |
| idempotency_key | string | yes | | Caller-provided. Reserves with the same idempotency_key are deduped within a 24h window. |
| created_at | timestamptz | yes | now() | Append-only. Never updated. Indexed for projection rebuild. |
| actor | member_id or system | yes | system | For audit. Member-initiated requests vs. system-initiated reconciliations. |
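The same schema as an application-side type, as a minimal sketch: field names mirror the columns, `amount_credits` is kept as a string to avoid binary floating point (an assumption about how the database driver returns `decimal`), and `validateEvent` is a hypothetical guard that enforces the table's invariants before an append.

```typescript
type CreditEventType = "reserve" | "commit" | "release" | "adjustment";

interface CreditEventRow {
  event_type: CreditEventType;
  amount_credits: string; // decimal(18,6) as a string, e.g. "0.001250"
  org_id: string;
  request_ref: string;
  idempotency_key: string;
  created_at?: Date;      // omitted on insert; the database default now() fills it
  actor: string;          // a member_id, or "system" (the column default)
}

// decimal(18,6): at most 12 integer digits and at most 6 fractional digits; no sign allowed.
const AMOUNT_RE = /^\d{1,12}(\.\d{1,6})?$/;

// Returns a list of violations; an empty list means the row may be appended.
function validateEvent(row: CreditEventRow): string[] {
  const errors: string[] = [];
  if (!AMOUNT_RE.test(row.amount_credits)) {
    errors.push("amount_credits must be a non-negative decimal(18,6)");
  } else if (Number(row.amount_credits) === 0) {
    errors.push("amount_credits must be positive");
  }
  if (row.idempotency_key.length === 0) errors.push("idempotency_key is required");
  if (row.actor.length === 0) errors.push("actor is required");
  return errors;
}
```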

What I want to commit to, before I have anything to commit it against

Three commitments, in order of how much pushback I expect on them:

One: bundle the cross-cutting concerns until you cannot. Routing, auth, metering, billing, and logging in one service is the right shape for a one-or-two-operator system. They will eventually warrant separate services. They do not warrant separate services yet, and bundling them at the gateway lets the operator-facing API stay simple. Premature splitting is more expensive than late splitting.

Two: never let a single decimal column be your ledger. The event-sourced ledger feels like overkill until your first reconciliation discrepancy. Then it feels like the only sane choice. I want to build it correctly the first time because I cannot afford to find out the hard way at the same time as I am explaining a billing discrepancy to the first paying operator.

Three: spend caps must be enforceable pre-request, not just observable post-request. Post-request spend caps (the kind cloud providers expose: "we'll alert you when you exceed $X") are useful for analytics. Pre-request caps are the only thing that prevents an unattended runaway from emptying the operator's pool. Build both. Default to the pre-request one as the binding constraint.

Idempotency keys all the way down

Every reserve event should require a caller-supplied idempotency key. The same key within 24h returns the same response without making a second provider call. This protects the operator against retries, network drops, and concurrent duplicate requests. The cost is one column on the events table and a unique index. The benefit is that "did the call go through?" stops being a question that requires reconciliation.
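A minimal in-memory sketch of the dedupe rule, assuming keys are scoped per organization and honored for 24 hours. Storing the pending promise (not just the finished response) means a concurrent duplicate waits on the first call instead of starting a second one. `IdempotencyStore` and `run` are hypothetical; in the planned design the durable guarantee would come from the unique index on the events table, not from process memory.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface CachedResult<T> {
  createdAt: number;
  result: Promise<T>; // pending or settled; duplicates share this exact promise
}

class IdempotencyStore<T> {
  private readonly entries = new Map<string, CachedResult<T>>();
  private readonly now: () => number;
  private readonly ttlMs: number;

  constructor(now: () => number = () => Date.now(), ttlMs: number = DAY_MS) {
    this.now = now;
    this.ttlMs = ttlMs;
  }

  // Same (org, key) inside the window: return the first result, never call the provider again.
  run(orgId: string, key: string, callProvider: () => Promise<T>): Promise<T> {
    const id = JSON.stringify([orgId, key]); // keys are scoped per organization
    const existing = this.entries.get(id);
    if (existing !== undefined && this.now() - existing.createdAt < this.ttlMs) {
      return existing.result;
    }
    // Store the promise before it settles so a concurrent duplicate reuses it.
    const result = callProvider();
    this.entries.set(id, { createdAt: this.now(), result });
    return result; // expired entries are never evicted in this sketch
  }
}
```

A failed first call is cached too, so a retry inside the window sees the same failure, which matches "returns the same response."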

The gateway is one of those pieces of infrastructure that is invisible when it works and fatal when it does not. I am writing this post before I have written a line of it because I would rather commit to the architecture in public than retrofit it after the first paying operator.

I will report back when the first version of this lands — somewhere in January or February if the schedule holds. The post will tell you what survived contact with reality.

— Lucas
