✓ INCIDENT MITIGATED  ·  this page = the permanent architecture ← back to the RCA

An architecture where the pool collapse is impossible at 10× scale.

The acute incident is solved. This is the durable fix: systematically replacing every unbounded quantity that drove the failure with a bounded one — so the failure mode is structurally absent as we grow, not merely less likely.

11s<100ms
/daily-plan/today p99
27.8s<50ms
/past-tests (page-bound)
520
string-built SQL sites
≤55/90
pool, bounded by construction

One inequality governs everything

The whole failure class, in one line

Every symptom — daily-plan 500s, ECHECKOUTTIMEOUT at rollover, the 27.8s past-tests endpoint, the silent 10-day regression — is the same thing: dataset-proportional work run synchronously on an unbounded, unobservable pool. The pressure on Postgres is one product, and today both terms are uncapped. The fix bounds both.

DB pressure uncapped → boundedactive_requests × uncapped → boundedquery_seconds
click → bound both terms  ·  today: both uncapped — peak crosses ~55 usable connections and the pool seizes

Three axes do it: cap query_seconds (indexes + page-shaped queries + enforced statement_timeout), cap & partition active_requests (one pooler chokepoint with per-tier budgets), and remove heavy work from the request path (async generation + amortized caches). The seven pillars below, each with the before→after that makes it concrete.

01

Daily-plan generation → async & decoupled

Structural
Stop generating the plan on a web connection: pre-compute ahead of rollover, fall back to a bounded async worker, and let an idempotent INSERT…ON CONFLICT — not a request-path lock — guarantee correctness.
Problem today

When no plan exists, /today runs the ~11s 6-CTE generation synchronously on the inbound connection, with a bare INSERT (no ON CONFLICT). At local midnight every plan expires at once → multi-tab/device first-requests each run 11s and pin a connection → the rollover stampede that drained the pool.

Permanent fix

Move generation into a Trigger.dev job on a bounded queue (concurrencyLimit 8); /today returns 202 generating in <100ms and the client polls. A scheduled pre-compute warms plans per-timezone before rollover, so the common case is a sub-100ms existing-plan read. INSERT…ON CONFLICT (user_id, plan_date) DO NOTHING + a job idempotencyKey make duplicates impossible — no request-path lock needed (a lock would itself hold a connection).

Why it scales

Request latency is decoupled from work latency; expensive DB sessions are fixed at ≤8 regardless of how many users roll over. The herd becomes a bounded queue, not a connection storm.

  • split getOrGenerateDailyPlan → cheap read + enqueue; extract generateDailyPlanForUser()
  • new generate-daily-plan.trigger.ts (queue concurrency 8, SET LOCAL statement_timeout 8s)
  • new hourly precomputeDailyPlans scheduled task (per-tz cohorts)
  • createDailyPlanINSERT…ON CONFLICT DO NOTHING RETURNING
day-rollover · ~30 requestsconnection pool · 90
⏱ 00:00
before · synchronous · holds 11s
ECHECKOUTTIMEOUT · 90/90
after · 202 in <100ms + bounded queue 202 generating 202
worker queue · concurrencyLimit 8
pre-warmed: plans ready
02

Content-derived stats → amortized cache

Structural
Stop re-grinding the entire content bank on every request — compute the user-independent aggregates once, read them in microseconds forever after.
Problem today

~80% of the 11s generation is recomputing aggregates identical for every user: a 77,231-row system-flashcard scan, a 113,900-row validated-question COUNT(DISTINCT) + disk sort, 1,738 random exercise-asset heap reads. Re-ground from scratch on every plan.

Permanent fix

A precomputed subject_content_stats / topic_content_stats layer (plain UPSERT-able tables, refreshed on content change + short TTL). The hot path then computes only the small per-user delta; COUNT(DISTINCT)→COUNT (PK-unique keys) drops three external-merge sorts.

Why it scales

DB work goes from O(requests × content) to O(content-changes) + O(tiny per-user delta). Adding users no longer multiplies the heavy scan — it's already done.

  • new tables + content-change UPSERT hooks / scheduled refresh
  • Q1 rewritten to JOIN the stats tables, compute only user-specific CTEs
compute per requestcompute once · read forever
11s / request
113,900 questions
77,231 flashcards
1.18M assets
<100ms / request
computed once →
content
stats
+ per-user delta
03

execute_sql RPC → parameterized repository

Structural
Separate the data from the SQL: one repository-layer change seals injection, makes every slow query visible again, and lets Postgres reuse plans.
Problem today

52 sites build SQL by string interpolation and run it as the anon role via a SECURITY DEFINER RPC. Four user-facing endpoints take raw input straight into the SQL text — HIGH/CRITICAL injection. And because literals embed in a dynamic EXECUTE, the whole class collapses under one pg_stat_statements id → invisible to slow-query ranking.

Permanent fix

Replace every site with Prisma $queryRaw + $N binds behind a typed repo and a single safeSql allowlist (UUID/enum/ORDER-BY validation). Then drop the RPC and revoke the anon EXECUTE grant.

Why it scales · three wins, one change

① injection becomes structurally impossible (data can only be a value, never structure); ② one normalized queryid per query → ranking works; ③ Postgres caches one generic plan per shape instead of re-parsing every call.

  • new src/repository/_sql/safeSql.ts (only sanctioned builder)
  • migrate 52 sites; delete dead questions.repository.v1.ts; drop execute_sql
string-interpolated (anon)$N bind params
input: all' OR '1'='1
WHERE card_type = 'all' OR '1'='1'
card #1
card #2 — leaked
… all rows
shield: BREACHED
slow-query rank: invisible
input → bound value
WHERE card_type = $1  ⟵ "all' OR '1'='1"
0 rows — treated as a literal string
shield: sealed 🛡
queryid: ranked · plan cached
04

Query shape → page-before-aggregate

Near-term
Make every read cost the size of the page you return, not the size of the history the user owns.
Problem today

/past-tests joins two one-to-many children to one parent and de-dups over the product — 4M rows, two disk sorts, 27.8s, no LIMIT. Others GROUP BY + json_agg before the LIMIT, so page 100 costs as much as page 1. And getDeckStats is an N+1 (one RPC per deck in Promise.all).

Permanent fix

Three canonical shapes: page-before-aggregate (id-only WHERE+ORDER+LIMIT CTE first, then LEFT JOIN LATERAL json_agg on the ~20 survivors); no cartesians (trusted denormalized counts + per-child LATERAL); no N+1 (one WHERE x = ANY($ids) GROUP BY).

Why it scales

O(tests×topics×questions)O(page_size). Page 1 and page 50, a new user and a 2,000-test user, all cost the same. p99 stops tracking the largest account.

aggregate-then-paginate · cartesianpage-first · O(page)
scan 4,000,341 → return 20
json_agg(DISTINCT) ⟶ 💾 spill · 27.8s
scan 20 → return 20
id-CTE → LATERAL · <50ms
05

Indexing → hot paths become index-only

Immediate
Put WHERE + ORDER BY + projected columns inside the index, so the planner answers from a narrow B-tree and never random-reads the 2.1 GB heap.
Problem today

question_tags has only a composite PK, so every tag_id filter seq-scans 119k rows (the 12s quiz-gen + ~10s RAG queries). validation_status is absent from the flashcards index, forcing 20k–77k random heap fetches on the 2.1 GB table (deck-subjects 6.5s).

Permanent fix

One rule: per hot path, a covering/partial index holding the filter + sort columns (INCLUDE the projected ones) → Index-Only Scan, heap untouched. Highest-leverage: question_tags(tag_id, question_id); the flashcards covering/partial set; tests(user_id, created_at).

Why it scales

Total-table-size work → result-set-size work. The cache-resident narrow index grows like log(n); the 2.1 GB heap that doesn't fit in cache is never touched.

index → random heap readsindex-only · 0 heap
heap fetches → 24,916 · 6.5s flashcards heap · 2.1 GB
heap fetches → 0 · <20ms covering index · in-cache
06

Connection & concurrency architecture

Immediate
Turn the connection pool into a fixed-size, partitioned, self-draining resource that no single query or request herd can ever monopolize.
Problem today

A bare new PrismaClient() (no connection_limit), 57 Trigger.dev processes, and the anon PostgREST path all open uncoordinated pools against one 90-connection ceiling, with no enforced statement_timeout through the pooler. Backends and hold-time are both unbounded.

Permanent fix

Supavisor (6543) is the single chokepoint (5432 reserved for migrations). Per-tier pool budgets (web ~30, worker ~15, + system) sum under 90 with headroom; statement_timeout enforced on every path (verified through the pooler, or SET LOCAL); excess load becomes bounded queue wait, never more backends.

Why it scales

Server-side backends become O(1) — fixed at the sum of pool sizes (≤55 by construction) no matter how many replicas, workers, or users. Capacity scales by turning one dial.

shared bank · no timeoutpartitioned · self-draining
unbounded → 90/90 seize
POOL EXHAUSTED
fixed budget · green headroom
web 30
wk 15
sys 34
free
every connection self-frees within statement_timeout → excess load = queue wait, not new backends
07

Observability & guardrails

Structural
Make every query visible, every regression alarmed, and every blind spot un-mergeable — so the next slow query is caught on Day 1, not at collapse on Day 10.
Problem today

The heaviest class was structurally invisible (collapsed under one queryid); HTTP buckets top out at 5s so an 11–16s request looks like 5.1s; no pool / 500-rate / slow-query alert; 41 console.error bypass Loki. A chronic regression bled for ~10 days with no signal until the pool hit 90.

Permanent fix

Parameterization restores per-query queryids; a Prisma $extends emits a db_query_duration histogram + pool gauge; widen HTTP buckets to 30s; multi-window SLO burn-rate alerts on pool saturation, 500-rate, slow queries; a CI ratchet drives string-SQL 52→0 and forbids regrowth.

Why it scales

Detection flips from O(engineer-hours) hand-EXPLAINing to O(1) automated. MTTD: ~10 days → minutes, independent of traffic or query count.

blind · regression hidden until collapsevisible · alert fires Day 1
5s bucket ceiling (clipped) Day 10 · COLLAPSE
SLO budget 🔔 alert · Day 1

The durable principles

What to remember when you build the next feature

01

Decouple request latency from work latency. Heavy, variable-cost work belongs on a bounded worker tier — never on a web connection. The request holds a connection <100ms and returns a 202.

02

Make cost proportional to the response, not the dataset. Page-before-aggregate + index-only reads make page 1 and page 50, a new user and a 2,000-test user, all cost the same.

03

Amortize shared work: compute once, read forever. Aggregates identical for every user are precomputed; DB work goes from O(requests × content) to O(content-changes).

04

Treat the pool as a fixed, partitioned budget. Per-tier sizes sum under max_connections; every connection self-frees within a timeout; excess load is queue wait, not new backends.

05

Keep user data out of the SQL string. $N binds make injection impossible, cache one plan per shape, and restore a stable queryid — three wins from one mechanism.

06

Put correctness in the database, not request-path locks. INSERT…ON CONFLICT + job idempotency keys collapse races to one row and one run — no lock that itself holds a connection.

Sequenced, structural — not band-aids

Phased rollout

Phase 1 — Bound the blast radius · foundation, mostly no-code

  • Supavisor = single chokepoint; Prisma connection_limit (web ~8, worker ~3) + pool_timeout; per-tier pool sizes under 90
  • Enforce statement_timeout (verify through pooler, else SET LOCAL)

Phase 2 — Make reads page-proportional

  • page-before-aggregate on list endpoints; kill the /past-tests cartesian; batch the getDeckStats N+1

Phase 3 — Seal & illuminate the data layer

  • safeSql allowlist; migrate all 52 execute_sql sites to $queryRaw; drop the RPC + revoke anon EXECUTE

Phase 4 — Decouple heavy generation

  • daily-plan async (queue concurrency 8, idempotency, ON CONFLICT, 202); content-stats cache; pre-compute job

Phase 5 — Close the loop

  • query-duration histogram + pool gauge; widen HTTP buckets to 30s; SLO burn-rate alerts; CI ratchet string-SQL 52→0

Target state · measurable

How we know it's done

/daily-plan/today p99 <100ms · 100% of generation off the web tier on a queue of 8 · zero 23505/500 races
Warm SLOs at 10× DAU: quiz <300ms, lessons <50ms, past-tests <50ms (from 27.8s) — no unbounded full-history endpoint
Pool a fixed budget: backends ≤~55/90 by construction · web/worker isolated · alert 0.7 warn / 0.9 page · zero ECHECKOUTTIMEOUT at rollover
Zero string-built SQL: 52→0 · anon EXECUTE revoked · injection surface = 0 · all SQL via parameterized $queryRaw
Amortized stats: Q1 from ~11s to sub-100ms residual · 3 disk sorts gone · staleness gauge alarmed
MTTD for a slow-query regression: ~10 days → minutes · every query one queryid + one histogram series