The acute incident is solved. This is the durable fix: systematically replacing every unbounded quantity that drove the failure with a bounded one — so the failure mode is structurally absent as we grow, not merely less likely.
One inequality governs everything
Every symptom — daily-plan 500s, ECHECKOUTTIMEOUT at rollover, the 27.8s past-tests endpoint, the silent 10-day regression — is the same thing: dataset-proportional work run synchronously on an unbounded, unobservable pool. The pressure on Postgres is one product, and today both terms are uncapped. The fix bounds both.
Three axes do it: cap query_seconds (indexes + page-shaped queries + enforced statement_timeout), cap & partition active_requests (one pooler chokepoint with per-tier budgets), and remove heavy work from the request path (async generation + amortized caches). The seven pillars below, each with the before→after that makes it concrete.
INSERT…ON CONFLICT — not a request-path lock — guarantee correctness.When no plan exists, /today runs the ~11s 6-CTE generation synchronously on the inbound connection, with a bare INSERT (no ON CONFLICT). At local midnight every plan expires at once → multi-tab/device first-requests each run 11s and pin a connection → the rollover stampede that drained the pool.
Move generation into a Trigger.dev job on a bounded queue (concurrencyLimit 8); /today returns 202 generating in <100ms and the client polls. A scheduled pre-compute warms plans per-timezone before rollover, so the common case is a sub-100ms existing-plan read. INSERT…ON CONFLICT (user_id, plan_date) DO NOTHING + a job idempotencyKey make duplicates impossible — no request-path lock needed (a lock would itself hold a connection).
Request latency is decoupled from work latency; expensive DB sessions are fixed at ≤8 regardless of how many users roll over. The herd becomes a bounded queue, not a connection storm.
getOrGenerateDailyPlan → cheap read + enqueue; extract generateDailyPlanForUser()generate-daily-plan.trigger.ts (queue concurrency 8, SET LOCAL statement_timeout 8s)precomputeDailyPlans scheduled task (per-tz cohorts)createDailyPlan → INSERT…ON CONFLICT DO NOTHING RETURNING~80% of the 11s generation is recomputing aggregates identical for every user: a 77,231-row system-flashcard scan, a 113,900-row validated-question COUNT(DISTINCT) + disk sort, 1,738 random exercise-asset heap reads. Re-ground from scratch on every plan.
A precomputed subject_content_stats / topic_content_stats layer (plain UPSERT-able tables, refreshed on content change + short TTL). The hot path then computes only the small per-user delta; COUNT(DISTINCT)→COUNT (PK-unique keys) drops three external-merge sorts.
DB work goes from O(requests × content) to O(content-changes) + O(tiny per-user delta). Adding users no longer multiplies the heavy scan — it's already done.
52 sites build SQL by string interpolation and run it as the anon role via a SECURITY DEFINER RPC. Four user-facing endpoints take raw input straight into the SQL text — HIGH/CRITICAL injection. And because literals embed in a dynamic EXECUTE, the whole class collapses under one pg_stat_statements id → invisible to slow-query ranking.
Replace every site with Prisma $queryRaw + $N binds behind a typed repo and a single safeSql allowlist (UUID/enum/ORDER-BY validation). Then drop the RPC and revoke the anon EXECUTE grant.
① injection becomes structurally impossible (data can only be a value, never structure); ② one normalized queryid per query → ranking works; ③ Postgres caches one generic plan per shape instead of re-parsing every call.
src/repository/_sql/safeSql.ts (only sanctioned builder)questions.repository.v1.ts; drop execute_sql| card #1 |
| card #2 — leaked |
| … all rows |
| 0 rows — treated as a literal string |
/past-tests joins two one-to-many children to one parent and de-dups over the product — 4M rows, two disk sorts, 27.8s, no LIMIT. Others GROUP BY + json_agg before the LIMIT, so page 100 costs as much as page 1. And getDeckStats is an N+1 (one RPC per deck in Promise.all).
Three canonical shapes: page-before-aggregate (id-only WHERE+ORDER+LIMIT CTE first, then LEFT JOIN LATERAL json_agg on the ~20 survivors); no cartesians (trusted denormalized counts + per-child LATERAL); no N+1 (one WHERE x = ANY($ids) GROUP BY).
O(tests×topics×questions) → O(page_size). Page 1 and page 50, a new user and a 2,000-test user, all cost the same. p99 stops tracking the largest account.
question_tags has only a composite PK, so every tag_id filter seq-scans 119k rows (the 12s quiz-gen + ~10s RAG queries). validation_status is absent from the flashcards index, forcing 20k–77k random heap fetches on the 2.1 GB table (deck-subjects 6.5s).
One rule: per hot path, a covering/partial index holding the filter + sort columns (INCLUDE the projected ones) → Index-Only Scan, heap untouched. Highest-leverage: question_tags(tag_id, question_id); the flashcards covering/partial set; tests(user_id, created_at).
Total-table-size work → result-set-size work. The cache-resident narrow index grows like log(n); the 2.1 GB heap that doesn't fit in cache is never touched.
A bare new PrismaClient() (no connection_limit), 57 Trigger.dev processes, and the anon PostgREST path all open uncoordinated pools against one 90-connection ceiling, with no enforced statement_timeout through the pooler. Backends and hold-time are both unbounded.
Supavisor (6543) is the single chokepoint (5432 reserved for migrations). Per-tier pool budgets (web ~30, worker ~15, + system) sum under 90 with headroom; statement_timeout enforced on every path (verified through the pooler, or SET LOCAL); excess load becomes bounded queue wait, never more backends.
Server-side backends become O(1) — fixed at the sum of pool sizes (≤55 by construction) no matter how many replicas, workers, or users. Capacity scales by turning one dial.
statement_timeout → excess load = queue wait, not new backends
The heaviest class was structurally invisible (collapsed under one queryid); HTTP buckets top out at 5s so an 11–16s request looks like 5.1s; no pool / 500-rate / slow-query alert; 41 console.error bypass Loki. A chronic regression bled for ~10 days with no signal until the pool hit 90.
Parameterization restores per-query queryids; a Prisma $extends emits a db_query_duration histogram + pool gauge; widen HTTP buckets to 30s; multi-window SLO burn-rate alerts on pool saturation, 500-rate, slow queries; a CI ratchet drives string-SQL 52→0 and forbids regrowth.
Detection flips from O(engineer-hours) hand-EXPLAINing to O(1) automated. MTTD: ~10 days → minutes, independent of traffic or query count.
The durable principles
Decouple request latency from work latency. Heavy, variable-cost work belongs on a bounded worker tier — never on a web connection. The request holds a connection <100ms and returns a 202.
Make cost proportional to the response, not the dataset. Page-before-aggregate + index-only reads make page 1 and page 50, a new user and a 2,000-test user, all cost the same.
Amortize shared work: compute once, read forever. Aggregates identical for every user are precomputed; DB work goes from O(requests × content) to O(content-changes).
Treat the pool as a fixed, partitioned budget. Per-tier sizes sum under max_connections; every connection self-frees within a timeout; excess load is queue wait, not new backends.
Keep user data out of the SQL string. $N binds make injection impossible, cache one plan per shape, and restore a stable queryid — three wins from one mechanism.
Put correctness in the database, not request-path locks. INSERT…ON CONFLICT + job idempotency keys collapse races to one row and one run — no lock that itself holds a connection.
Sequenced, structural — not band-aids
connection_limit (web ~8, worker ~3) + pool_timeout; per-tier pool sizes under 90statement_timeout (verify through pooler, else SET LOCAL)/past-tests cartesian; batch the getDeckStats N+1safeSql allowlist; migrate all 52 execute_sql sites to $queryRaw; drop the RPC + revoke anon EXECUTEON CONFLICT, 202); content-stats cache; pre-compute jobTarget state · measurable
/daily-plan/today p99 <100ms · 100% of generation off the web tier on a queue of 8 · zero 23505/500 races$queryRaw