← JournalEssay no. 0310min read

How the six-endpoint Dev Snippets API got built in one weekend

An essay.

A frank rundown of how the Dev Snippets API actually came together — sql.js as the database, Next route handlers, test-key tier bypass, and the schema migration that 500'd /v1/keys until we noticed.

Published

Saturday · UTC

Reading time

10 min

~210 wpm

Word count

2,103

plain English

Format

.mdx

utf-8 · git-tracked

The Dev Snippets API at /products/dev-api is the release for this week. Six endpoints — OG image, PDF, Mermaid diagram, sitemap, headless screenshot, OpenAPI mock — sharing one auth, one tier ladder, one SQLite file. Built in a long weekend, plus a Tuesday spent chasing a single 500 that turned out to be a schema-migration bug. This post is the unvarnished version of how it actually went, written warm enough that the parts that are hacky can stay called hacky.

Why six endpoints, not one

The temptation, when you sit down to build a developer product, is to build one really good thing and call it the product. We tried that once. The problem is that one really good thing is impossible to price, impossible to position, and easy to ignore — the buyer says "okay but do you also do X?" and the answer has to be no.

Six small endpoints solves the positioning problem the way a multitool solves the cutting problem. None of them is the best in class. All of them are good enough that you don't go look for the best in class. The pitch becomes "the six tools every backend dev rewrites" rather than "the world's nicest OG image generator". Easier headline, easier price tag, easier to ship.

The six were chosen by walking the Phase 0–7 changelog and writing down every utility that had taken more than half a day to build inside one of our own products. OG card. PDF receipt. Mermaid render. Sitemap audit. Visual regression screenshot. OpenAPI mock for the new endpoint. That's the list. Nobody invented anything; we just packaged what we already kept rewriting.

The database is a file. The file is sql.js.

The first real decision was where state lives. The four shapes on the table were Postgres on a managed instance, Cloud SQL, a Neon serverless DB, and a SQLite file on the VM. The fifth shape, picked in the end, was a SQLite file on the VM driven by sql.js in-process — a pure-WASM SQLite that runs inside the Node runtime, with no native compile step.

The reason isn't ideology. The reason is that the deploy is one rsync to a small GCP VM, and adding better-sqlite3 to the runtime means adding a native binary that has to match the VM's libc, the container's libc when we eventually containerize, and the Node version on both. sql.js doesn't care. It's a .wasm and a .js file, it loads cold in about 40ms, and it survives every kind of build environment without a node-gyp incident.

The cost is real. sql.js is in-memory. Every commit has to flush manually to disk:

function flushSync(db: Database, path: string): void {
  const bytes = db.export();
  mkdirSync(dirname(path), { recursive: true });
  writeFileSync(path, Buffer.from(bytes));
}

function flush(db: Database): void {
  const entry = globalThis._elofynDevApiDb;
  const path = entry?.path ?? dbPath();
  flushSync(db, path);
}

That's the entire durability story for the Dev Snippets API: every mutation calls flush(db), which serializes the full database to a byte buffer and writes it to data/dev-api.sqlite. No WAL. No fsync. Full-file rewrite on every insert.

It is, by every reasonable measure of a "real" database, the wrong shape. It also works. The user table has fewer than a thousand rows. The usage table grows monotonically but only ever gets read with a COUNT(*) WHERE api_key = ? AND ts > ? against the current month's boundary, which sql.js answers in under a millisecond against the in-memory copy. The bottleneck has never been the database. The bottleneck has been our own code.

We'll outgrow it. Probably around the time the first paying customer breaches 10,000 calls a month, because the full-file rewrite will start showing up in p99 latency for the write path. The migration to better-sqlite3 is a four-hour job that doesn't touch a single SQL query — same dialect, same statements, swap the driver, replace flush() with db.checkpoint(). We will do it the day it matters and not before.

Six endpoints, one router

Routing is Next 16's filesystem router under src/app/api/dev/v1/*/route.ts. Each endpoint is one file. Each file does the same five steps in the same order:

  1. Mint a request id (a ULID prefixed req_).
  2. Validate the X-API-Key header.
  3. Resolve the rate-limit snapshot for that key.
  4. Record one row in usage(api_key, endpoint, ts).
  5. Do the actual thing.

The first four steps live in src/lib/dev-api.ts as standalone functions, and any new endpoint copies the same prelude. We resisted the urge to extract a withAuth(handler) wrapper because the five-step prelude is also the first thing a future contributor reads when they're trying to figure out how a request flows — putting it inline means there is no "you have to also read this other file" moment. The duplication across six files is twelve lines per file, which we judged was a fair price for the no-hidden-magic property.

Tier gating is one switch inside the prelude. Free tier sees three endpoints (OG, diagram, sitemap); pro and scale see all six. The check happens after auth and before any work — we don't want a free caller to be told "you can't" only after we've already spent 800ms launching a headless browser.

The auth path itself has one piece of cleverness worth describing, because it's the kind of cleverness that bites if you don't write it down. Keys come in two flavours, ek_live_* and ek_test_*. Live keys go through a real DB lookup against the users table. Test keys do not. They're treated as pro-tier the moment the regex matches, without ever touching the database:

const KEY_RE = /^ek_(live|test)_[0-9a-f]{32}$/;

export async function validateApiKey(req: Request) {
  const raw = req.headers.get("x-api-key");
  if (!raw) return missing();
  const m = raw.match(KEY_RE);
  if (!m) return invalid();
  const env = m[1] as "live" | "test";
  if (env === "test") {
    // No DB lookup, no usage row written, no quota counted.
    // Test keys are a fixture, not a customer.
    return { ok: true, key: raw, env, tier: "pro" } as const;
  }
  const user = await findUserByKey(raw);
  if (!user) return invalid();
  return { ok: true, key: raw, env, tier: user.tier } as const;
}

This is the bit that's a little hacky and we keep it because it pays back daily. The integration harness in scripts/ runs full end-to-end smoke tests against the live /api/dev/v1/og route every time the build finishes. It signs every request with a literal test key (ek_test_ plus thirty-two zero hex chars), which doesn't exist in any database anywhere, and the route handler treats it as a valid pro-tier caller because the prefix says it should. No seeding. No fixtures. No "did the migration run on this branch" pre- flight. Six endpoints, all tested against the real wire, on every build. The trade-off is that anyone who can read the public docs knows the test prefix exists, so a misconfigured cloud function that hardcoded ek_test_… and pushed to prod would run free pro-tier calls for as long as it took us to notice. We're comfortable with that risk at our current scale. We won't be later. The migration is documented and small.

The schema-migration bug, in slow motion

The Friday before the launch I added two columns to the users table for the Stripe billing rollout — stripe_customer and stripe_subscription_id. Both nullable TEXT. Both added to the SCHEMA constant inside src/lib/dev-api-db.ts. The build passed. The unit tests passed. I shipped.

On Saturday morning the dashboard at /products/dev-api/dashboard started 500'ing on the /v1/keys endpoint with a single, perfectly unhelpful error in the logs:

SqliteError: no such column: stripe_customer
  at Statement.step (sql.js)
  at rows (src/lib/dev-api-db.ts:238)
  at findUserByKey (src/lib/dev-api-db.ts:380)

The query was selecting the new column. The column wasn't there. The schema clearly said it was, because I had just edited the SCHEMA constant.

The bug is one of those satisfying ones where every individual line of code is correct and the system is still broken. SQLite's CREATE TABLE IF NOT EXISTS is a no-op once the table exists. We open the DB by reading the .sqlite file off disk, then we run the SCHEMA constant — which for an existing DB does nothing, because the users table already exists from the previous boot. The new columns in the constant are silently ignored. The query then runs against the old schema. Reads fail. 500.

The fix has two parts. First, two ALTER TABLE statements that try to add each new column on every open, wrapped in try/catch so the second boot (where the column already exists) doesn't crash. Second, and this is the part I missed for an embarrassing four hours: the ALTER has to be flushed to disk. sql.js is in-memory; a schema change in memory does not propagate to the .sqlite file unless we explicitly export and write it. The migration was running on every boot, in-memory, and silently dropping when the process ended.

let migrated = false;
for (const alter of [
  "ALTER TABLE users ADD COLUMN stripe_customer TEXT",
  "ALTER TABLE users ADD COLUMN stripe_subscription_id TEXT",
]) {
  try {
    db.run(alter);
    migrated = true;
  } catch {
    /* column already present — fresh DBs from SCHEMA, or this
       migration already ran on a prior boot. */
  }
}
db.run(
  "CREATE INDEX IF NOT EXISTS users_stripe_customer_idx ON users (stripe_customer)",
);
db.run("PRAGMA foreign_keys = ON");
globalThis._elofynDevApiDb = { db, path };
if (!existed) {
  flushSync(db, path);
} else if (migrated) {
  flushSync(db, path);
}

The if (migrated) flushSync(...) line is the whole fix. Without it, the migration is a Sisyphean ceremony — every boot adds the columns in memory, the process serves a few requests, the process restarts on the next deploy, the columns are gone again. With it, the .sqlite file picks up the columns on first boot after deploy, the flush persists them, and every subsequent boot's ALTER throws the "duplicate column" error that the try/catch silently absorbs.

The thing I keep returning to about this bug is how quiet it was. There was no exception thrown by the ALTER — sql.js ran it cleanly. There was no warning about missing-flush — sql.js doesn't know which of its writes you care about. There was no failed test — our tests seeded a fresh DB on every run, where the SCHEMA constant ran once and was correct. The bug only showed up against an existing on-disk file, in production, with traffic. Every cheap signal said green. Only the expensive signal said red.

This is the failure mode of every database that doesn't have a migrations framework, and it is the price we agreed to pay when we picked the file-on-the-VM model. The fee, in this case, was four hours of Saturday and the public knowledge that I shipped a 500 on launch weekend. Acceptable; recorded; moving on.

What stays hacky, by intent

A handful of choices in the API are deliberately less than ideal and worth naming so nobody is surprised later:

  • The migration runs on every boot. It's idempotent and cheap (two ALTERs that throw), but it is not a proper migrations framework. The next column added gets another try/catch entry and another migrated = true setter. Around the fourth or fifth column we'll lift this into a tiny migrations module. Not yet.
  • The usage table grows monotonically and is never pruned. Every call appends one row. The monthly quota query is fast because it's indexed on (api_key, ts). Storage is the issue: at a million calls a month the table is roughly 80 MB after a year. We will add a monthly compaction job (sum + drop) when the file size starts mattering.
  • Test keys are pro-tier without a DB lookup. Documented in this post and in docs/dev-api-spec.md §1.3. The risk is that someone hardcodes the test prefix in prod. We'll add an env-gated toggle to disable the bypass in production deploys when there's a customer who asks.
  • flushSync is called from a Next route handler. Synchronous filesystem writes inside an async request path are a real cost. Today the user table is small enough that the export+write is sub-millisecond, but the moment that stops being true we have to move flushing into a background worker. Watching the usage table grow is the cheapest leading indicator.

None of these is going to take the system down tomorrow. All of them will, eventually, deserve a proper fix. Writing them down in a blog post is half the discipline of remembering to come back.

What worked

The five-step prelude duplicated across six routes — instead of a clever shared wrapper — meant that when the migration bug bit, the fix lived in exactly one file (src/lib/dev-api-db.ts) and didn't require touching any of the six handlers. The route files have not been edited since launch. The next column we add to users is also a one-file change.

The test-key bypass meant the integration harness ran against the real wire for the entire weekend, including the four hours when the production database had a broken schema. Every regression we introduced got caught by scripts/playwright-gate.mjs before it shipped. Free-tier endpoints were green the whole time; only the key-issuing dashboard, which reads the stripe_customer column, broke. That's exactly the blast radius we hoped for when we drew the surfaces.

Six endpoints in a long weekend is not a flex. It's what's achievable when the database is a file, the deploy is rsync, the auth is a prefix-and-regex, and you let the test keys skip the slow path. Most of the complexity in a "real" developer API is elsewhere — the billing webhook, the dashboard, the migration framework, the audit log. The work to do those next is documented in docs/dev-api-spec.md. By Friday next week, one of them will be done, and the post that explains it will follow.

Until then: six endpoints, one SQLite file, one bug that turned out to be a flushSync I forgot to call. You can try the OG endpoint right now at /products/dev-api — the green status pulse at the top of that page is hitting the same code path that 500'd on Saturday morning, which we find quietly satisfying.