← JournalEssay no. 0310min read
How the six-endpoint Dev Snippets API got built in one weekend
An essay.
A frank rundown of how the Dev Snippets API actually came together — sql.js as the database, Next route handlers, test-key tier bypass, and the schema migration that 500'd /v1/keys until we noticed.
Published
Saturday · UTC
Reading time
10 min
~210 wpm
Word count
2,103
plain English
Format
.mdx
utf-8 · git-tracked
The Dev Snippets API at /products/dev-api is the release for this week. Six endpoints — OG image, PDF, Mermaid diagram, sitemap, headless screenshot, OpenAPI mock — sharing one auth, one tier ladder, one SQLite file. Built in a long weekend, plus a Tuesday spent chasing a single 500 that turned out to be a schema-migration bug. This post is the unvarnished version of how it actually went, written warm enough that the parts that are hacky can stay called hacky.
Why six endpoints, not one
The temptation, when you sit down to build a developer product, is to build one really good thing and call it the product. We tried that once. The problem is that one really good thing is impossible to price, impossible to position, and easy to ignore — the buyer says "okay but do you also do X?" and the answer has to be no.
Six small endpoints solves the positioning problem the way a multitool solves the cutting problem. None of them is the best in class. All of them are good enough that you don't go look for the best in class. The pitch becomes "the six tools every backend dev rewrites" rather than "the world's nicest OG image generator". Easier headline, easier price tag, easier to ship.
The six were chosen by walking the Phase 0–7 changelog and writing down every utility that had taken more than half a day to build inside one of our own products. OG card. PDF receipt. Mermaid render. Sitemap audit. Visual regression screenshot. OpenAPI mock for the new endpoint. That's the list. Nobody invented anything; we just packaged what we already kept rewriting.
The database is a file. The file is sql.js.
The first real decision was where state lives. The four shapes on the table were Postgres on a managed instance, Cloud SQL, a Neon serverless DB, and a SQLite file on the VM. The fifth shape, picked in the end, was a SQLite file on the VM driven by sql.js in-process — a pure-WASM SQLite that runs inside the Node runtime, with no native compile step.
The reason isn't ideology. The reason is that the deploy is one
rsync to a small GCP VM, and adding better-sqlite3 to the runtime
means adding a native binary that has to match the VM's libc, the
container's libc when we eventually containerize, and the Node version
on both. sql.js doesn't care. It's a .wasm and a .js file, it loads
cold in about 40ms, and it survives every kind of build environment
without a node-gyp incident.
The cost is real. sql.js is in-memory. Every commit has to flush manually to disk:
function flushSync(db: Database, path: string): void {
const bytes = db.export();
mkdirSync(dirname(path), { recursive: true });
writeFileSync(path, Buffer.from(bytes));
}
function flush(db: Database): void {
const entry = globalThis._elofynDevApiDb;
const path = entry?.path ?? dbPath();
flushSync(db, path);
}
That's the entire durability story for the Dev Snippets API: every
mutation calls flush(db), which serializes the full database to a
byte buffer and writes it to data/dev-api.sqlite. No WAL. No
fsync. Full-file rewrite on every insert.
It is, by every reasonable measure of a "real" database, the wrong
shape. It also works. The user table has fewer than a thousand rows.
The usage table grows monotonically but only ever gets read with a
COUNT(*) WHERE api_key = ? AND ts > ? against the current month's
boundary, which sql.js answers in under a millisecond against the
in-memory copy. The bottleneck has never been the database. The
bottleneck has been our own code.
We'll outgrow it. Probably around the time the first paying customer
breaches 10,000 calls a month, because the full-file rewrite will
start showing up in p99 latency for the write path. The migration to
better-sqlite3 is a four-hour job that doesn't touch a single SQL
query — same dialect, same statements, swap the driver, replace
flush() with db.checkpoint(). We will do it the day it matters
and not before.
Six endpoints, one router
Routing is Next 16's filesystem router under
src/app/api/dev/v1/*/route.ts. Each endpoint is one file. Each file
does the same five steps in the same order:
- Mint a request id (a ULID prefixed
req_). - Validate the
X-API-Keyheader. - Resolve the rate-limit snapshot for that key.
- Record one row in
usage(api_key, endpoint, ts). - Do the actual thing.
The first four steps live in src/lib/dev-api.ts as standalone
functions, and any new endpoint copies the same prelude. We resisted
the urge to extract a withAuth(handler) wrapper because the
five-step prelude is also the first thing a future contributor reads
when they're trying to figure out how a request flows — putting it
inline means there is no "you have to also read this other file"
moment. The duplication across six files is twelve lines per file,
which we judged was a fair price for the no-hidden-magic property.
Tier gating is one switch inside the prelude. Free tier sees three endpoints (OG, diagram, sitemap); pro and scale see all six. The check happens after auth and before any work — we don't want a free caller to be told "you can't" only after we've already spent 800ms launching a headless browser.
The auth path itself has one piece of cleverness worth describing,
because it's the kind of cleverness that bites if you don't write it
down. Keys come in two flavours, ek_live_* and ek_test_*. Live
keys go through a real DB lookup against the users table. Test keys
do not. They're treated as pro-tier the moment the regex matches,
without ever touching the database:
const KEY_RE = /^ek_(live|test)_[0-9a-f]{32}$/;
export async function validateApiKey(req: Request) {
const raw = req.headers.get("x-api-key");
if (!raw) return missing();
const m = raw.match(KEY_RE);
if (!m) return invalid();
const env = m[1] as "live" | "test";
if (env === "test") {
// No DB lookup, no usage row written, no quota counted.
// Test keys are a fixture, not a customer.
return { ok: true, key: raw, env, tier: "pro" } as const;
}
const user = await findUserByKey(raw);
if (!user) return invalid();
return { ok: true, key: raw, env, tier: user.tier } as const;
}
This is the bit that's a little hacky and we keep it because it pays
back daily. The integration harness in scripts/ runs full
end-to-end smoke tests against the live /api/dev/v1/og route every
time the build finishes. It signs every request with a literal
test key (ek_test_ plus thirty-two zero hex chars), which doesn't
exist in any database anywhere, and the route handler treats it as a
valid pro-tier caller because the prefix says it should. No
seeding. No fixtures. No "did the migration run on this branch" pre-
flight. Six endpoints, all tested against the real wire, on every
build. The trade-off is that anyone who can read the public docs
knows the test prefix exists, so a misconfigured cloud function that
hardcoded ek_test_… and pushed to prod would run free pro-tier
calls for as long as it took us to notice. We're comfortable with
that risk at our current scale. We won't be later. The migration is
documented and small.
The schema-migration bug, in slow motion
The Friday before the launch I added two columns to the users table
for the Stripe billing rollout — stripe_customer and
stripe_subscription_id. Both nullable TEXT. Both added to the
SCHEMA constant inside src/lib/dev-api-db.ts. The build passed.
The unit tests passed. I shipped.
On Saturday morning the dashboard at /products/dev-api/dashboard
started 500'ing on the /v1/keys endpoint with a single, perfectly
unhelpful error in the logs:
SqliteError: no such column: stripe_customer
at Statement.step (sql.js)
at rows (src/lib/dev-api-db.ts:238)
at findUserByKey (src/lib/dev-api-db.ts:380)
The query was selecting the new column. The column wasn't there. The schema clearly said it was, because I had just edited the SCHEMA constant.
The bug is one of those satisfying ones where every individual line
of code is correct and the system is still broken. SQLite's
CREATE TABLE IF NOT EXISTS is a no-op once the table exists. We
open the DB by reading the .sqlite file off disk, then we run the
SCHEMA constant — which for an existing DB does nothing, because the
users table already exists from the previous boot. The new columns
in the constant are silently ignored. The query then runs against the
old schema. Reads fail. 500.
The fix has two parts. First, two ALTER TABLE statements that try
to add each new column on every open, wrapped in try/catch so the
second boot (where the column already exists) doesn't crash. Second,
and this is the part I missed for an embarrassing four hours: the
ALTER has to be flushed to disk. sql.js is in-memory; a schema
change in memory does not propagate to the .sqlite file unless we
explicitly export and write it. The migration was running on every
boot, in-memory, and silently dropping when the process ended.
let migrated = false;
for (const alter of [
"ALTER TABLE users ADD COLUMN stripe_customer TEXT",
"ALTER TABLE users ADD COLUMN stripe_subscription_id TEXT",
]) {
try {
db.run(alter);
migrated = true;
} catch {
/* column already present — fresh DBs from SCHEMA, or this
migration already ran on a prior boot. */
}
}
db.run(
"CREATE INDEX IF NOT EXISTS users_stripe_customer_idx ON users (stripe_customer)",
);
db.run("PRAGMA foreign_keys = ON");
globalThis._elofynDevApiDb = { db, path };
if (!existed) {
flushSync(db, path);
} else if (migrated) {
flushSync(db, path);
}
The if (migrated) flushSync(...) line is the whole fix. Without it,
the migration is a Sisyphean ceremony — every boot adds the columns
in memory, the process serves a few requests, the process restarts
on the next deploy, the columns are gone again. With it, the .sqlite
file picks up the columns on first boot after deploy, the flush
persists them, and every subsequent boot's ALTER throws the
"duplicate column" error that the try/catch silently absorbs.
The thing I keep returning to about this bug is how quiet it was. There was no exception thrown by the ALTER — sql.js ran it cleanly. There was no warning about missing-flush — sql.js doesn't know which of its writes you care about. There was no failed test — our tests seeded a fresh DB on every run, where the SCHEMA constant ran once and was correct. The bug only showed up against an existing on-disk file, in production, with traffic. Every cheap signal said green. Only the expensive signal said red.
This is the failure mode of every database that doesn't have a migrations framework, and it is the price we agreed to pay when we picked the file-on-the-VM model. The fee, in this case, was four hours of Saturday and the public knowledge that I shipped a 500 on launch weekend. Acceptable; recorded; moving on.
What stays hacky, by intent
A handful of choices in the API are deliberately less than ideal and worth naming so nobody is surprised later:
- The migration runs on every boot. It's idempotent and cheap
(two ALTERs that throw), but it is not a proper migrations
framework. The next column added gets another try/catch entry and
another
migrated = truesetter. Around the fourth or fifth column we'll lift this into a tiny migrations module. Not yet. - The usage table grows monotonically and is never pruned. Every
call appends one row. The monthly quota query is fast because
it's indexed on
(api_key, ts). Storage is the issue: at a million calls a month the table is roughly 80 MB after a year. We will add a monthly compaction job (sum + drop) when the file size starts mattering. - Test keys are pro-tier without a DB lookup. Documented in
this post and in
docs/dev-api-spec.md §1.3. The risk is that someone hardcodes the test prefix in prod. We'll add an env-gated toggle to disable the bypass in production deploys when there's a customer who asks. flushSyncis called from a Next route handler. Synchronous filesystem writes inside an async request path are a real cost. Today the user table is small enough that the export+write is sub-millisecond, but the moment that stops being true we have to move flushing into a background worker. Watching theusagetable grow is the cheapest leading indicator.
None of these is going to take the system down tomorrow. All of them will, eventually, deserve a proper fix. Writing them down in a blog post is half the discipline of remembering to come back.
What worked
The five-step prelude duplicated across six routes — instead of a
clever shared wrapper — meant that when the migration bug bit, the
fix lived in exactly one file (src/lib/dev-api-db.ts) and didn't
require touching any of the six handlers. The route files have not
been edited since launch. The next column we add to users is also
a one-file change.
The test-key bypass meant the integration harness ran against the
real wire for the entire weekend, including the four hours when the
production database had a broken schema. Every regression we
introduced got caught by scripts/playwright-gate.mjs before it
shipped. Free-tier endpoints were green the whole time; only the
key-issuing dashboard, which reads the stripe_customer column,
broke. That's exactly the blast radius we hoped for when we drew
the surfaces.
Six endpoints in a long weekend is not a flex. It's what's
achievable when the database is a file, the deploy is rsync, the
auth is a prefix-and-regex, and you let the test keys skip the
slow path. Most of the complexity in a "real" developer API is
elsewhere — the billing webhook, the dashboard, the migration
framework, the audit log. The work to do those next is documented
in docs/dev-api-spec.md. By Friday next week, one of them will be
done, and the post that explains it will follow.
Until then: six endpoints, one SQLite file, one bug that turned out
to be a flushSync I forgot to call. You can try the OG endpoint
right now at /products/dev-api — the green
status pulse at the top of that page is hitting the same code path
that 500'd on Saturday morning, which we find quietly satisfying.