Elofyn Tools · About

About the CSV ↔ JSON converter

Why a tool that does just this exists

Tabular data spends its life passing between CSV and JSON. CSV is what spreadsheets export, what databases \copy out, and what ETL jobs land at the start of every Tuesday morning. JSON is what HTTP APIs return, what configuration files declare, and what log records aggregate into. Most working days touch both. The seam between them is small, well-understood, and where a surprising amount of debugging time gets quietly spent.

The job sounds trivial — flip a comma-separated table into an array of objects, or the other way around — and most of the time it is. But the edges have teeth. A comma inside a quoted field. A stray byte-order-mark at the front of the file that Excel wrote and nobody noticed. A header row that's missing from half the upstream batches. A column that looks numeric until row 9 712 produces "007" and your downstream type system silently drops the leading zero. The tool exists so you can throw a blob at it, see the other format immediately, and copy it back — without involving a server, a paste site, or a script you have to remember the shape of.

A short history of CSV

Comma-separated values predate the personal computer. Mainframe report exports in the late 1960s and early 1970s used line-oriented, delimiter-separated records as a matter of course; IBM's FORTRAN list-directed I/O (the READ *, A, B, C form, which parses whitespace- or comma-separated values into variables) is the earliest broadly available implementation that looks recognisably like modern CSV.

The format then lived for three decades without a written standard. Every spreadsheet program supported it. Every database engine offered COPY or its equivalent. Every shell script could munge it with awk -F,. And every implementation handled embedded commas, quotes, and newlines differently, sometimes incompatibly. The result was an endemic situation in which "CSV that opens in Excel", "CSV that round-trips through Postgres", and "CSV the data team emails you" were three different formats.

In 2005, Yakov Shafranovich published RFC 4180 — an Informational document that codified the dialect that most implementations were almost following. It was never made a binding standard, but it gave the community a written reference to point at when arguing about edge cases, and most modern parsers (including the one inside this tool) target it as the "canonical" CSV.[1]

What RFC 4180 actually says

The document is short, eight pages, and the substance is even shorter. Fields are separated by a comma (,). Records are separated by CRLF (\r\n). Fields containing the delimiter, the quote character, or a line break should be enclosed in double quotes ("). An interior double quote is escaped by doubling it (""). A header row is optional; if present, it sits on the first line and names the columns. UTF-8 is not specified: the RFC predates UTF-8's dominance and is technically agnostic about encoding.
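
The quoting rules above are compact enough to sketch as a small state machine. What follows is a minimal illustration, not the parser this tool ships (that is Papa Parse); the name parseRfc4180 is hypothetical, and the sketch assumes a comma delimiter and accepts CRLF, LF, or CR as the record separator.

```typescript
// Minimal RFC 4180-style parser: comma delimiter, double-quote quoting,
// doubled-quote escapes. Illustrative only; `parseRfc4180` is a hypothetical name.
function parseRfc4180(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let i = 0;

  while (i < text.length) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // "" inside a quoted field is one literal quote
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas and line breaks are literal inside quotes
      }
      i += 1;
      continue;
    }
    if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\r" || ch === "\n") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") i += 1; // CRLF is one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += ch;
    }
    i += 1;
  }
  // Flush the last record when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

Given the input name,note followed by "Smith, J.","said ""hi""", this returns two rows, with the second row holding the cells Smith, J. and said "hi".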

This tool follows RFC 4180 when emitting CSV: CRLF line endings on the canonical output, double-quote quoting, doubled-quote escapes, configurable header row. On input it accepts considerably more — LF or CR-only line endings, a leading UTF-8 BOM, any of four common delimiters — because that's what real CSV in the wild looks like.
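
The emit side of those rules can be sketched in a few lines. This is an illustration under assumptions rather than the tool's actual serializer: encodeField and encodeCsv are hypothetical names, and cells are assumed to be strings already.

```typescript
// Quote a field only when it contains the delimiter, a double quote,
// or a line break. `encodeField` and `encodeCsv` are hypothetical names.
function encodeField(value: string, delimiter: string = ","): string {
  const needsQuotes =
    value.indexOf(delimiter) !== -1 ||
    value.indexOf('"') !== -1 ||
    /[\r\n]/.test(value);
  // Interior quotes are escaped by doubling them.
  return needsQuotes ? '"' + value.replace(/"/g, '""') + '"' : value;
}

function encodeCsv(rows: string[][], delimiter: string = ","): string {
  return rows
    .map((row) => row.map((cell) => encodeField(cell, delimiter)).join(delimiter))
    .join("\r\n"); // CRLF between records, as in RFC 4180
}
```

Quoting only when needed keeps simple cells readable while still protecting the ones that contain the delimiter.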

Where CSV dialects diverge in practice

The four places real CSV stops being RFC 4180 and starts being local custom:

  • Delimiter. The literal C in CSV is a polite fiction. Continental European locales use the semicolon (;) because the comma is a decimal separator. Database \copy exports often default to a tab (\t) and proudly call themselves TSV. The pipe (|) appears in financial and telecom data because it almost never occurs inside the values themselves. This tool's auto-detect inspects the first ten lines and votes among those four.
  • Line endings. RFC 4180 says CRLF. Unix exporters write LF. Classic Mac OS wrote bare CR. The parser accepts all three and normalises.
  • The leading UTF-8 BOM. Excel writes a UTF-8 BOM (EF BB BF) at the front of every "CSV UTF-8" export so older versions of Excel can re-open the file without mangling accents. The BOM then surprises naive parsers, which read those three bytes as part of the first column name and emit { "\ufeffname": ... } instead of { "name": ... }. This tool strips the BOM before parsing.
  • Escape conventions. Some non-RFC dialects use backslash-escapes (\, or \") instead of doubled-quote escapes. The tool follows RFC 4180 and does not interpret backslash escapes; if you have a source that uses them, preprocess separately.
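
The first three items lend themselves to a short pre-parse pass. The sketch below is one plausible way to do it, not a description of the tool's internals: the function names are hypothetical, and the voting rule (a sampled line votes for a candidate when it contains the same nonzero number of that candidate as the first line) is an assumption, since the text above only says that the first ten lines vote.

```typescript
const CANDIDATES = [",", ";", "\t", "|"];

// Once decoded, the UTF-8 BOM bytes EF BB BF appear as the single code point U+FEFF.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Map CRLF and bare CR to LF.
function normalizeNewlines(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Count delimiter occurrences that sit outside double quotes in one line.
// Quote state resets per line, so multi-line quoted fields are not handled here.
function countOutsideQuotes(line: string, delimiter: string): number {
  let count = 0;
  let inQuotes = false;
  for (let i = 0; i < line.length; i += 1) {
    const ch = line.charAt(i);
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === delimiter && !inQuotes) count += 1;
  }
  return count;
}

// Assumed voting rule: consistency with the first sampled line.
function detectDelimiter(text: string, sampleLines: number = 10): string {
  const lines = normalizeNewlines(stripBom(text))
    .split("\n")
    .filter((line) => line.length > 0)
    .slice(0, sampleLines);
  let winner = ","; // fall back to comma when no candidate scores
  let winnerVotes = 0;
  for (const candidate of CANDIDATES) {
    const expected = lines.length > 0 ? countOutsideQuotes(lines[0], candidate) : 0;
    if (expected === 0) continue;
    let votes = 0;
    for (const line of lines) {
      if (countOutsideQuotes(line, candidate) === expected) votes += 1;
    }
    if (votes > winnerVotes) {
      winner = candidate;
      winnerVotes = votes;
    }
  }
  return winner;
}
```

Scoring by consistency rather than raw frequency is what lets a semicolon-delimited file with decimal commas resolve to the semicolon.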

What JSON is, in the same lens

JSON's lineage starts with Douglas Crockford in the early 2000s. He needed a compact, JavaScript-native way to send messages between an Internet Explorer client and a server, noticed that JavaScript's object literal syntax was already a serviceable interchange format, stripped it down to two structures (object and array) and four scalar types, and published it at json.org in 2001.

The IETF picked it up as RFC 4627, refined it through 7158 and 7159, and froze the current grammar in RFC 8259 in December 2017. That RFC requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem. The top-level value can be any JSON value (object, array, string, number, boolean, or null), so a valid JSON document is not necessarily tabular. That is why this tool refuses to convert a bare object or a bare number into CSV: there's no rows-and-columns interpretation available.[2]

JSON is a tree, not a table. Converting it to CSV therefore requires picking a shape: which arrays are rows, which fields are columns, what happens to nested objects. The mapping this tool uses is the conventional one for tabular use, and the cases it intentionally won't handle (deeply nested flattening, schema inference) are explicitly out of scope.
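
Picking a shape starts with checking whether the parsed value has a tabular reading at all. A minimal sketch of such a check follows; classifyShape and TabularShape are hypothetical names, and treating empty or mixed arrays as unsupported is an assumption of this sketch, not documented behaviour.

```typescript
type TabularShape = "objects" | "arrays" | "unsupported";

function isPlainObject(value: unknown): boolean {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Decide whether a parsed JSON value can be read as rows and columns.
function classifyShape(value: unknown): TabularShape {
  // Bare objects, numbers, strings, booleans, and null have no rows.
  if (!Array.isArray(value) || value.length === 0) return "unsupported";
  if (value.every(isPlainObject)) return "objects"; // rows keyed by column name
  if (value.every((row) => Array.isArray(row))) return "arrays"; // positional rows
  return "unsupported"; // mixed arrays: an assumption of this sketch
}
```

For example, classifyShape(JSON.parse('[{"a":1}]')) yields "objects", while a bare {"a":1} or 42 yields "unsupported".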

The mapping this tool uses

  • CSV → JSON, header row on: array of objects. Keys are taken from the header row. Surplus cells in a row become __extra_1, __extra_2, … so no data is silently dropped.
  • CSV → JSON, header row off: array of arrays. Each inner array is a row of cell values.
  • JSON → CSV, array of objects: the header is the union of keys across all rows, in first-seen order. Missing keys emit empty cells.
  • JSON → CSV, array of arrays: no inferred header. With header row on, the tool emits a synthetic col_1, col_2, … row.
  • Nested cells: objects and arrays inside a cell are stringified as compact JSON and CSV-quoted. There is no a.b.c flattening — that's a different tool with a different contract.
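
The object-shaped half of this mapping can be sketched as two small functions. These are illustrations of the rules listed above, with hypothetical names (rowsToObjects, objectsToTable, cellToString); how short rows and null values are handled is an assumption, since the list does not cover those cases.

```typescript
type JsonRecord = { [key: string]: unknown };

// CSV -> JSON with header row on: surplus cells become __extra_1, __extra_2, ...
function rowsToObjects(header: string[], rows: string[][]): JsonRecord[] {
  return rows.map((row) => {
    const record: JsonRecord = {};
    row.forEach((cell, index) => {
      const key =
        index < header.length
          ? header[index]
          : "__extra_" + (index - header.length + 1);
      record[key] = cell;
    });
    return record; // short rows simply omit the missing keys (assumption)
  });
}

// Nested objects and arrays are stringified as compact JSON.
function cellToString(value: unknown): string {
  if (value === undefined) return ""; // missing key -> empty cell
  if (typeof value === "object") return JSON.stringify(value); // null -> "null" (assumption)
  return String(value);
}

// JSON -> CSV, array of objects: header is the union of keys in first-seen order.
function objectsToTable(records: JsonRecord[]): string[][] {
  const header: string[] = [];
  for (const record of records) {
    for (const key of Object.keys(record)) {
      if (header.indexOf(key) === -1) header.push(key);
    }
  }
  const body = records.map((record) =>
    header.map((key) => cellToString(record[key]))
  );
  return [header].concat(body);
}
```

The table returned by objectsToTable still needs CSV quoting when it is written out; a stringified array such as ["a","b"] contains both commas and double quotes.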

The dynamic-typing question

When a CSV cell contains 42, is that the number forty-two or the string "42"? CSV has no way to say. Excel guesses, and famously guesses badly: gene names like SEPT2 and MARCH1 get autoconverted into dates, an error documented in the biomedical literature for decades.

This tool defaults Dynamic typing to off, so values stay strings unless you explicitly turn it on, which keeps the round-trip safe by default. When you do turn it on, the regex that decides "this looks like a number" requires no leading zeros (except the literal 0). The string "007" stays a string, because that's almost always a postal code, SKU, phone number, or other zero-padded identifier the upstream system depended on. The literal strings true, false, and null (case-insensitive) also typecast under dynamic typing; everything else stays a string.
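
A cast rule of that kind might look like the sketch below. The exact pattern the tool uses is not given here, so NUMBER_PATTERN is an assumption that only honours the stated constraint (no leading zeros except a lone 0); castCell is a hypothetical name.

```typescript
// Assumed pattern: optional minus sign, an integer part with no leading zeros
// (a lone 0 is allowed), optional fraction, optional exponent.
const NUMBER_PATTERN = /^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?$/;

function castCell(cell: string): string | number | boolean | null {
  if (NUMBER_PATTERN.test(cell)) return Number(cell);
  const lowered = cell.toLowerCase();
  if (lowered === "true") return true;
  if (lowered === "false") return false;
  if (lowered === "null") return null;
  return cell; // everything else, including "007", stays a string
}
```

With this pattern castCell("42") is the number 42, castCell("007") is still the string "007", and castCell("TRUE") is the boolean true.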

The library doing the work

The CSV side runs on Papa Parse, the de facto CSV library for JavaScript: MIT-licensed, roughly seven million weekly downloads on npm, pure JS with no native bindings, and the parser everyone reaches for when they want auto-detected delimiters and correct quoted-field handling out of the box.[3] The JSON side is the browser's native JSON.parse and JSON.stringify: no library, no transitive dependencies.

Both halves run on the main thread, on inputs up to 5 MB. We wrap the parse step in React's useDeferredValue so a multi-megabyte input doesn't freeze the keystroke loop, and we show a parsing indicator while the parser is still catching up with the latest keystroke.

Common anti-patterns

  • Pasting an Excel cell with a hidden tab character inside it and then wondering why the row split into the wrong number of columns.
  • Trusting the header row when the upstream system writes it sporadically: one batch has id,name,email on line one, the next batch starts straight into rows.
  • Trying to flatten arbitrarily deep JSON into a flat CSV. Tabular formats can't represent recursion; pick a projection (a key path) and run that projection upstream.
  • Choosing the comma as a delimiter for data that contains commas, without quoting. The fix is either quoting (RFC 4180) or a different delimiter (tab, pipe).
  • Treating "CSV" as one thing. It isn't. When two systems disagree on a CSV, the negotiation point is always delimiter + quoting + line endings + encoding.

Privacy posture

Everything on this page runs in your browser. There is no upload, no API call, no server log. The text you paste does not leave the tab. When you close the tab it's gone — no local storage, no history. Open DevTools and watch the Network tab while you type if you want to verify it: the traffic you'll see is the page load, the fonts, and nothing else.

References

  1. Shafranovich, Y. RFC 4180 — Common Format and MIME Type for Comma-Separated Values (CSV) Files. Internet Engineering Task Force, October 2005. datatracker.ietf.org/doc/html/rfc4180.
  2. Bray, T. RFC 8259 — The JavaScript Object Notation (JSON) Data Interchange Format. Internet Engineering Task Force, December 2017. datatracker.ietf.org/doc/html/rfc8259.
  3. Holt, M. Papa Parse — Documentation. papaparse.com/docs.