Elofyn Tools · About

About the Markdown ↔ HTML converter

What this tool does

The Markdown ↔ HTML converter turns a Markdown document into the equivalent HTML, or an HTML document into the equivalent Markdown, live in your browser tab. You paste source on one side, the other format appears on the other, and you copy or download the result. The conversion runs as you type — debounced so a long document keeps the textarea smooth — and the entire pipeline ships as a code-split chunk so the first page paint stays cheap. There is no server hop, no upload step, no account, and no history written to anything more permanent than your tab's memory.
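The debouncing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: the `Timers` interface, the `debounce` helper, and the idea of injecting the timer functions are all assumptions made so the behaviour can be exercised without real delays.

```typescript
// Timer functions are injectable (an assumption of this sketch) so the
// debounce logic can be tested without waiting on real timeouts.
interface Timers {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const defaultTimers: Timers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a wrapper that delays `fn` until `waitMs` has passed without a new
// call. Each new call cancels the pending one, so only the latest input runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  timers: Timers = defaultTimers,
): (...args: A) => void {
  let handle: unknown;
  let scheduled = false;
  return (...args: A): void => {
    if (scheduled) timers.clear(handle);
    scheduled = true;
    handle = timers.set(() => {
      scheduled = false;
      fn(...args);
    }, waitMs);
  };
}
```

A textarea's input handler would call the debounced wrapper on every keystroke, and the conversion itself would run only once typing pauses. The wait time is a tuning choice; this page does not state the value the converter uses.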

The forward direction (Markdown → HTML) is what most people are looking for. The reverse (HTML → Markdown) has historically been harder to find a good tool for, because HTML is strictly more expressive than Markdown, so the round-trip is lossy by design. The sections below explain which features survive intact, which get rewritten, and which get dropped with a clear annotation rather than a silent change.

A short history of Markdown

Markdown was designed in March 2004 by John Gruber, with significant input from Aaron Swartz. Gruber's motivation, captured in the first paragraph of the original Daring Fireball spec, was to let people write structured text that read as easily as email but compiled cleanly into valid HTML. The reference implementation was a single Perl script, Markdown.pl, that used regular-expression substitutions over the source string. That script was the only normative spec for almost a decade.

The looseness of the original definition turned into a problem as Markdown spread. Stack Overflow shipped PageDown. Reddit forked its own variant. GitHub added autolinks, fenced code blocks, and tables. Every implementation handled emphasis intersections, list indentation, and HTML-in-Markdown edge cases differently, so the same source string would produce subtly different HTML in different engines. Long-form documents stopped being portable.

The fix arrived in September 2014 when John MacFarlane, with the backing of Jeff Atwood, David Greenspan, Vicent Marti, Neil Williams, and Benjamin Dumke-von der Ehe, published the first draft of CommonMark[1], a precise grammar with a comprehensive test suite. CommonMark resolves the ambiguities in the original Gruber spec by giving each construct a fixed parsing rule, and ships reference implementations in C (cmark) and JavaScript (commonmark.js) that are checked against that test suite. The current release at the time of writing is CommonMark 0.31.2, published in January 2024.

GitHub then formalised the popular extensions it had been shipping for years as GitHub Flavored Markdown[2], written as a strict superset of CommonMark. GFM adds four syntax constructs on top of CommonMark: tables, task list items, strikethrough, and autolinked URLs. (The spec also defines a tag filter that disallows a short list of raw HTML tags, which restricts output rather than adding syntax.) The GFM spec is itself defined as a diff against the CommonMark grammar, which is why a parser can usually implement both with the same code path. This converter ships with GFM enabled by default because that is what most users expect, and it can be turned off for strict-CommonMark output via the More options panel.

How the conversion actually works

The conversion is an AST transformation, not a regex pass. Source text becomes a syntax tree, the syntax tree is rewritten into the target language's syntax tree, and the target tree is serialised. The technique is the same one a compiler uses to lower from one intermediate representation to another, and it is the reason a modern Markdown engine handles edge cases the original regex-driven Markdown.pl never could.
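The parse, transform, serialise shape described above can be illustrated with a deliberately tiny sketch. It is a toy, not the converter's pipeline: the `MdNode` and `HtmlNode` types are cut-down stand-ins for MDAST and HAST, the parser handles only headings, paragraphs, and single-delimiter emphasis, and every function name here is hypothetical.

```typescript
// Toy stand-ins for MDAST and HAST; the real trees have many more node types.
type MdNode =
  | { type: "root"; children: MdNode[] }
  | { type: "heading"; depth: number; children: MdNode[] }
  | { type: "paragraph"; children: MdNode[] }
  | { type: "emphasis"; children: MdNode[] }
  | { type: "text"; value: string };

type HtmlNode =
  | { type: "root"; children: HtmlNode[] }
  | { type: "element"; tagName: string; children: HtmlNode[] }
  | { type: "text"; value: string };

// Stage 1a: inline tokeniser. Both *x* and _x_ become the same emphasis node.
function parseInline(src: string): MdNode[] {
  const nodes: MdNode[] = [];
  const re = /([*_])([^*_]+)\1/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(src)) !== null) {
    if (m.index > last) nodes.push({ type: "text", value: src.slice(last, m.index) });
    nodes.push({ type: "emphasis", children: [{ type: "text", value: m[2] }] });
    last = m.index + m[0].length;
  }
  if (last < src.length) nodes.push({ type: "text", value: src.slice(last) });
  return nodes;
}

// Stage 1b: block parser. Blank lines separate blocks; "# " starts a heading.
function parseMarkdown(src: string): MdNode {
  const blocks = src.split(/\n{2,}/).map((b) => b.trim()).filter((b) => b.length > 0);
  const children = blocks.map((block): MdNode => {
    const h = /^(#{1,6}) +(.*)$/.exec(block);
    return h
      ? { type: "heading", depth: h[1].length, children: parseInline(h[2]) }
      : { type: "paragraph", children: parseInline(block) };
  });
  return { type: "root", children };
}

// Stage 2: map the Markdown tree onto the HTML tree, node by node.
function toHtmlTree(node: MdNode): HtmlNode {
  switch (node.type) {
    case "text":
      return { type: "text", value: node.value };
    case "root":
      return { type: "root", children: node.children.map(toHtmlTree) };
    case "heading":
      return { type: "element", tagName: `h${node.depth}`, children: node.children.map(toHtmlTree) };
    case "paragraph":
      return { type: "element", tagName: "p", children: node.children.map(toHtmlTree) };
    case "emphasis":
      return { type: "element", tagName: "em", children: node.children.map(toHtmlTree) };
  }
}

// Stage 3: serialise the HTML tree, escaping text so it cannot become markup.
function serialiseHtml(node: HtmlNode): string {
  if (node.type === "text") {
    return node.value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  const inner = node.children.map(serialiseHtml).join(node.type === "root" ? "\n" : "");
  return node.type === "root" ? inner : `<${node.tagName}>${inner}</${node.tagName}>`;
}

function markdownToHtml(src: string): string {
  return serialiseHtml(toHtmlTree(parseMarkdown(src)));
}
```

Because the emphasis delimiter is discarded at parse time, everything downstream works on structure rather than on characters. That separation is what lets a tree-based engine treat edge cases as tree rules instead of as ever more elaborate substitutions.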

We use the unified[3] ecosystem for both directions. unified is the umbrella project; remark is its Markdown-AST library and rehype is its HTML-AST library. The relevant abstract syntax trees are standard, well-documented, and shared across hundreds of plugins: MDAST for Markdown and HAST for HTML.

The forward (Markdown → HTML) pipeline is remark-parse → remark-rehype → rehype-stringify. That is the canonical three-stage path described on unifiedjs.com: source string into MDAST, MDAST mapped into HAST, HAST serialised into an HTML string. GFM, when enabled, slots in at the MDAST level via the remark-gfm plugin so the tables, strikethrough, task lists, and autolinks all become first-class MDAST nodes before the bridge to HAST runs.

The reverse (HTML → Markdown) pipeline is the mirror image: rehype-parse → rehype-remark → remark-stringify. The HTML string becomes HAST, HAST is mapped down into MDAST, and the MDAST tree is serialised as Markdown. GFM in this direction is ordered after rehype-remark, because the GFM plugin only operates on the MDAST side, so adding it earlier would do nothing. With GFM on, a <del> tag in the HTML input becomes a ~~strikethrough~~ in the Markdown output instead of an HTML pass-through.

Raw HTML inside Markdown

CommonMark explicitly permits HTML blocks and inline HTML inside a Markdown document. A Markdown source can therefore contain a <details> accordion, a <sub> subscript, or a hand-written anchor with custom attributes, and a spec-compliant renderer must pass that markup through to the output untouched.

The unified pipeline is safe by default: raw HTML in the input is escaped rather than emitted, and the Allow raw HTML in Markdown toggle in the converter is the opt-in. Under the hood the toggle sets two flags: allowDangerousHtml: true on the remark-rehype bridge (so MDAST HTML nodes survive into HAST) and the same flag on rehype-stringify (so HAST raw-HTML nodes get emitted instead of being filtered out). We expose it as a single switch because turning on only one of the two is always a bug.
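One way to keep the two flags in lockstep is to derive both option objects from a single boolean. The sketch below assumes a hypothetical `rawHtmlOptions` helper and a reduced `PipelineOptions` shape. Only the `allowDangerousHtml` option name comes from the text above, and the real plugins accept more options than are shown.

```typescript
// Reduced option shapes: only the one flag discussed here is modelled.
interface RawHtmlFlag {
  allowDangerousHtml: boolean;
}

interface PipelineOptions {
  bridge: RawHtmlFlag;    // intended for the remark-rehype step
  stringify: RawHtmlFlag; // intended for the rehype-stringify step
}

// Hypothetical helper: one UI switch in, two matching plugin options out,
// so the flags can never disagree.
function rawHtmlOptions(allowRawHtml: boolean): PipelineOptions {
  return {
    bridge: { allowDangerousHtml: allowRawHtml },
    stringify: { allowDangerousHtml: allowRawHtml },
  };
}
```

In a unified pipeline these objects would presumably be handed to the plugins as their second argument, along the lines of `.use(remarkRehype, opts.bridge)` and `.use(rehypeStringify, opts.stringify)`.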

This is not a sanitizer. If your source contains a <script src="https://attacker"> tag and the toggle is on, that exact string ends up in the HTML output. The rendered preview runs the output inside a sandboxed iframe whose sandbox="" attribute carries the empty token set, which means no scripts run, no plugins load, no top-level navigation is allowed, and the iframe has a null origin, so even an embedded image with an attacker URL cannot reach elofyn.com cookies or local storage. If you need a sanitizer for downstream use, drop the output into DOMPurify or an equivalent allowlist-based library. Don't rely on this tool to do that job.
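A fully sandboxed preview frame of the kind described above might be produced like this. The sketch assumes the converted HTML is delivered through the iframe's `srcdoc` attribute, which this page does not confirm. `buildPreviewFrame` and `escapeAttribute` are hypothetical names.

```typescript
// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Build iframe markup with an empty sandbox token set: no scripts, no
// same-origin access, no top-level navigation. `srcdoc` is an assumption.
function buildPreviewFrame(convertedHtml: string): string {
  return `<iframe sandbox="" srcdoc="${escapeAttribute(convertedHtml)}"></iframe>`;
}
```

The important detail is that `sandbox` is present with no tokens. Adding `allow-scripts` or `allow-same-origin` would reopen exactly the capabilities the preview is meant to withhold.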

Why the round-trip is lossy

HTML has more expressive headroom than Markdown by design. Inline CSS, class names, ID attributes, data attributes, ARIA hooks, presentational tags like <font>, and custom elements all have first-class HTML syntax. Markdown has syntax for headings, paragraphs, lists, code blocks, blockquotes, inline emphasis, links, images, horizontal rules, and — with GFM — tables, strikethrough, and task lists. Anything outside that vocabulary cannot be expressed in Markdown source.

The HTML → Markdown direction therefore drops information by construction. Inline styles disappear. Classes and IDs vanish. Custom attributes are stripped. A <div class="hero"><p>…</p></div> becomes a plain paragraph in the Markdown output. The semantic structure — what is a heading, what is a list, what links to what — survives intact, which is exactly the property you want when cleaning up CMS HTML, but the visual fidelity does not.
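The loss can be made concrete with a toy reverse mapper. It is a sketch under stated assumptions, not the rehype-remark implementation: `HNode` is a cut-down stand-in for HAST, `toMarkdown` is a hypothetical function, and only a handful of tags are handled.

```typescript
// Cut-down stand-in for a HAST node; real HAST carries more detail.
type HNode =
  | { type: "element"; tagName: string; properties: { [name: string]: string | undefined }; children: HNode[] }
  | { type: "text"; value: string };

// Map an HTML tree to Markdown text. Nothing reads `class` or `style`, so
// those attributes are lost simply because Markdown has nowhere to put them.
function toMarkdown(node: HNode, gfm: boolean): string {
  if (node.type === "text") return node.value;
  const inner = node.children.map((child) => toMarkdown(child, gfm)).join("");
  switch (node.tagName) {
    case "h1":
      return `# ${inner}\n\n`;
    case "p":
      return `${inner}\n\n`;
    case "em":
      return `*${inner}*`;
    case "a":
      return `[${inner}](${node.properties.href ?? ""})`;
    case "del":
      // With GFM off there is no strikethrough syntax, so fall back to inline HTML.
      return gfm ? `~~${inner}~~` : `<del>${inner}</del>`;
    default:
      // div, span, and unknown wrappers: keep the children, drop the wrapper.
      return inner;
  }
}

// <div class="hero"><p style="color:red">Launch <del>soon</del> <a …>now</a></p></div>
const heroExample: HNode = {
  type: "element", tagName: "div", properties: { class: "hero" }, children: [
    { type: "element", tagName: "p", properties: { style: "color:red" }, children: [
      { type: "text", value: "Launch " },
      { type: "element", tagName: "del", properties: {}, children: [{ type: "text", value: "soon" }] },
      { type: "text", value: " " },
      { type: "element", tagName: "a", properties: { href: "https://example.com", class: "btn" },
        children: [{ type: "text", value: "now" }] },
    ] },
  ],
};
```

Running `toMarkdown(heroExample, true)` keeps the paragraph, the strikethrough, and the link target. The hero class, the inline colour, and the button class have no Markdown equivalent and are dropped.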

Even a Markdown → HTML → Markdown round-trip is not a no-op. The remark-stringify serialiser normalises every cosmetic choice in the source. The underscore form of emphasis becomes the asterisk form, reference links may collapse to inline links, whitespace inside lists is regularised, and bullet markers settle on a single shape. The resulting Markdown has the same meaning as the original, and a CommonMark renderer produces the same HTML from either copy, but a character-level diff will not be empty. That is a property of the format, not a bug in the converter, and any tool that claims otherwise is hiding a parse step.

Common use cases

  • Pulling the body of a GitHub PR description out of the HTML rendered preview and back into a portable .md file for your docs repo.
  • Migrating long-form blog posts out of a WYSIWYG CMS into a static-site generator (Hugo, Astro, Next.js MDX) that consumes Markdown.
  • Cleaning HTML produced by an LLM back into Markdown so it can be reviewed in a text-only diff and committed to a content repository.
  • Sanity-checking how a README will render before pushing to the origin. The rendered-preview tab shows the output in the same sandboxed surface a browser would use.
  • Embedding a small HTML snippet inside a Markdown document by toggling the raw-HTML pass-through on, converting once, then pasting the result.

Anti-patterns

  • Treating this tool as an HTML sanitizer. The raw-HTML toggle exists to pass markup through verbatim — it does not strip scripts or normalise attributes. For untrusted input, run the output through DOMPurify or an allowlist-based sanitizer downstream.
  • Expecting bit-for-bit round-tripping. CommonMark serialisation is canonical, not literal. The semantic structure round-trips; the surface syntax does not.
  • Pasting secrets. The tool stays on-device by design, but the input still touches browser memory, clipboard history, and any extensions that can read the page. Treat the textarea like the rest of your browser.
  • Trying to use this tool as a Markdown linter or HTML validator. Both parsers are lenient by design: malformed input renders best-effort instead of erroring out, which is the right call for a converter and the wrong one for a linter.

How to choose the options

The two switches in the More options panel cover most cases where you might want behaviour other than the default.

GitHub Flavored Markdown. Leave this on if your source or target is GitHub, GitLab, Forgejo, Obsidian, or any modern Markdown engine. Turn it off when you specifically need strict CommonMark output — for example, compiling content for a vintage Markdown engine, or shipping source through a publishing pipeline that runs its own GFM-incompatible parser. With GFM off the converter will not emit tables, strikethrough syntax, or task-list checkboxes; in the HTML → Markdown direction those constructs degrade to inline-HTML pass-throughs.

Allow raw HTML in Markdown. Turn this on when you trust the source and want HTML pass-through: embedded <details> blocks, hand-written anchors, custom callout markup. Leave it off when you're cleaning Markdown that has accumulated junk HTML you want stripped, or when you're ingesting Markdown from a source you do not trust. With the toggle off any raw HTML in the input is escaped to entity references in the output.

References

  1. MacFarlane, J. (Ed.). (2024). CommonMark Spec, version 0.31.2. https://spec.commonmark.org/0.31.2/ — the canonical Markdown grammar, backs every claim about CommonMark parsing rules and the historical motivation for standardisation.
  2. GitHub, Inc. (2024). GitHub Flavored Markdown Spec. https://github.github.com/gfm/ — the GFM grammar written as a diff against CommonMark, backs the GFM-feature list (tables, strikethrough, task lists, autolinks) and the “strict superset of CommonMark” framing.
  3. Wormer, T., and the unified collective. (2024). unified — content as structured data. https://unifiedjs.com/ — the unified / remark / rehype ecosystem used under the hood, backs the AST-transformation explanation and the choice of library.