Elofyn Tools · About

About the Base64 · URL · HTML encoder


What encoding is, and why this tool has three tabs

Encoding is the act of rewriting a string into a form that some specific transport will accept without changing the meaning. The three formats this tool covers — Base64, URL percent-encoding, and HTML entities — were each invented to solve the same shape of problem in three different layers of the stack: the bytes layer, the URI layer, and the markup layer. They look superficially alike (an input string becomes a longer, uglier output string) but they exist for different reasons, follow different rules, and fail in different ways. A working developer reaches for them every week, often back to back, often on the same value. That is why this tool ships them in one page rather than three.

The tool runs entirely in your browser. Nothing you paste here ever leaves the tab. The three conversions sit behind a tab control; each tab has an Encode and a Decode direction; each tab exposes the one or two options that matter for the real-world variants of that format. The remaining sections of this page walk through what each format is, when it bites, and how the tool's defaults are chosen.

Base64

Base64 is the answer to a question that arrived with MIME in the early 1990s: how do you carry arbitrary 8-bit binary data through a transport (an email body, a header, a database column) that only reliably handles 7-bit printable ASCII? The trick is to take three bytes of input (24 bits) and rewrite them as four characters drawn from a 64-character alphabet (4 × 6 = 24 bits). Every three input bytes become four output characters, with = padding filling out the last group when the input length is not a multiple of three. The format is specified by RFC 4648[1], which defines two Base64 alphabets.

The Standard alphabet (§4 of the RFC) uses A–Z, a–z, 0–9, +, and /. The URL-safe alphabet (§5) swaps the last two for - and _ so the encoded value can sit in a URL path or query string without being mistaken for a path separator or a query operator. Both alphabets are otherwise identical; the tool decodes them interchangeably and the toggle affects only the encode direction.
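
To make the three-bytes-to-four-characters step concrete, here is a from-scratch encoder covering both alphabets and optional padding. It is a teaching sketch that assumes the input is already a byte array; the function name and its urlSafe and pad options are hypothetical, not the tool's internals, and real code should prefer the built-in btoa.

```javascript
// Teaching sketch of RFC 4648 Base64 encoding (hypothetical helper, not the tool's code).
const STANDARD_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// URL-safe alphabet: identical except index 62 is "-" and index 63 is "_".
const URL_SAFE_ALPHABET = STANDARD_ALPHABET.slice(0, 62) + "-_";

function base64Encode(bytes, { urlSafe = false, pad = true } = {}) {
  const alphabet = urlSafe ? URL_SAFE_ALPHABET : STANDARD_ALPHABET;
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Pack up to three bytes into one 24-bit group; missing bytes count as zero.
    const remaining = Math.min(bytes.length - i, 3);
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0;
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const group = (b0 << 16) | (b1 << 8) | b2;
    // 1 byte -> 2 data characters, 2 bytes -> 3, 3 bytes -> 4.
    const dataChars = remaining + 1;
    for (let j = 0; j < 4; j++) {
      if (j < dataChars) {
        // Take the j-th 6-bit slice, most significant first.
        out += alphabet[(group >> (18 - 6 * j)) & 0x3f];
      } else if (pad) {
        out += "=";
      }
    }
  }
  return out;
}

base64Encode(new TextEncoder().encode("Man")); // → "TWFu"
base64Encode(Uint8Array.of(0xfb, 0xff)); // → "+/8="
base64Encode(Uint8Array.of(0xfb, 0xff), { urlSafe: true, pad: false }); // → "-_8"
```

One loop serves both alphabets because they differ only in the characters at indices 62 and 63.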

Padding (the trailing = signs) is the format's most argued-about detail. RFC 4648 requires encoders to emit padding unless the specification that uses Base64 explicitly says otherwise. JWT segments, for example, omit padding to keep tokens compact. The tool ships the Without padding toggle for that case, and on the decode side it auto-pads to the next multiple of four before calling atob, so a pasted JWT segment decodes cleanly without you adding the = signs back by hand.
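
The decode-side normalisation can be sketched in a few lines. This assumes the steps the tool describes (map the URL-safe characters back to the standard ones, re-pad to a multiple of four, then call atob); the helper names are hypothetical.

```javascript
// Hypothetical helpers mirroring the decode-side behaviour described above.
function normaliseBase64(input) {
  // Map the URL-safe alphabet back to the standard one, which atob expects.
  let s = input.trim().replace(/-/g, "+").replace(/_/g, "/");
  // Re-add stripped padding: round the length up to a multiple of four.
  while (s.length % 4 !== 0) s += "=";
  return s;
}

function base64ToBytes(input) {
  const binary = atob(normaliseBase64(input)); // one char per decoded byte
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}

// A JWT-style segment with its padding stripped ("eyJhIjoxfQ==" in full).
new TextDecoder().decode(base64ToBytes("eyJhIjoxfQ")); // → '{"a":1}'
// URL-safe characters are accepted too.
base64ToBytes("-_8"); // → Uint8Array [251, 255]
```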

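The one Base64 footgun that bites everyone at least once is UTF-8. The browser's built-in btoa only accepts strings whose characters all fall in the Latin-1 range (U+0000 to U+00FF), and it treats each character as one byte. So btoa('€') throws an InvalidCharacterError, and btoa('é') is worse: it succeeds, but encodes é as the single Latin-1 byte 0xE9 rather than its two-byte UTF-8 sequence (0xC3 0xA9), so a UTF-8 decoder on the other end gets garbage. The fix is to run the input through TextEncoder first and hand btoa the resulting bytes as a string with one character per byte. The tool does this for you. The decode path mirrors it via TextDecoder, so round-tripping any well-formed Unicode string is lossless.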
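
A minimal sketch of that UTF-8 wrapping, using only the built-in TextEncoder, TextDecoder, btoa and atob. The helper names are hypothetical, and the character-by-character loop is just one of several ways to build the one-character-per-byte string btoa expects.

```javascript
// Hypothetical helpers: UTF-8-safe wrappers around btoa/atob.
function utf8ToBase64(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  // btoa wants a "binary string": one character (code point 0-255) per byte.
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  return btoa(binary);
}

function base64ToUtf8(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // interpret the bytes as UTF-8
}

utf8ToBase64("é"); // → "w6k=" (bytes C3 A9)
btoa("é");         // → "6Q==" (the single Latin-1 byte E9, not UTF-8)
base64ToUtf8(utf8ToBase64("héllo € 🙂")); // → "héllo € 🙂"
```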

Common Base64 use cases: encoding the payload of an HTTP basic auth header (user:pass), inlining a small asset as a data: URI, the three segments of a JSON Web Token, the email attachment encoding MIME still uses today, and serialising bytes into a JSON string field whose schema requires text. It is not encryption (anyone who sees a Base64 string can decode it in one shell command), but it is the universal transport for “this data needs to survive a hop.”
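
Two of those use cases in a few lines, assuming ASCII-only credentials and content (anything else needs the TextEncoder step described above). basicAuthHeader is a hypothetical helper name.

```javascript
// Hypothetical helper; assumes ASCII credentials so plain btoa is safe.
function basicAuthHeader(user, pass) {
  return "Basic " + btoa(`${user}:${pass}`);
}

basicAuthHeader("user", "pass"); // → "Basic dXNlcjpwYXNz"
// Not encryption: the same header reverses in one call.
atob("dXNlcjpwYXNz"); // → "user:pass"

// Inlining a small asset as a data: URI (ASCII content assumed).
"data:text/plain;base64," + btoa("hello"); // → "data:text/plain;base64,aGVsbG8="
```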

URL percent-encoding

URL percent-encoding is what the browser does invisibly every time you type a query into a search box. The grammar is defined in RFC 3986[2], which distinguishes unreserved characters (letters, digits, and four punctuation marks: - _ . ~) from reserved characters whose meaning is part of URI syntax (the gen-delims : / ? # [ ] @ plus the sub-delims ! $ & ' ( ) * + , ; =). Unreserved characters never need encoding; reserved characters must be encoded when they would otherwise be read as delimiters; everything else, including space and every non-ASCII character, must always be encoded. Encoding a character means converting it to UTF-8 and writing each resulting byte as %XX, where XX is the byte's value in hexadecimal.
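
A strict component encoder that applies the rule literally: keep the unreserved characters, UTF-8 encode everything else and emit each byte as %XX. It is a sketch (the names are hypothetical, and the tool itself calls the built-in encoders), and it differs from encodeURIComponent in one visible way: it also encodes ! ' ( ) *, which the built-in leaves alone.

```javascript
// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
const UNRESERVED_CHAR = /^[A-Za-z0-9._~-]$/;
const UTF8_ENCODER = new TextEncoder();

// Hypothetical strict encoder for a single URI component.
function percentEncodeStrict(value) {
  let out = "";
  for (const ch of value) { // for...of walks code points, not UTF-16 units
    if (UNRESERVED_CHAR.test(ch)) {
      out += ch;
      continue;
    }
    // Everything else: UTF-8 encode, then write each byte as uppercase %XX.
    for (const byte of UTF8_ENCODER.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

percentEncodeStrict("a b/é!"); // → "a%20b%2F%C3%A9%21"
encodeURIComponent("a b/é!");  // → "a%20b%2F%C3%A9!"
```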

JavaScript ships two encode primitives that draw the line in different places. encodeURI assumes the input is a whole URL and leaves the reserved characters alone so the structural punctuation keeps working. encodeURIComponent assumes the input is a single field (a query parameter value, a path segment) and percent-encodes the reserved set too, with the exception of ! ' ( ) *, which it leaves as-is. The tool exposes the choice as the Mode toggle: Component is the default because it is the right answer for the “I am putting this in a single URL field” case that brings users here in the first place.
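
The difference matters most when a user-supplied value lands inside a larger URL. buildSearchUrl is a hypothetical helper; the comments show what the two built-ins return for these inputs.

```javascript
// Hypothetical helper: put one user value into a query string.
function buildSearchUrl(base, query) {
  return `${base}?q=${encodeURIComponent(query)}`;
}

buildSearchUrl("https://example.com/search", "fish & chips");
// → "https://example.com/search?q=fish%20%26%20chips"

// encodeURI leaves & alone, so the value would split into a second parameter:
encodeURI("fish & chips"); // → "fish%20&%20chips"

// encodeURIComponent on a whole URL destroys its structure:
encodeURIComponent("https://example.com/a?b=c");
// → "https%3A%2F%2Fexample.com%2Fa%3Fb%3Dc"
```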

The decode side is asymmetric on purpose: the tool has only one decoder, decodeURIComponent, and uses it for both modes. That is faithful to RFC 3986 and to the browser standard, but it is also the origin of the single most confusing URL bug: + is not space. In the application/x-www-form-urlencoded form-submission grammar (a separate WHATWG spec) the + character does mean space, but RFC 3986 keeps + literal. The tool follows RFC 3986. If your input came from a form submission and you need + turned into a space, do that conversion before pasting.
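
The two grammars side by side, using the built-in decodeURIComponent and URLSearchParams (which applies the form-urlencoded rules). decodeFormValue is a hypothetical helper for doing the + conversion by hand.

```javascript
// RFC 3986 decoding: + stays a literal plus.
decodeURIComponent("1%2B1+%3D+2"); // → "1+1+=+2"

// Form-urlencoded decoding via URLSearchParams: + means space.
new URLSearchParams("q=1%2B1+%3D+2").get("q"); // → "1+1 = 2"

// Hypothetical helper doing the same conversion by hand.
function decodeFormValue(value) {
  // Swap + for space before percent-decoding, so an encoded %2B survives as "+".
  return decodeURIComponent(value.replace(/\+/g, " "));
}

decodeFormValue("1%2B1+%3D+2"); // → "1+1 = 2"
```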

The other classic URL incident is double-encoding. A value gets percent-encoded on the client, then re-encoded by a proxy or a framework on the way through, and the result is a string like hello%2520world: the space first became %20, then the % itself was encoded as %25. The fix is to decode once, look at the result, decide whether it still looks encoded, and only decode again if it does. The decoder in this tool runs once per click of Decode; you can paste the output back in and run it again to peel off another layer if you need to.
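
A small helper that supports the decode-once-and-look workflow by listing every layer instead of silently decoding until nothing changes; blind repeated decoding would mangle data that legitimately contains a literal %. decodeLayers and its maxLayers cap are hypothetical; the tool itself decodes exactly one layer per click.

```javascript
// Hypothetical helper: list each decoding layer so a human can decide where to stop.
function decodeLayers(value, maxLayers = 5) {
  const layers = [value];
  for (let i = 0; i < maxLayers; i++) {
    const current = layers[layers.length - 1];
    let next;
    try {
      next = decodeURIComponent(current);
    } catch {
      break; // malformed %-sequence (e.g. a literal "100%"): stop here
    }
    if (next === current) break; // nothing left to decode
    layers.push(next);
  }
  return layers;
}

encodeURIComponent(encodeURIComponent("hello world")); // → "hello%2520world"
decodeLayers("hello%2520world");
// → ["hello%2520world", "hello%20world", "hello world"]
```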

HTML entities

HTML entities solve the markup-injection problem. The five characters that mean something to an HTML parser — &, <, >, ", and ' — have to be rewritten as named or numeric references (&amp;, &lt;, and so on) if you want them to display as text rather than be interpreted as markup. The canonical reference for the named set is the WHATWG HTML Living Standard's named character references table[3], which lists 2,231 entries — every browser ships the same table because the HTML specification mandates it.

The encode direction in this tool only writes the five canonical characters by default. Over-encoding (turning every non-ASCII character into &#nnn;) bloats the output, hurts diff legibility, and is unnecessary on a UTF-8 page. The Entity form toggle lets you pick between Named (&amp;, &lt;) and Numeric (&#38;, &#60;) forms for the five-character set. Numeric form is useful when you are producing markup destined for a parser whose named-entity table is older or smaller than the modern browser's — anything outside the original HTML 2.0 set is safer expressed numerically.
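
A sketch of the five-character encoder with both entity forms. The form parameter stands in for the Entity form toggle and, like the function name, is hypothetical. One assumption to flag: the apostrophe is written as &#39; even in named mode, because &apos; is absent from HTML 4's named set; whether the tool makes the same choice is not stated here.

```javascript
// Named forms for the five canonical characters (apostrophe: see assumption above).
const HTML_NAMED = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

// form: "named" or "numeric".
function encodeHtml(text, form = "named") {
  // One pass over the string, so the & in an emitted "&lt;" is never re-escaped.
  return text.replace(/[&<>"']/g, (ch) =>
    form === "numeric" ? `&#${ch.charCodeAt(0)};` : HTML_NAMED[ch]
  );
}

encodeHtml(`Tom & Jerry's <b>`);            // → "Tom &amp; Jerry&#39;s &lt;b&gt;"
encodeHtml(`Tom & Jerry's <b>`, "numeric"); // → "Tom &#38; Jerry&#39;s &#60;b&#62;"
encodeHtml("café €");                       // → "café €" (non-ASCII left alone)
```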

The decode direction is where the safety story matters. There are two ways to decode HTML entities in a browser. The wrong way is to write div.innerHTML = source and read back div.textContent. Entities decode, but the browser also parses anything that looks like markup into real elements: a <script> inserted this way does not run, but an <img src=x onerror="…"> fires its handler even if the div is never attached to the page. The right way is the <textarea> round-trip: a <textarea> element's content is parsed as plain character data (RCDATA), not as markup, so the browser decodes entities but never creates an element from anything that looks like a tag. That is the path the tool uses; you can decode arbitrarily hostile-looking input here and watch DevTools' Network panel stay completely quiet.
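
A sketch of the textarea round-trip, with a swapped-in fallback for environments that have no DOM (such as Node). The fallback is deliberately simplified: it decodes numeric references and only the five canonical names, not the full 2,231-entry table, and it does not reproduce the HTML specification's special cases for numeric references. Both paths share the property that matters here: the input is never turned into live elements. All names are hypothetical.

```javascript
// Five canonical names only; the browser's table is far larger.
const BASIC_NAMED_REFS = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Pure-string fallback: never hands text to a parser, so nothing can execute.
function decodeBasicEntities(source) {
  return source.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const hex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(hex ? 2 : 1), hex ? 16 : 10);
      // Out-of-range numbers are left as written instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(BASIC_NAMED_REFS, body)
      ? BASIC_NAMED_REFS[body]
      : match; // unknown name: leave untouched
  });
}

function decodeHtmlEntities(source) {
  if (typeof document !== "undefined") {
    // Browser path: textarea content is parsed as RCDATA, so entities decode
    // but "<img ...>" stays text and no element (or request) is created.
    const textarea = document.createElement("textarea");
    textarea.innerHTML = source;
    return textarea.value;
  }
  return decodeBasicEntities(source);
}

decodeHtmlEntities("&lt;img src=x onerror=&quot;alert(1)&quot;&gt;");
// → '<img src=x onerror="alert(1)">' (plain text, nothing fires)
```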

How to choose

A short decision table for the three tabs and the moments you reach for each one:

  • Embedding bytes inside a text field. JSON column, JWT segment, MIME body, basic auth header, inline data: URI — pick Base64. Toggle URL-safe when the encoded value has to fit in a URL.
  • Putting an arbitrary value into a URL field. Query parameter value, path segment, redirect target — pick URL · Component. Switch to Whole URI when you are passing an entire URL that should keep its structural punctuation.
  • Displaying user text inside an HTML document. Always run it through HTML · Encode first. Decoding is for the inverse case — you have a string that already contains &amp; or &#39; and you want the original text back.

Common anti-patterns

The same handful of mistakes show up across every team. Worth naming explicitly so you do not have to debug them at 2 a.m.

  • Using Base64 as encryption. It is an encoding, not a cipher. Anyone who sees the encoded string can decode it instantly. Use a real cipher for secrets and Base64 only for transport.
  • Mixing Base64 alphabets without normalising. A URL-safe encoded string passed to a standard decoder fails on the first - or _. This tool normalises in the decode direction; many libraries do not.
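  • Treating URL-decoded + as space. Only valid for application/x-www-form-urlencoded data (form bodies, and query strings produced by HTML form submissions); in an RFC 3986 path or query, + is a literal plus.
  • Decoding HTML via innerHTML on an ordinary element. XSS in one line. Always go through an inert text container such as a <textarea>.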
  • Double-encoding a value because a framework already did it. Decode once, look, decide.

References

  1. Josefsson, S. (2006). RFC 4648 — The Base16, Base32, and Base64 Data Encodings. Internet Engineering Task Force. https://datatracker.ietf.org/doc/html/rfc4648 — the standard and URL-safe alphabets (§4, §5), padding rules, validation behaviour.
  2. Berners-Lee, T., Fielding, R., & Masinter, L. (2005). RFC 3986 — Uniform Resource Identifier (URI): Generic Syntax. Internet Engineering Task Force, STD 66. https://datatracker.ietf.org/doc/html/rfc3986 — reserved and unreserved character sets, percent-encoding rules, the basis for encodeURI vs encodeURIComponent.
  3. WHATWG. HTML Living Standard — Named character references. https://html.spec.whatwg.org/multipage/named-characters.html — the authoritative list of HTML entity names, numeric reference syntax, browser-mandated decoding behaviour.