1. What is a regular expression?
A regular expression (a regex) is a small language for describing patterns over strings. You write a pattern such as \d{3}-\d{4} and a regex engine answers two related questions about any text you hand it: does the pattern match at all, and if so, where? From those two answers comes almost every text-processing task a working developer runs into: validate an input, extract a token, replace a fragment, split a record, count occurrences.
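As a minimal sketch of those two questions, the snippet below runs the \d{3}-\d{4} pattern against a made-up sentence. The sample text and variable names are illustrative assumptions, not part of the tool.

```javascript
const pattern = /\d{3}-\d{4}/;
const text = "Call 555-0123 before noon."; // made-up sample input

// Question 1: does the pattern match at all?
const matched = pattern.test(text);

// Question 2: if so, where? exec returns the match and its start index.
const found = pattern.exec(text);
const position = found ? found.index : -1;
const matchedText = found ? found[0] : null;

console.log(matched, position, matchedText); // true 5 555-0123
```

Validation only needs the boolean; extraction, replacement, splitting, and counting all build on the position and the matched text.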
The lineage runs through Stephen Kleene’s 1951 formalization of regular events as a way to describe the languages recognized by finite automata, and through Ken Thompson’s 1968 implementation in the QED editor and soon after in grep. From there, regular expressions became a fixture of every serious text editor, every Unix utility, every scripting language. The flavor in your browser today is a long way from Kleene’s mathematics: modern regex engines add conveniences such as lookarounds and named captures, and backreferences in particular put them strictly outside the regular languages of formal language theory. The basic promise is the same, though: a tiny pattern language, paired with a fast matcher.
2. JavaScript’s RegExp specifically
This tool runs your pattern through the browser-native RegExp constructor, so it behaves the same way as the regex support in your Node, Bun, Chrome, Firefox, and Safari runtimes. Those runtimes ship different engine implementations, but they all follow the semantics specified in the ECMAScript language standard, ECMA-262 [2], with the matching algorithms living in the RegExp (Regular Expression) Objects chapter. The practical surface (flags, group references, the replacement DSL, matchAll behavior) is documented end to end on MDN [1].
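Compiling a user-supplied pattern through the RegExp constructor might look roughly like the sketch below. The helper name compilePattern and its result shape are hypothetical; the tool's own code is not shown in this document.

```javascript
// Hypothetical helper: name and return shape are assumptions for illustration.
function compilePattern(source, flags = "gu") {
  try {
    return { ok: true, regex: new RegExp(source, flags) };
  } catch (err) {
    // An invalid pattern or an invalid flag string throws a SyntaxError.
    return { ok: false, error: err.message };
  }
}

const good = compilePattern("\\d+"); // a valid pattern
const bad = compilePattern("(");     // unterminated group

console.log(good.ok, good.regex.flags); // true gu
console.log(bad.ok);                    // false
```

Catching the SyntaxError lets a tester show the engine's own message instead of failing silently.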
By default this tester compiles patterns with two flags already on: g (global, so every match is returned, not just the first) and u (full Unicode mode, so surrogate pairs and Unicode property escapes work the way you expect). The other four flags are under your control. The defaults are deliberate: most real-world JavaScript regexes written in 2026 want both, and forgetting g is a common source of the “why does only the first match get replaced?” question.
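A short sketch of what the two default flags change, using made-up sample strings:

```javascript
// g: without it, replace touches only the first match.
const withoutG = "a-b-c".replace(/-/, "+");
const withG = "a-b-c".replace(/-/g, "+");

// u: without it, "." sees an emoji as two UTF-16 code units.
const emoji = "\u{1F600}";
const dotWithoutU = /^.$/.test(emoji);
const dotWithU = /^.$/u.test(emoji);

// u is also required for Unicode property escapes such as \p{L} (any letter).
const words = "日本語 text".match(/\p{L}+/gu);

console.log(withoutG, withG);       // a+b-c a+b+c
console.log(dotWithoutU, dotWithU); // false true
console.log(words);                 // [ '日本語', 'text' ]
```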
3. The six flags, explained
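This tester exposes six RegExp flags. (The current standard also defines d, which records match indices, and v, an extended Unicode-sets mode; neither is covered here.) The flags are toggled independently and the engine does not care about their order.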
- g: global. Without it, String.prototype.matchAll throws, and replaceAll refuses to run. With it, every non-overlapping match is returned and lastIndex advances after each one.
- i: case-insensitive. /Hello/i matches hello, HELLO, and HeLLo. Folding follows Unicode case mapping rules; ASCII-only callers rarely notice the distinction, but it shows up with Turkish dotted I and Greek sigma.
- m: multiline. Makes ^ and $ match at line boundaries inside the string, not just at the absolute start and end. Useful for anchoring per-line patterns against log files.
- s: dotall. Makes . match newline characters too. Easy to forget; common cause of “why doesn’t my regex match across lines?”.
- u: Unicode. Treats the pattern as a sequence of Unicode code points, so \u{1F600} works and surrogate pairs are handled. Required for \p{…} Unicode property escapes.
- y: sticky. Each match must start exactly at lastIndex; useful when you are writing a tokenizer and want the engine to walk forward only on successful matches. Most everyday regex never reaches for it.
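The flag behaviors described above can be checked with a few one-liners. This is a sketch with made-up inputs, not an exhaustive test of each flag.

```javascript
// g: matchAll with a non-global regex throws a TypeError.
let matchAllThrew = false;
try {
  "aaa".matchAll(/a/);
} catch (err) {
  matchAllThrew = err instanceof TypeError;
}

// i: case-insensitive matching.
const caseInsensitive = /hello/i.test("HeLLo");

// m: ^ and $ anchor at each line, handy for log files.
const log = "INFO start\nERROR disk full\nINFO done";
const errorLines = log.match(/^ERROR.*$/gm);

// s: "." also matches newline characters.
const withoutS = /a.b/.test("a\nb");
const withS = /a.b/s.test("a\nb");

// y: the match must begin exactly at lastIndex.
const sticky = /\d+/y;
sticky.lastIndex = 3;
const stickyHit = sticky.exec("abc123"); // digits start at index 3
sticky.lastIndex = 2;
const stickyMiss = sticky.exec("abc123"); // index 2 is "c", so no match

console.log(matchAllThrew, caseInsensitive, errorLines, withoutS, withS);
console.log(stickyHit && stickyHit[0], stickyMiss);
```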
4. Capture groups, numbered and named
Parentheses serve two roles in a regex pattern. They group alternatives — (http|https) — and they capture: the substring matched by what is inside the parens is stashed so you can refer to it later. The capturing form is (…); the non-capturing form is (?:…); and the named form is (?<name>…).
Numbered groups appear in match[1], match[2] and so on, indexed by the order their opening parens appear in the pattern; nesting does not change that order. Named groups additionally appear in match.groups.<name>. Optional groups (those followed by a ? quantifier, or sitting in an alternation branch that was not taken) can participate in some matches and not in others. JavaScript reports a group that did not participate as undefined, which this tool renders as ∅ in the capture-groups list so you can see at a glance which alternative the matcher actually took.
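A small sketch of numbered, named, nested, and optional groups. The date pattern and inputs are illustrative assumptions.

```javascript
// Named groups, with an optional non-capturing wrapper around the day part.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})(?:-(?<day>\d{2}))?/u;

const full = datePattern.exec("2026-03-14");
const partial = datePattern.exec("2026-03");

// Nested groups are numbered by the position of their opening paren.
const nested = /((a)(b))(c)/.exec("abc").slice(1);

console.log(full[1], full.groups.month, full.groups.day); // 2026 03 14
console.log(partial.groups.day, partial[3]);              // undefined undefined
console.log(nested);                                      // [ 'ab', 'a', 'b', 'c' ]
```

Named groups still occupy a numbered slot, which is why partial[3] and partial.groups.day are both undefined when the day is absent.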
5. The replacement-string DSL
The Replace panel runs your replacement through String.prototype.replaceAll when the g flag is on and replace otherwise. The replacement string is not a regex itself; it is a mini-language for referring back to what was matched. The tokens you actually need on a daily basis: $& for the full match, $1 through $99 for numbered groups, $<name> for named groups, $$ for a literal dollar sign, and $` / $' for the text before and after the match respectively. Unrecognized references are not an error, but what they produce depends on the pattern. A numbered reference to a group that does not exist, or any $<name> when the pattern has no named groups at all, is left as literal text; $<notagroup> in a pattern that does have named groups is replaced with the empty string. Either outcome is occasionally surprising the first time you see it.
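The replacement tokens can be exercised directly with String.prototype.replace and replaceAll. The inputs below are made up for illustration.

```javascript
// $<name> and $1/$2: reorder captured parts.
const byName = "Doe, Jane".replace(/(?<last>\w+), (?<first>\w+)/u, "$<first> $<last>");
const byNumber = "Doe, Jane".replace(/(\w+), (\w+)/, "$2 $1");

// $&: the full match. replaceAll requires the g flag on a regex.
const wrapped = "a b".replaceAll(/\w/g, "[$&]");

// $$: a literal dollar sign, followed here by the full match.
const priced = "cost: 5".replace(/\d+/, "$$$&");

// $` and $': the text before and after the match.
const before = "abc".replace(/b/, "$`");
const after = "abc".replace(/b/, "$'");

// A numbered reference to a group that does not exist stays literal.
const literal = "abc".replace(/b/, "$1");

console.log(byName, "|", byNumber, "|", wrapped, "|", priced);
console.log(before, after, literal);
```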
6. Catastrophic backtracking and ReDoS
JavaScript’s RegExp implementation, like Perl’s and Python’s re, is a backtracking engine. When a pattern fails to match at some position, the engine rewinds and tries alternative paths through the pattern. For most patterns this is fine: each character contributes a small, bounded amount of backtracking. For a small but load-bearing class of patterns — typically nested quantifiers over overlapping character classes, such as (a+)+, (a|a)+, or alternation under a lookahead — the number of paths the engine explores grows exponentially in the length of the input. Russ Cox’s “Regular Expression Matching Can Be Simple And Fast”[3] is the canonical read on why backtracking engines are structurally vulnerable to this and why grep and RE2 are not.
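To make the nested-quantifier problem concrete, the sketch below compares ^(a+)+$ with the equivalent ^a+$ on deliberately short inputs. Both accept the same strings, but on a near-miss input the nested form has many more ways to split the run of a's before it can fail. The timing loop is only indicative, and the input lengths are kept small on purpose.

```javascript
const nestedPattern = /^(a+)+$/; // nested quantifier: vulnerable shape
const linearPattern = /^a+$/;    // same language, no nesting

// Both patterns agree on every sample, including the failing ones.
const samples = ["a", "aaaa", "", "aab", "a".repeat(18) + "!"];
const agree = samples.every((s) => nestedPattern.test(s) === linearPattern.test(s));

// Rough timing on near-miss inputs; the work roughly doubles per extra "a".
for (const n of [12, 16, 20]) {
  const input = "a".repeat(n) + "!";
  const start = Date.now();
  nestedPattern.test(input);
  console.log(`n=${n}: about ${Date.now() - start} ms`);
}

console.log("patterns agree:", agree);
```

Raising n much beyond the mid-twenties can make the nested pattern take seconds or longer in a backtracking engine, which is exactly the failure mode described above.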
The failure mode in the wild is ReDoS: regular-expression denial-of-service. A user supplies an innocuous-looking input that pushes a known-bad pattern into the exponential corner of its state space, and a request thread spins for seconds or minutes on what looks like a one-line validation. The mitigation in code is to avoid nested quantifiers on overlapping character classes, prefer mutually exclusive alternatives, and validate untrusted regex before running it. The mitigation in this tester is different: we run every evaluation inside a dedicated Web Worker with a 500 ms wall-clock timeout. If the engine has not produced a result by then, the worker is terminated, the UI reports the timeout, and the user is invited to simplify the pattern.
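A worker-with-timeout arrangement of this kind could be sketched as follows. This is not the tester's actual code: the function name runRegexWithTimeout, the message shape, and the inline Blob-based worker are all assumptions made for illustration, and the sketch only runs where the browser Worker API is available.

```javascript
// Hypothetical helper: names and message shape are assumptions.
function runRegexWithTimeout(source, flags, text, timeoutMs = 500) {
  // The worker body is inlined as a string so the sketch stays self-contained.
  const workerCode = `
    self.onmessage = (event) => {
      const { source, flags, text } = event.data;
      try {
        const regex = new RegExp(source, flags);
        // matchAll needs the g flag; a missing flag lands in the catch below.
        const matches = [...text.matchAll(regex)].map((m) => ({
          index: m.index,
          match: m[0],
        }));
        self.postMessage({ ok: true, matches });
      } catch (err) {
        self.postMessage({ ok: false, error: String(err) });
      }
    };
  `;
  const url = URL.createObjectURL(new Blob([workerCode], { type: "text/javascript" }));
  const worker = new Worker(url);

  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      worker.terminate(); // stops the worker even while it is mid-match
      URL.revokeObjectURL(url);
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`regex evaluation exceeded ${timeoutMs} ms`));
    }, timeoutMs);
    worker.onmessage = (event) => {
      cleanup();
      resolve(event.data);
    };
    worker.onerror = (event) => {
      cleanup();
      reject(new Error(event.message));
    };
    worker.postMessage({ source, flags, text });
  });
}
```

Running the match off the main thread matters because a backtracking match cannot be interrupted from inside the same thread; terminating the worker is the only reliable way to stop it.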
7. Common recipes and their caveats
- Email validation: ^\S+@\S+\.\S+$ is good enough for most form validation. A regex for the full RFC 5322 grammar runs to hundreds of characters and accepts addresses that even major email clients reject; it is rarely the regex you actually want.
- URLs: ^https?:// catches both schemes. For real URL parsing reach for the browser’s URL constructor instead; it handles percent-encoding, userinfo, IPv6 literals, and everything else regex pretends does not exist.
- Phone numbers: don’t. The format varies by country and the user’s preferred presentation is rarely what your validator expects.
- Whitespace trimming: prefer .trim() or .trimStart() / .trimEnd(); the regex equivalent is almost always slower and easier to break.
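The recipes above can be sketched in a few lines. The helper names looksLikeEmail and parseHttpUrl are hypothetical, and the sample inputs are made up.

```javascript
// Email: a deliberately loose shape check, not RFC 5322 validation.
const looksLikeEmail = (value) => /^\S+@\S+\.\S+$/u.test(value);

// URLs: let the URL constructor parse, then check the scheme.
function parseHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === "http:" || url.protocol === "https:" ? url : null;
  } catch {
    return null; // the URL constructor throws on unparseable input
  }
}

const parsedUrl = parseHttpUrl("https://user@[::1]:8080/a%20b?q=1");

// Whitespace: the built-in methods cover the common cases.
const trimmed = "  padded  ".trim();
const trimmedStart = "  padded  ".trimStart();

console.log(looksLikeEmail("ada@example.org"), looksLikeEmail("not an email"));
console.log(parsedUrl.hostname, parsedUrl.port, parsedUrl.username);
console.log(JSON.stringify(trimmed), JSON.stringify(trimmedStart));
```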
8. When not to use regex
The classic answer is “don’t parse HTML with regex”, and it is correct. HTML, like JSON and every other format with balanced delimiters, has nested structure that takes at least a context-free grammar to describe. JavaScript regular expressions, which have no recursion construct, cannot match arbitrary nesting depth. Every “just one quick regex” that grows into a fragile in-house HTML parser ends the same way: a perfectly valid edge case breaks production. Use a real parser when you have one.
The broader rule: if your pattern starts encoding the grammar of the data instead of a shape inside the data, you are reaching past where regex is the right tool. JSON has JSON.parse. URLs have URL. CSV has a parser that handles quoting. Dates have Date. Markdown has a real renderer. Use regex for what it is good at: finding, extracting, and rewriting shapes inside text that is already mostly in the form you want.
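A small illustration of the difference between matching a shape and encoding a grammar: the naive regex below tries to pull a string value out of JSON and trips over an escaped quote, while JSON.parse handles it. The payload is a made-up example.

```javascript
// Made-up payload whose name contains escaped quotes.
const payload = '{"user": {"name": "Ada \\"the first\\" Lovelace", "tags": ["a", "b"]}}';

// Naive regex: stops at the first quote character, escaped or not.
const naive = /"name": "([^"]*)"/.exec(payload);
const naiveName = naive ? naive[1] : null;

// Real parser: understands escapes and nesting.
const parsedName = JSON.parse(payload).user.name;

console.log(naiveName);  // Ada \
console.log(parsedName); // Ada "the first" Lovelace
```

The regex could be patched to handle escaped quotes, but each patch re-implements another piece of the JSON grammar, which is the signal to switch tools.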
References
[1] MDN Web Docs. RegExp - JavaScript reference. Canonical browser-side reference for flag definitions, group references, replacement-string semantics, and matchAll behavior.
[2] TC39 / Ecma International. ECMA-262, 14th edition. RegExp (Regular Expression) Objects. The language-standard specification for the regex grammar JavaScript implements, including u-mode lexical rules, named-group semantics, and lastIndex advancement.
[3] Cox, R. (2007). Regular Expression Matching Can Be Simple And Fast. The reference on why backtracking engines such as JavaScript’s suffer from catastrophic backtracking and why automaton-based engines such as RE2 do not.