A Practical Regex Cheat Sheet: Patterns You'll Actually Use
by Andergrove Software
Most regex cheat sheets are dry tables of tokens. This one is task-oriented: the dozen building blocks that cover the vast majority of real patterns, the greedy-versus-lazy gotcha that wastes the most time, and a set of copy-paste recipes for the things you actually match (emails, URLs, dates, slugs). Test every pattern as you go in the regex tester, which highlights matches and capture groups live in your browser.
The building blocks (90% of regex)
Learn these and you can read and write most patterns:
- ^ and $: anchors for the start and end of the string (or of each line, with the m flag).
- . (dot): any single character except a newline.
- \d \w \s: a digit, a word character ([A-Za-z0-9_]), or whitespace. Their uppercase forms \D \W \S mean the opposite.
- [abc]: a character set (any one of a, b, c). [^abc] negates it; [a-z] is a range.
- * + ?: quantifiers for zero-or-more, one-or-more, and zero-or-one.
- {n} {n,} {n,m}: exact, at-least, and bounded counts.
- (...): a capture group. (?:...) groups without capturing.
- a|b: alternation, matching a or b.
- \b: a word boundary (the edge between a word character and a non-word character).
- \ (backslash): escape, to match a metacharacter literally.
That is the whole alphabet for most day-to-day work. Everything below is built from it.
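As a quick illustration, here is a minimal JavaScript sketch that exercises several of these blocks. The `checks` table and its sample strings are made up for this example; each row pairs a pattern with an input and the result it should give.

```javascript
// Each row: [pattern, sample input, expected result of pattern.test(input)].
// The samples are illustrative only.
const checks = [
  [/^\d{3}$/, "404", true],            // anchors, \d, exact count
  [/^\d{3}$/, "4040", false],          // one digit too many
  [/^[a-c]+$/, "abcab", true],         // range plus one-or-more
  [/^[^abc]$/, "d", true],             // negated set
  [/colou?r/, "color", true],          // ? makes the "u" optional
  [/^(?:cat|dog)s?$/, "dogs", true],   // non-capturing group with alternation
  [/\bcat\b/, "concatenate", false],   // no word boundary around "cat" here
  [/\bcat\b/, "a cat sat", true],      // boundaries on both sides
  [/^a.c$/, "a\nc", false],            // dot does not match a newline
  [/^3\.14$/, "3x14", false],          // escaped dot matches only a literal dot
];

// Run every pattern against its input and collect the results.
const results = checks.map(([pattern, input]) => pattern.test(input));

for (const [i, [pattern, input]] of checks.entries()) {
  console.log(String(pattern), JSON.stringify(input), results[i]);
}
```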
Greedy vs. lazy: the time-waster
By default, quantifiers are greedy: they match as much as possible. Add a ? after the quantifier and it becomes lazy, matching as little as possible. The classic trap is matching text between delimiters. On the input "a" and "b":
".*"     matches "a" and "b"   (greedy: the whole span)
".*?"    matches "a"           (lazy: the first pair)
If a pattern is "grabbing too much," a lazy quantifier is usually the fix. Greedy over-matching between delimiters is one of the most common regex bugs.
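The difference is easy to see in code. This short JavaScript sketch runs both quantifiers on the sample input from above; the negated character class on the last line is not one of the patterns shown earlier, but it is a common alternative that avoids the problem without a lazy quantifier.

```javascript
const input = '"a" and "b"';

// Greedy: .* runs to the last quote it can reach.
const greedy = input.match(/".*"/)[0];     // '"a" and "b"'

// Lazy: .*? stops at the first closing quote.
const lazy = input.match(/".*?"/)[0];      // '"a"'

// With the g flag, the lazy pattern finds each quoted piece separately.
const allLazy = input.match(/".*?"/g);     // ['"a"', '"b"']

// Alternative: match "anything that is not a quote" between the quotes.
const negated = input.match(/"[^"]*"/g);   // ['"a"', '"b"']

console.log({ greedy, lazy, allLazy, negated });
```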
Capture groups, named groups, backreferences
Groups pull pieces out of a match:
(\d{4})-(\d{2})-(\d{2})            group 1 = year, 2 = month, 3 = day
(?<year>\d{4})-(?<month>\d{2})     named groups: refer to them by name
(\w)\1                             a backreference: matches a doubled character
Named groups keep patterns and replacements readable: in JavaScript, use $<year> in a replacement string, or \k<year> to match the same text again later in the pattern. Reach for non-capturing (?:...) when you only need grouping for a quantifier or alternation, not extraction.
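Here is a small JavaScript sketch of all three ideas. The sample strings and the day/month/year output format are made up for illustration.

```javascript
const iso = "2024-03-15";

// Numbered groups: index 0 is the whole match, 1..3 are the groups.
const numbered = iso.match(/(\d{4})-(\d{2})-(\d{2})/);
const [year, month, day] = [numbered[1], numbered[2], numbered[3]];

// Named groups: the same pieces, read by name from match.groups.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const named = iso.match(datePattern).groups;

// $<name> in the replacement string reorders the captured pieces.
const reordered = iso.replace(datePattern, "$<day>/$<month>/$<year>");

// Backreferences: \1 by number, \k<word> by name.
const hasDoubledChar = /(\w)\1/.test("letter");                        // "tt"
const noDoubledChar = /(\w)\1/.test("later");
const hasRepeatedWord = /\b(?<word>\w+) \k<word>\b/.test("it is is fine");

console.log({ year, month, day, named, reordered });
console.log({ hasDoubledChar, noDoubledChar, hasRepeatedWord });
```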
A copy-paste recipe book
Pragmatic patterns for common jobs. Treat the email and URL ones as "good enough for a form field," not RFC-perfect:
Email-ish      ^[^\s@]+@[^\s@]+\.[^\s@]+$
URL            https?://[^\s/$.?#]\S*
ISO date       (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Hex colour     #(?:[0-9a-fA-F]{3}){1,2}\b
URL slug       [a-z0-9]+(?:-[a-z0-9]+)*
Collapse ws    \s+   (replace with a single space, then trim)
The whitespace one is the quiet workhorse: replacing \s+ with a single space cleans up text pasted from PDFs and editors in one shot.
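The recipes can be dropped into JavaScript roughly as follows. This is a sketch under a few assumptions: the `recipes` object and the `collapseWhitespace` helper are names invented here, the slug pattern is wrapped in ^ and $ so it validates a whole string rather than finding a slug inside one, and the sample inputs are illustrative.

```javascript
const recipes = {
  email: /^[^\s@]+@[^\s@]+\.[^\s@]+$/,
  url: /https?:\/\/[^\s\/$.?#]\S*/,          // slashes escaped for the literal
  hexColour: /#(?:[0-9a-fA-F]{3}){1,2}\b/,
  slug: /^[a-z0-9]+(?:-[a-z0-9]+)*$/,        // anchors added for validation
};

// Collapse every run of whitespace to one space, then trim the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

const emailOk = recipes.email.test("ada@example.com");
const emailNoDot = recipes.email.test("ada@example");
const foundUrl = "see https://example.com/path?q=1 now".match(recipes.url)[0];
const shortHex = recipes.hexColour.test("#fff");
const longHex = recipes.hexColour.test("#1a2B3c");
const badHex = recipes.hexColour.test("#ffff");   // 4 digits: neither 3 nor 6
const slugOk = recipes.slug.test("my-first-post");
const slugBad = recipes.slug.test("My--post");
const cleaned = collapseWhitespace("  Pasted \n\n text\tfrom   a PDF ");

console.log({ emailOk, emailNoDot, foundUrl, shortHex, longHex, badHex, slugOk, slugBad, cleaned });
```

For form validation the anchored patterns are the ones to use; the unanchored URL and hex colour patterns are better suited to finding matches inside longer text.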
Pitfalls that bite
- Catastrophic backtracking (ReDoS). Nested quantifiers over overlapping patterns, such as (a+)+$, can take exponential time on a long string that almost matches, freezing your program. Avoid nesting quantifiers on the same characters; prefer specific character classes, and test with deliberately nasty input.
- Forgetting to escape. To match a literal dot use \.; a bare . matches any character. The metacharacters that need escaping are . * + ? ( ) [ ] { } ^ $ | \
- The wrong flags. g matches every occurrence, i ignores case, m makes ^ and $ match per line, and s (dotall) lets . match newlines. A pattern that "works on one line but not the whole file" is usually missing m or s.
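The three pitfalls can be sketched in JavaScript as follows. The `escapeRegExp` helper is a widely used idiom rather than a built-in, and the sample strings are made up for this example. The inputs to the nested-quantifier pattern are kept deliberately short so the snippet stays fast.

```javascript
// 1. Catastrophic backtracking: the un-nested pattern accepts the same
//    strings (anything ending in "a") without the exponential retries.
const risky = /(a+)+$/;   // slow on a long run of a's followed by a non-a
const safer = /a+$/;
const sameVerdict = ["aaaa", "aaab", "baaa"].every(
  (s) => risky.test(s) === safer.test(s)
);

// 2. Escaping: backslash-escape every metacharacter in user-supplied text
//    before building a pattern from it.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
const literal = new RegExp("^" + escapeRegExp("3.14") + "$");

// 3. Flags: the same pattern behaves differently with and without them.
const text = "first line\nsecond line";
const flags = {
  withoutM: /^second/.test(text),         // ^ only matches at string start
  withM: /^second/m.test(text),           // ^ also matches after a newline
  withoutS: /first.*second/.test(text),   // dot stops at the newline
  withS: /first.*second/s.test(text),     // dot crosses the newline
  withoutG: "a-b-c".replace(/-/, "+"),    // first occurrence only
  withG: "a-b-c".replace(/-/g, "+"),      // every occurrence
  withI: /HELLO/i.test("hello"),          // case ignored
};

console.log(sameVerdict, literal.test("3.14"), literal.test("3x14"), flags);
```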
Build and test it live
Regex is a write-once, debug-forever language, so test as you write. Paste a pattern and sample text into the Andergrove Regex Tester to see matches and capture groups highlighted instantly, entirely in your browser. Start from a recipe above, then tweak it against your real data. For plain text cleanup without regex, the word counter and case converter handle the common cases in one click.