Unicode Inspector
Reveal what’s really in a string: invisible characters, zero-width spaces, bidirectional controls, unusual spaces and lookalike letters — each named, counted and highlighted in place, with a cleaned copy one click away. Everything runs locally in your browser.
How to use the Unicode inspector
Paste any text — code from a pull request, a domain name from an email, a username, a config value that "looks right but doesn't work" — and the inspector examines every code point. The summary chips count what it found, the highlighted view shows exactly where each hidden character sits (as small labelled chips inline with your text), and the table lists every flagged character with its official Unicode name and count. Tick the cleaning options you want and copy a repaired version out.
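The counting and cleaning steps described above can be approximated in a few lines of Python. This is only a sketch, not the inspector's own code: the function names (is_suspicious, summarize, clean) are hypothetical, and the flagging rule (Unicode general categories Cf, Cc and Zs) is an assumption chosen for brevity.

```python
import unicodedata
from collections import Counter

def is_suspicious(ch: str) -> bool:
    """Assumed rule: format chars, controls (except tab/newline/CR), non-ASCII spaces."""
    if ch in "\t\n\r":
        return False
    cat = unicodedata.category(ch)
    return cat in ("Cf", "Cc") or (cat == "Zs" and ch != " ")

def summarize(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, Unicode name, count) rows, most frequent first."""
    counts = Counter(ch for ch in text if is_suspicious(ch))
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]

def clean(text: str) -> str:
    """Drop format/control characters and turn unusual spaces into plain spaces."""
    out = []
    for ch in text:
        if not is_suspicious(ch):
            out.append(ch)
        elif unicodedata.category(ch) == "Zs":
            out.append(" ")
    return "".join(out)

sample = "api\u200bkey\u00a0=\u200b1"
for row in summarize(sample):
    print(row)
print(repr(clean(sample)))
```

Removing every format character is a blunt choice: zero-width joiners are legitimate inside emoji sequences and some scripts, which is one reason to keep cleaning options selectable rather than automatic.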
What it flags
Invisible characters: zero-width spaces, joiners, the BOM, soft hyphens. These render as nothing but break string comparison, search and diffs.
Bidirectional controls: the right-to-left override family that can make source code display differently from how a compiler reads it (the "Trojan Source" attack).
Unusual spaces: the sixteen-plus Unicode spaces that look like a space but aren't the one your parser splits on, including the non-breaking space that Word and web pages love to paste.
Control characters: C0/C1 codes that shouldn't appear in normal text.
Lookalike letters: a curated set of Cyrillic and Greek homoglyphs plus fullwidth forms, the raw material of spoofed domains and impersonation usernames.
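The five categories can be sketched as a single classification function. This is an illustration under stated assumptions rather than the tool's actual rules: classify is a hypothetical name, the general-category tests stand in for whatever lists the inspector uses, and LOOKALIKES is a deliberately tiny sample, not the curated set.

```python
import unicodedata

# Explicit bidirectional formatting characters (marks, embeddings, overrides, isolates).
BIDI_CONTROLS = set(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)

# Illustrative sample only: a few Cyrillic/Greek letters and the Latin letter they mimic.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def classify(ch: str):
    """Return a category label for one character, or None if it is not flagged."""
    if ch in BIDI_CONTROLS:          # checked first: these are also category Cf
        return "bidi control"
    cat = unicodedata.category(ch)
    if cat == "Cf":
        return "invisible"
    if cat == "Zs" and ch != " ":
        return "unusual space"
    if cat == "Cc" and ch not in "\t\n\r":
        return "control"
    if ch in LOOKALIKES or "\uff01" <= ch <= "\uff5e":  # fullwidth ASCII variants
        return "lookalike"
    return None

for ch in "\u200b\u202e\u00a0\x07\u0430\uff21a":
    print(f"U+{ord(ch):04X}", classify(ch))
```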
Where hidden characters come from
Almost never from malice. Word processors convert straight quotes to curly ones and hyphenate with soft hyphens; web pages pad text with non-breaking spaces; chat apps and AI assistants sometimes emit zero-width characters; copying from a PDF drags in whatever the typesetter used. The result lands in your codebase, your spreadsheet or your password field, where "identical" strings refuse to match. The malicious cases (bidi tricks in code review, homoglyph domains in phishing) are rarer but far more costly, which is why this tool treats both with the same seriousness.
Reading the results like a reviewer
A bidi control in source code is a stop-everything finding: legitimate uses in code are vanishingly rare, and the Trojan Source paper showed how an override can make a comment such as "// Check if admin" and the executable logic next to it swap places visually. A lookalike letter in a domain or username means the string is not the brand it resembles, so paste any suspicious link here before trusting it. Zero-width characters and odd spaces in code or data are usually accidents, but accidents that cost hours: they're why the "same" API key fails, the CSV column won't parse as a number, and the two visually identical lines diff as different.
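For the lookalike case, one rough heuristic is to check whether a single domain label mixes letters from more than one script. The sketch below assumes the first word of a character's Unicode name is a good enough stand-in for its script; scripts_used and looks_spoofed are hypothetical helper names, and this is not necessarily how the inspector decides.

```python
import unicodedata

def scripts_used(label: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def looks_spoofed(domain: str) -> bool:
    """Flag a domain if any dot-separated label mixes scripts."""
    return any(len(scripts_used(label)) > 1 for label in domain.split("."))

print(looks_spoofed("example.com"))        # all Latin letters
print(looks_spoofed("ex\u0430mple.com"))   # third letter is CYRILLIC SMALL LETTER A
```

A label written entirely in lookalike Cyrillic letters would pass this check, so a mixed-script test complements a homoglyph table rather than replacing it.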
Frequently asked questions
What does the Unicode inspector detect?
Invisible characters (zero-width spaces, joiners, the BOM, soft hyphens), bidirectional control characters, unusual Unicode spaces, C0/C1 control codes, and a curated set of Cyrillic/Greek lookalike letters plus fullwidth forms. Each is named, counted and highlighted in place.
Why do two identical-looking strings not match?
Usually an invisible character (often a zero-width space or non-breaking space picked up from a web page or Word document) or a lookalike letter from another alphabet. Paste both strings into the inspector and the difference is highlighted immediately.
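Outside the browser, the same comparison can be sketched in Python by walking both strings code point by code point and naming the first mismatch. The helper names first_difference and describe are hypothetical.

```python
import unicodedata
from itertools import zip_longest

def describe(ch) -> str:
    """Render a character as 'U+XXXX NAME', or mark the end of the shorter string."""
    if ch is None:
        return "<end of string>"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

def first_difference(a: str, b: str):
    """Return (index, description in a, description in b) for the first mismatch, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i, describe(x), describe(y)
    return None

print(first_difference("user name", "user\u00a0name"))
print(first_difference("key", "key\u200b"))
```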
What is a Trojan Source attack?
A technique (CVE-2021-42574) that uses bidirectional control characters to make source code display in a different order than compilers read it, hiding malicious logic from human review. Any bidi control character in source code deserves scrutiny; the inspector flags them in red.
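A review script or pre-commit check could apply the same scrutiny by reporting the line and column of every bidi embedding, override or isolate character in a source file. The following is a minimal sketch; find_bidi is a hypothetical name, and the set covers only the nine explicit embedding, override and isolate controls, which is an assumption about scope.

```python
import unicodedata

# LRE, RLE, PDF, LRO, RLO, then LRI, RLI, FSI, PDI.
BIDI_OVERRIDES = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def find_bidi(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each bidi control, both 1-based."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_OVERRIDES:
                hits.append((lineno, col, unicodedata.name(ch)))
    return hits

# The control character is written as an escape so this example stays safe to read.
print(find_bidi("ok\nab\u202ecd"))
```

Writing the characters as \u escapes in the checker itself keeps the script free of the very characters it looks for.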
Is my text uploaded?
No. All analysis and cleaning runs locally in your browser — nothing you paste leaves your machine, which matters because the suspicious string is sometimes a password or an API key.