Project Unilight — Unicode Highlighter & Cleaner

Project Unilight helps you reveal and clean hidden Unicode characters — zero-width spaces, soft hyphens, direction controls, combining marks, and look-alike letters. These characters don’t always show on screen but can break code, change meaning, or cause security issues.
The demo below shows the same paragraph twice: a “Clean” version and a “Tainted” version with hidden characters sprinkled in. Use the legend to see which category each highlight represents.
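To make the "Clean" versus "Tainted" idea concrete, here is a minimal Python sketch. It is not part of Unilight; the sample strings and the helper name `non_ascii_names` are hypothetical, chosen only to show how two strings that look the same on screen can differ underneath.

```python
import unicodedata

def non_ascii_names(text):
    """Return the Unicode names of every character outside ASCII (hypothetical helper)."""
    return [unicodedata.name(ch, "UNNAMED") for ch in text if ord(ch) > 0x7F]

clean = "admin"
# Illustrative tainted variants, written with escapes so the hidden characters show in source.
tainted_zero_width = "ad\u200bmin"   # zero-width space inside the word
tainted_lookalike = "\u0430dmin"     # Cyrillic "a" in place of the Latin letter

for label, text in [("clean", clean),
                    ("zero-width", tainted_zero_width),
                    ("look-alike", tainted_lookalike)]:
    # Print the length, whether it equals the clean string, and what is hiding in it.
    print(label, len(text), text == clean, non_ascii_names(text))
```

Both tainted strings typically render like "admin", yet neither compares equal to it, which is why they can slip past a visual review.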

Unicode Visibility Demo — Clean vs Tainted

Non-ASCII

Emoji

RTL

Combining

Control

Clean paragraph (ASCII-only)

—

No hidden characters expected.

Tainted paragraph (hidden Unicode)

—

Paste into your tool to clean.

Unicode Visibility Cleaner

Paste text into the input box or upload a file to analyze. The output panel shows the same text with hidden or risky characters highlighted and counted. Use the legend chips to see which categories are present.
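The highlight-and-count step described above could be approximated as follows. This is a sketch under stated assumptions, not Unilight's actual logic: the mapping from Unicode properties to the five legend categories, the priority order, the rough emoji ranges, and the names `classify`, `count_categories`, and `annotate` are all illustrative choices.

```python
import unicodedata
from collections import Counter

# Assumption: LRM/RLM plus the embedding/override and isolate ranges count as "RTL".
BIDI_CONTROLS = {0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}
# Assumption: a rough approximation of emoji blocks, not the full Unicode emoji property.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

def classify(ch):
    """Return a legend category for one character, or None if it is plain text."""
    if ch in "\n\t":
        return None  # line feeds and tabs are treated as ordinary text
    cp = ord(ch)
    cat = unicodedata.category(ch)
    if cp in BIDI_CONTROLS or unicodedata.bidirectional(ch) in ("R", "AL"):
        return "RTL"
    if any(lo <= cp <= hi for lo, hi in EMOJI_RANGES):
        return "Emoji"
    if cat.startswith("M"):          # Mn, Mc, Me
        return "Combining"
    if cat in ("Cc", "Cf"):          # control and format, e.g. ZWSP, soft hyphen
        return "Control"
    if cp > 0x7F:
        return "Non-ASCII"
    return None

def count_categories(text):
    """Count flagged characters per legend category."""
    return Counter(c for c in map(classify, text) if c is not None)

def annotate(text):
    """Replace each flagged character with a visible [Category U+XXXX] marker."""
    out = []
    for ch in text:
        label = classify(ch)
        out.append(ch if label is None else f"[{label} U+{ord(ch):04X}]")
    return "".join(out)

print(annotate("a\u200bb"))
print(count_categories("caf\u00e9 e\u0301 \u202eabc"))
```

The bidirectional check comes first because the BiDi controls are also format characters (category Cf) and would otherwise be counted as "Control".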

Choose how you want to clean the text. You can strip emoji, control codes, soft hyphens, or enforce ASCII-only output. Preview the cleaned result below, then copy or download it — all processing happens in your browser.

Cleanup & Export

Choose what to remove/normalize, then preview. Defaults are conservative:

Remove emoji

Remove control/format

Remove combining marks

Remove BiDi controls

ASCII-only output

Notes

Cleanup is client-side. “ASCII-only” applies NFKD normalization first (to decompose accents), then strips anything outside U+0000–U+007F. Line feeds and tabs are preserved; other controls are removed. BiDi controls include LRM/RLM (U+200E, U+200F), the embedding and override controls (U+202A–U+202E), and the isolates (U+2066–U+2069).
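The rules in these notes can be sketched in Python as below. This is an illustration, not the tool's browser code: the function name `clean_text`, its keyword flags, the all-off defaults, and the rough emoji ranges are assumptions, and the rule about preserving line feeds and tabs is read here as belonging to the control/format option.

```python
import unicodedata

# BiDi controls as listed in the notes: LRM/RLM, U+202A-U+202E, U+2066-U+2069.
BIDI_CONTROLS = {0x200E, 0x200F, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}
# Assumption: a rough approximation of emoji blocks.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

def clean_text(text, *, remove_emoji=False, remove_controls=False,
               remove_combining=False, remove_bidi=False, ascii_only=False):
    """Apply the selected cleanup options (hypothetical flags, all off by default)."""
    if ascii_only:
        # Decompose first so that an accented letter keeps its base letter.
        text = unicodedata.normalize("NFKD", text)
    out = []
    for ch in text:
        cp = ord(ch)
        cat = unicodedata.category(ch)
        if remove_bidi and cp in BIDI_CONTROLS:
            continue
        if remove_emoji and any(lo <= cp <= hi for lo, hi in EMOJI_RANGES):
            continue
        if remove_combining and cat.startswith("M"):
            continue
        # Cc/Cf covers zero-width spaces and soft hyphens; it also covers BiDi controls.
        if remove_controls and cat in ("Cc", "Cf") and ch not in "\n\t":
            continue
        if ascii_only and cp > 0x7F:
            continue
        out.append(ch)
    return "".join(out)

print(clean_text("caf\u00e9", ascii_only=True))
print(repr(clean_text("a\u200bb\tc\n", remove_controls=True)))
```

Normalizing before stripping matters: without the NFKD step, a precomposed accented letter would be deleted outright instead of being reduced to its base letter.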
