Project Unilight — Unicode Highlighter & Cleaner

Project Unilight helps you reveal and clean hidden Unicode characters — zero-width spaces, soft hyphens, direction controls, combining marks, and look-alike letters. These characters don’t always show on screen but can break code, change meaning, or cause security issues.
The demo below shows the same paragraph twice: a “Clean” version and a “Tainted” version with hidden characters sprinkled in. Use the legend to see which category each highlight represents.
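The tainted version matters even though it looks identical to the clean one. The TypeScript sketch below shows why. The sample sentence and the three hidden characters are illustrative assumptions, not the demo's actual text: a zero-width space (U+200B), a soft hyphen (U+00AD), and a Cyrillic “а” (U+0430) standing in for a Latin “a”.

```typescript
// Illustrative strings only; these are not the demo's actual paragraphs.
const clean = "Pay the invoice today.";

// Tainted copy (assumed example): a zero-width space and a soft hyphen are
// inserted, and the first Latin "a" is swapped for the Cyrillic look-alike U+0430.
const tainted = "Pay the inv\u200Boice to\u00ADday.".replace("a", "\u0430");

// Format a single code point as U+XXXX.
function toCodePointLabel(ch: string): string {
  return "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0");
}

// Every code point above U+007F is a candidate for highlighting.
const hidden = Array.from(tainted)
  .filter((ch) => ch.codePointAt(0)! > 0x7f)
  .map(toCodePointLabel);

console.log(clean === tainted); // false
console.log(clean.length, tainted.length); // 22 24
console.log(hidden.join(" ")); // U+0430 U+200B U+00AD
```

A plain substring search for “invoice” or “Pay” fails on the tainted copy. This is the kind of silent breakage the highlighter is meant to surface.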

Unicode Visibility Demo — Clean vs Tainted

Non-ASCII
Emoji
RTL
Combining
Control

Clean paragraph (ASCII-only)

No hidden characters expected.

Tainted paragraph (hidden Unicode)

Paste into your tool to clean.

Unicode Visibility Cleaner

Paste text into the input box or upload a file to analyze. The output panel shows the same text with hidden or risky characters highlighted and counted. Use the legend chips to see which categories are present.
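The per-category counts can be sketched in TypeScript with Unicode property escapes. The category rules are assumptions about what each legend chip covers, not the tool's actual definitions. “RTL” is taken to mean BiDi control characters plus Hebrew and Arabic letters. “Control” means Cc and Cf characters other than tab, line feed, and carriage return. The names `Category` and `countCategories` are hypothetical, and one character can count toward more than one category.

```typescript
// Hypothetical category keys mirroring the legend chips.
type Category = "nonAscii" | "emoji" | "rtl" | "combining" | "control";

// Assumed category rules; the tool's real definitions may differ.
const EMOJI = /\p{Extended_Pictographic}/u;
const RTL = /[\p{Bidi_Control}\p{Script=Hebrew}\p{Script=Arabic}]/u;
const COMBINING = /\p{M}/u;
const CONTROL = /[\p{Cc}\p{Cf}]/u;
// Ordinary whitespace controls that are not flagged.
const KEPT_WHITESPACE = new Set(["\t", "\n", "\r"]);

function countCategories(text: string): Record<Category, number> {
  const counts: Record<Category, number> = {
    nonAscii: 0,
    emoji: 0,
    rtl: 0,
    combining: 0,
    control: 0,
  };
  // for...of walks the string by code point, so astral characters count once.
  for (const ch of text) {
    if (ch.codePointAt(0)! > 0x7f) counts.nonAscii++;
    if (EMOJI.test(ch)) counts.emoji++;
    if (RTL.test(ch)) counts.rtl++;
    if (COMBINING.test(ch)) counts.combining++;
    if (CONTROL.test(ch) && !KEPT_WHITESPACE.has(ch)) counts.control++;
  }
  return counts;
}
```

Under these rules a zero-width space (U+200B, general category Cf) counts toward both Non-ASCII and Control. An emoji such as U+1F600 counts once even though it occupies two UTF-16 code units.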




Choose how you want to clean the text. You can strip emoji, control codes, and soft hyphens, or enforce ASCII-only output. Preview the cleaned result, then copy or download it. All processing happens in your browser.
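One way those options might compose is sketched below in TypeScript. The option names and `cleanText` are hypothetical. The emoji rule is an assumed heuristic: pictographs, skin-tone modifiers, and regional-indicator flag halves, plus any trailing variation selector-16 or zero-width joiner. The control and BiDi sets follow the Notes section of this page.

```typescript
// Hypothetical option names; the real tool's settings may be named differently.
interface CleanOptions {
  stripEmoji: boolean;
  stripControls: boolean;
  stripBidi: boolean;
  stripSoftHyphens: boolean;
}

// Assumed emoji heuristic: a pictograph, skin-tone modifier, or flag half,
// followed by any variation selector-16 (U+FE0F) or zero-width joiner (U+200D).
const EMOJI_RE =
  /[\p{Extended_Pictographic}\p{Emoji_Modifier}\p{Regional_Indicator}][\uFE0F\u200D]*/gu;

// C0/C1 control codes other than tab (U+0009) and line feed (U+000A).
const CONTROL_RE = /[\u0000-\u0008\u000B-\u001F\u007F-\u009F]/g;

// LRM, RLM, embeddings/overrides (U+202A–U+202E), and isolates (U+2066–U+2069).
const BIDI_RE = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

// Soft hyphen (U+00AD).
const SOFT_HYPHEN_RE = /\u00AD/g;

function cleanText(text: string, opts: CleanOptions): string {
  let out = text;
  if (opts.stripEmoji) out = out.replace(EMOJI_RE, "");
  if (opts.stripControls) out = out.replace(CONTROL_RE, "");
  if (opts.stripBidi) out = out.replace(BIDI_RE, "");
  if (opts.stripSoftHyphens) out = out.replace(SOFT_HYPHEN_RE, "");
  return out;
}
```

Extended_Pictographic also covers © and ®, so this emoji rule removes those symbols too. A stricter tool might exclude them explicitly.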

Cleanup & Export

Choose what to remove or normalize, then preview the result. The defaults are conservative.

Notes

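Cleanup runs entirely client-side. “ASCII-only” first applies NFKD normalization, which decomposes accented letters into a base letter plus combining marks (é becomes e followed by U+0301). It then strips everything outside U+0000–U+007F, so the accents are dropped and the base letters survive. Line feeds and tabs are preserved; all other control characters are removed. BiDi controls include LRM and RLM (U+200E, U+200F), the embedding and override controls (U+202A–U+202E), and the isolates (U+2066–U+2069).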
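The ASCII-only pass described in these notes fits in a few lines of TypeScript. This is a sketch that assumes the steps run in the stated order. `toAsciiOnly` is a hypothetical name. Carriage returns are treated as “other controls” and removed, following the note that only line feeds and tabs survive.

```typescript
// Hypothetical helper name; a sketch of the ASCII-only mode described above.
function toAsciiOnly(text: string): string {
  return (
    text
      // NFKD splits "é" into "e" + U+0301 and folds compatibility forms
      // such as the "ﬁ" ligature (U+FB01) into "fi".
      .normalize("NFKD")
      // Drop everything outside U+0000–U+007F, including the combining marks.
      .replace(/[^\u0000-\u007F]/g, "")
      // Drop the remaining controls except tab (U+0009) and line feed (U+000A).
      .replace(/[\u0000-\u0008\u000B-\u001F\u007F]/g, "")
  );
}

console.log(JSON.stringify(toAsciiOnly("Caf\u00E9 \uFB01le\u200B\r\n"))); // "Cafe file\n"
```

NFKD does not map everything to ASCII. “ß” has no decomposition and is dropped outright, so “Straße” becomes “Strae”. “½” decomposes to 1, U+2044 FRACTION SLASH, 2, so it comes out as “12”.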