Project Unilight helps you reveal and clean hidden Unicode characters — zero-width spaces,
soft hyphens, direction controls, combining marks, and look-alike letters. These characters
don’t always show on screen but can break code, change meaning, or cause security issues.
The demo below shows the same paragraph twice: a “Clean” version and a “Tainted” version
with hidden characters sprinkled in. Use the legend to see which category each highlight
represents.
Paste text into the input box or upload a file to analyze. The output panel shows the same text with hidden or risky characters highlighted and counted. Use the legend chips to see which categories are present.
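To illustrate how highlighting and counting by category could work, here is a minimal Python sketch. It is not Unilight's actual code (the tool runs in your browser); the category names, the `categorize` and `scan` helpers, and the exact character sets are assumptions chosen for illustration. Look-alike letters are left out because detecting them needs a confusables table.

```python
import unicodedata
from collections import Counter

# Assumed character sets for this sketch; a real tool may cover more.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
SOFT_HYPHEN = {"\u00ad"}
BIDI = {
    "\u200e", "\u200f",                      # LRM, RLM
    *map(chr, range(0x202A, 0x202F)),        # embeddings and overrides
    *map(chr, range(0x2066, 0x206A)),        # isolates
}

def categorize(ch):
    """Return a hypothetical category label for ch, or None if it is ordinary."""
    if ch in ZERO_WIDTH:
        return "zero-width"
    if ch in SOFT_HYPHEN:
        return "soft-hyphen"
    if ch in BIDI:
        return "bidi-control"
    # Unicode general categories Mn, Mc, Me are combining marks.
    if unicodedata.category(ch).startswith("M"):
        return "combining-mark"
    return None

def scan(text):
    """Count flagged characters per category."""
    counts = Counter()
    for ch in text:
        label = categorize(ch)
        if label is not None:
            counts[label] += 1
    return counts
```

For example, `scan("a\u200bb\u00adc")` reports one zero-width character and one soft hyphen, even though the string looks like "abc" on screen.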
Choose how you want to clean the text. You can strip emoji, control codes, soft hyphens, or enforce ASCII-only output. Preview the cleaned result below, then copy or download it — all processing happens in your browser.
Choose what to remove or normalize, then preview the result. The defaults are conservative.
Cleanup is client-side. “ASCII-only” applies NFKD normalization first (to decompose accented letters into a base letter plus combining marks), then strips anything outside U+0000–U+007F, so “é” becomes “e”. Line feeds and tabs are preserved; other controls are removed. BiDi controls include LRM/RLM, the embedding and override controls (U+202A–U+202E), and the isolate controls (U+2066–U+2069).
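The “ASCII-only” steps described above can be sketched in Python as follows. This is an illustration under assumptions, not the tool's own implementation: the function name `ascii_only` is hypothetical, and "other controls" is taken to mean U+0000–U+001F and U+007F apart from line feed and tab.

```python
import unicodedata

def ascii_only(text):
    """Decompose with NFKD, then keep only printable ASCII, line feeds, and tabs."""
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        cp = ord(ch)
        if cp > 0x7F:
            # Drops combining marks, zero-width characters, BiDi controls, etc.
            continue
        if (cp < 0x20 or cp == 0x7F) and ch not in "\n\t":
            # Assumed reading of "other controls are removed" (includes CR).
            continue
        kept.append(ch)
    return "".join(kept)

print(ascii_only("caf\u00e9\u200b ok"))  # prints "cafe ok"
```

Normalizing before stripping is what keeps the base letter: without NFKD, “é” (U+00E9) is a single non-ASCII code point and would be removed entirely.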