Character Analysis
Non-Printable Characters
Detect characters that don’t render visibly:
use Cline\Babel\Babel;

// Null byte
Babel::from("Hello\x00World")->containsNonPrintable(); // true

// Bell character
Babel::from("Alert\x07!")->containsNonPrintable(); // true

// Normal text
Babel::from('Hello World')->containsNonPrintable(); // false

// Note: tabs and newlines are considered printable
Babel::from("Hello\tWorld\n")->containsNonPrintable(); // false

Control Characters
Detect control characters in the C0 (ASCII) and C1 ranges.
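For orientation, C0 spans U+0000 to U+001F and C1 spans U+0080 to U+009F; Unicode's `Cc` category covers both plus DEL (U+007F). The sketch below shows how a check along these lines could look in plain PHP. It is an illustration under that assumption, not Babel's implementation, and `hasControlChars()` is a hypothetical name.

```php
<?php

// Hypothetical helper, not part of Babel: reports whether a UTF-8 string
// contains any character in Unicode category Cc (C0, DEL, and C1 controls).
function hasControlChars(string $text): bool
{
    // The /u modifier makes PCRE work on code points instead of bytes.
    // preg_match() returns false on invalid UTF-8, which yields false here.
    return preg_match('/\p{Cc}/u', $text) === 1;
}

var_dump(hasControlChars("Hello\x1BWorld")); // bool(true)
var_dump(hasControlChars("Hello\nWorld"));   // bool(true), newline is U+000A
var_dump(hasControlChars('Hello World'));    // bool(false)
```

Whether Babel also counts DEL as a control character is not stated here, so treat that part of the sketch as a guess.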
// Null byte
Babel::from("Hello\x00World")->containsControlChars(); // true

// Escape character
Babel::from("Hello\x1BWorld")->containsControlChars(); // true

// Bell
Babel::from("Hello\x07World")->containsControlChars(); // true

// Normal text with whitespace
Babel::from("Hello\nWorld")->containsControlChars(); // true (newline is a control character)

Whitespace Detection
Check whether a string contains only whitespace.
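A whitespace-only check of this kind can be approximated in plain PHP with a Unicode-aware pattern. The sketch below is an assumption about the general approach, not Babel's code, and `isBlank()` is a hypothetical name. The `*` quantifier lets the empty string match, which mirrors the empty-string case in this section.

```php
<?php

// Hypothetical helper, not part of Babel: true when the string is empty or
// consists solely of whitespace, including Unicode separators such as U+00A0.
function isBlank(string $text): bool
{
    // \p{Z} covers Unicode separators (space, no-break space, and so on);
    // \s adds tab, line feed, carriage return, and form feed.
    // \A and \z anchor the match to the whole string.
    return preg_match('/\A[\p{Z}\s]*\z/u', $text) === 1;
}

var_dump(isBlank("\t\n\r"));    // bool(true)
var_dump(isBlank("\u{00A0}"));  // bool(true)
var_dump(isBlank(' Hello '));   // bool(false)
```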
// Spaces only
Babel::from(' ')->isWhitespace(); // true

// Tabs and newlines
Babel::from("\t\n\r")->isWhitespace(); // true

// Mixed content
Babel::from(' Hello ')->isWhitespace(); // false

// Empty string
Babel::from('')->isWhitespace(); // true

// Unicode whitespace
Babel::from("\u{00A0}")->isWhitespace(); // true (non-breaking space)

Invisible Characters
Detect zero-width and other invisible Unicode characters, which are often used to manipulate text.
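To make the idea concrete, here is a plain-PHP sketch that looks for the five code points named in this section's examples. It is not Babel's implementation; `hasInvisible()` and `listInvisible()` are hypothetical names, and a real detector may well cover more characters. The second helper assumes the `mbstring` extension is available.

```php
<?php

// Pattern for U+200B zero-width space, U+200C zero-width non-joiner,
// U+200D zero-width joiner, U+2060 word joiner, and U+FEFF byte order mark.
const INVISIBLE_PATTERN = '/[\x{200B}-\x{200D}\x{2060}\x{FEFF}]/u';

// Hypothetical helper, not part of Babel.
function hasInvisible(string $text): bool
{
    return preg_match(INVISIBLE_PATTERN, $text) === 1;
}

// Hypothetical helper: lists the offending code points in order of
// appearance, which is often more useful in logs than a plain boolean.
function listInvisible(string $text): array
{
    preg_match_all(INVISIBLE_PATTERN, $text, $matches);

    return array_map(
        static fn (string $char): string => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
        $matches[0] ?? []
    );
}

print_r(listInvisible("\u{FEFF}Hi\u{200D}")); // ['U+FEFF', 'U+200D']
```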
// Zero-width space (U+200B)
Babel::from("Hello\u{200B}World")->containsInvisible(); // true

// Zero-width non-joiner (U+200C)
Babel::from("Hello\u{200C}World")->containsInvisible(); // true

// Zero-width joiner (U+200D)
Babel::from("Hello\u{200D}World")->containsInvisible(); // true

// Byte order mark (U+FEFF)
Babel::from("\u{FEFF}Hello")->containsInvisible(); // true

// Word joiner (U+2060)
Babel::from("Hello\u{2060}World")->containsInvisible(); // true

// Normal text
Babel::from('Hello World')->containsInvisible(); // false

Homoglyph Detection
Detect characters from other scripts that look like common Latin characters (a potential security issue):
// Cyrillic 'а' looks like Latin 'a'
Babel::from('pаypal')->containsHomoglyphs(); // true (Cyrillic а)

// Cyrillic 'о' looks like Latin 'o'
Babel::from('gооgle')->containsHomoglyphs(); // true (Cyrillic о)

// Pure Latin
Babel::from('paypal')->containsHomoglyphs(); // false

// Pure Cyrillic (not homoglyphs, just Cyrillic)
Babel::from('Привет')->containsHomoglyphs(); // false

Common Homoglyphs
| Latin | Cyrillic | Greek |
|---|---|---|
| a | а (U+0430) | α (U+03B1) |
| c | с (U+0441) | - |
| e | е (U+0435) | ε (U+03B5) |
| o | о (U+043E) | ο (U+03BF) |
| p | р (U+0440) | ρ (U+03C1) |
| x | х (U+0445) | χ (U+03C7) |
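The table can be turned into a small lookup to show the principle. The sketch below is plain PHP and only an illustration: the map holds just the rows above (real confusable data is far larger), `toLatinSkeleton()` and `looksSpoofed()` are hypothetical names, and the rule "flag only when Latin letters are also present" is a guess chosen to match the examples in this section, where pure Cyrillic text is not flagged.

```php
<?php

// Lookalike map built from the table above (Cyrillic and Greek to Latin).
const LOOKALIKES = [
    "\u{0430}" => 'a', "\u{03B1}" => 'a',
    "\u{0441}" => 'c',
    "\u{0435}" => 'e', "\u{03B5}" => 'e',
    "\u{043E}" => 'o', "\u{03BF}" => 'o',
    "\u{0440}" => 'p', "\u{03C1}" => 'p',
    "\u{0445}" => 'x', "\u{03C7}" => 'x',
];

// Hypothetical helper: replaces each listed lookalike with its Latin twin.
function toLatinSkeleton(string $text): string
{
    return strtr($text, LOOKALIKES);
}

// Hypothetical helper: flags text that mixes Latin letters with listed
// lookalikes. Requiring a Latin letter keeps pure Cyrillic text unflagged.
function looksSpoofed(string $text): bool
{
    $hasLatin = preg_match('/\p{Latin}/u', $text) === 1;

    return $hasLatin && toLatinSkeleton($text) !== $text;
}

var_dump(toLatinSkeleton("p\u{0430}ypal")); // string(6) "paypal"
var_dump(looksSpoofed("p\u{0430}ypal"));    // bool(true)
var_dump(looksSpoofed('paypal'));           // bool(false)
```

Comparing the skeleton of an input against a list of protected names (for example `paypal`) is a common follow-up step, since it catches lookalikes regardless of which characters were swapped.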
Mixed Script Detection
Detect strings that contain characters from multiple scripts (a potential spoofing or phishing indicator).
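One way to picture this check is to test the string against PCRE's Unicode script properties and count how many scripts match. The sketch below is plain PHP, not Babel's implementation; `scriptsIn()` and `hasMixedScripts()` are hypothetical names, and the script list is deliberately short.

```php
<?php

// Hypothetical helper, not part of Babel: returns the scripts, from a small
// fixed list, that occur at least once in the text.
function scriptsIn(string $text): array
{
    $scripts = ['Latin', 'Cyrillic', 'Greek', 'Han', 'Arabic', 'Hebrew'];
    $found = [];

    foreach ($scripts as $script) {
        // \p{Name} matches any character whose Unicode script is Name.
        if (preg_match('/\p{' . $script . '}/u', $text) === 1) {
            $found[] = $script;
        }
    }

    return $found;
}

// Hypothetical helper: more than one script from the list means "mixed".
function hasMixedScripts(string $text): bool
{
    return count(scriptsIn($text)) > 1;
}

print_r(scriptsIn("Hello \u{4E16}\u{754C}")); // ['Latin', 'Han']
```

Spaces, digits, and punctuation belong to the Common script, so they never count toward the total in this sketch.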
// Mixed Latin and Cyrillic
Babel::from('Hello Привет')->containsMixedScripts(); // true

// Mixed Latin and Chinese
Babel::from('Hello 世界')->containsMixedScripts(); // true

// Mixed Latin and Arabic
Babel::from('Hello مرحبا')->containsMixedScripts(); // true

// Single script (pure Latin)
Babel::from('Hello World')->containsMixedScripts(); // false

// Single script (pure Cyrillic)
Babel::from('Привет мир')->containsMixedScripts(); // false

BOM Detection
Check whether a string starts with a byte-order mark.
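A BOM check comes down to comparing the leading bytes against known signatures. The plain-PHP sketch below covers the three marks shown in this section and also reports which one was found. It is an illustration, not Babel's implementation, and `detectBom()` is a hypothetical name.

```php
<?php

// Hypothetical helper, not part of Babel: returns the encoding name of the
// byte-order mark at the start of the raw bytes, or null when there is none.
function detectBom(string $bytes): ?string
{
    $signatures = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];

    foreach ($signatures as $encoding => $bom) {
        // strncmp() compares raw bytes, so multibyte settings do not matter.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }

    return null;
}

var_dump(detectBom("\xEF\xBB\xBFHello")); // string(5) "UTF-8"
var_dump(detectBom('Hello'));             // NULL
```

UTF-32 marks are left out here. The UTF-32 LE mark (`FF FE 00 00`) begins with the same two bytes as UTF-16 LE, so a fuller version would need to test the longer signature first.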
// UTF-8 BOM
Babel::from("\xEF\xBB\xBFHello")->hasBom(); // true

// UTF-16 BE BOM
Babel::from("\xFE\xFFHello")->hasBom(); // true

// UTF-16 LE BOM
Babel::from("\xFF\xFEHello")->hasBom(); // true

// No BOM
Babel::from('Hello')->hasBom(); // false

String Metrics
Get basic string measurements.
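The gap between character count and byte count comes from UTF-8 using one to four bytes per code point. The plain-PHP sketch below reproduces the arithmetic for the string used in this section with built-in functions; it assumes the `mbstring` extension and says nothing about how Babel computes these values internally.

```php
<?php

// 'Héllo 世界' written with escapes so the é is unambiguously precomposed.
$text = "H\u{00E9}llo \u{4E16}\u{754C}";

// strlen() counts bytes: 5 ASCII characters + 2 (é) + 3 + 3 (two Han characters).
$byteCount = strlen($text);

// mb_strlen() counts code points for the given encoding.
$charCount = mb_strlen($text, 'UTF-8');

printf("%d characters, %d bytes\n", $charCount, $byteCount); // 8 characters, 13 bytes
```

If the é were written as `e` followed by a combining acute accent (U+0301), the same visible text would be 9 code points and 14 bytes, which is worth keeping in mind when enforcing length limits.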
$babel = Babel::from('Héllo 世界');
// Character count (not bytes)
$babel->length(); // 8

// Byte count
$babel->bytes(); // 13 (UTF-8 encoded)

// Check emptiness
$babel->isEmpty(); // false
$babel->isNotEmpty(); // true

Use Cases
Security Validation
function isSafeUsername(string $username): bool
{
    $babel = Babel::from($username);

    return !$babel->containsHomoglyphs()
        && !$babel->containsInvisible()
        && !$babel->containsControlChars();
}

Input Sanitization Check
function needsSanitization(string $input): bool
{
    $babel = Babel::from($input);

    return $babel->containsNonPrintable()
        || $babel->containsInvisible()
        || $babel->containsControlChars();
}

Display Validation
function isDisplayable(string $text): bool
{
    $babel = Babel::from($text);

    return !$babel->containsNonPrintable()
        && !$babel->containsControlChars();
}

Homoglyph Attack Detection
function detectPunycodeThreat(string $domain): bool
{
    $babel = Babel::from($domain);

    // Domain contains mixed scripts with Latin lookalikes
    return $babel->containsLatin() && $babel->containsHomoglyphs();
}

// Examples
detectPunycodeThreat('google.com'); // false
detectPunycodeThreat('gооgle.com'); // true (Cyrillic о)
detectPunycodeThreat('pаypal.com'); // true (Cyrillic а)
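
Because a Latin top-level domain such as `com` makes almost every domain "contain Latin", it can help to inspect each label on its own and report which one is suspect. The plain-PHP sketch below illustrates that variation. It is not part of Babel, `suspiciousLabels()` is a hypothetical name, and it only considers Cyrillic and Greek as the "other" scripts.

```php
<?php

// Hypothetical helper, not part of Babel: returns the labels of a domain
// that mix Latin letters with Cyrillic or Greek letters.
function suspiciousLabels(string $domain): array
{
    $flagged = [];

    foreach (explode('.', $domain) as $label) {
        $hasLatin = preg_match('/\p{Latin}/u', $label) === 1;
        $hasOther = preg_match('/[\p{Cyrillic}\p{Greek}]/u', $label) === 1;

        if ($hasLatin && $hasOther) {
            $flagged[] = $label;
        }
    }

    return $flagged;
}

var_dump(suspiciousLabels('google.com') === []);                // bool(true)
var_dump(count(suspiciousLabels("g\u{043E}\u{043E}gle.com")));  // int(1)
```

A label written entirely in lookalike Cyrillic letters passes this per-label test, so a sketch like this complements a homoglyph check rather than replacing it.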