\Pluf_Text_UTF8

UTF8 helper functions

Originally taken from DokuWiki. The functions were updated on the assumption that the multibyte (mbstring) functions are always available, and were wrapped in a class.

Summary

Public methods: filename(), is_ascii(), strip(), check(), detect_cyr_charset(), deaccent(), romanize(), stripspecials(), accents(), special_chars(), romanization()
Public properties: none
Constants: none
Protected methods: none
Protected properties: none
Private methods: none
Private properties: none

Methods

filename()

filename(  $file,   $safe = true) 

URL-encode a filename/URL to allow Unicode characters.

Slashes are not encoded.

When the second parameter is true (the default), the string is encoded only if non-ASCII characters are detected. This makes it safe to run the function multiple times on the same string.

Parameters

$file
$safe
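
The safe-mode behaviour described above can be illustrated with a minimal sketch. It is not the class's actual code: the name sketch_encode_filename and the use of rawurlencode() are assumptions made for illustration only.

```php
<?php
// Hypothetical stand-in for filename(); not the class's actual implementation.
function sketch_encode_filename($file, $safe = true)
{
    // In safe mode, leave strings without any byte above 0x7F untouched.
    if ($safe && preg_match('/[\x80-\xFF]/', $file) === 0) {
        return $file;
    }
    // Percent-encode everything, then restore the slashes.
    return str_replace('%2F', '/', rawurlencode($file));
}

// "\xC3\xA9" is the UTF-8 byte sequence for "é".
$once  = sketch_encode_filename("caf\xC3\xA9/menu.txt");
// The encoded result is pure ASCII, so a second safe-mode call leaves it alone.
$twice = sketch_encode_filename($once);
```

Because the encoded output contains only ASCII bytes, a second call in safe mode returns it unchanged, which is what makes repeated calls harmless.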

is_ascii()

is_ascii(  $str) 

Checks if a string contains 7-bit ASCII only

Parameters

$str

strip()

strip(  $str) 

Strips all high-byte characters

Returns a pure 7-bit ASCII string

Parameters

$str
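
As a rough illustration of the two ASCII helpers, is_ascii() and strip(), the sketch below uses byte-wise regular expressions. The function names are hypothetical and the class may implement these checks differently.

```php
<?php
// Hypothetical stand-in for is_ascii(): true when no byte is above 0x7F.
function sketch_is_ascii($str)
{
    return preg_match('/[^\x00-\x7F]/', $str) === 0;
}

// Hypothetical stand-in for strip(): drops every byte above 0x7F.
function sketch_strip_high_bytes($str)
{
    return preg_replace('/[^\x00-\x7F]/', '', $str);
}

// "\xC3\xAF" is the UTF-8 byte sequence for "ï"; both of its bytes are removed.
$stripped = sketch_strip_high_bytes("na\xC3\xAFve");
```

Note that stripping works on bytes, so a multibyte character disappears entirely rather than being replaced by an ASCII look-alike.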

check()

check(  $str) 

Tries to detect whether a string is UTF-8 encoded

Parameters

$str
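
One common way to test a string for well-formed UTF-8 in PHP is to run an empty pattern with the /u modifier, which fails on invalid byte sequences. The sketch below shows that idiom under a hypothetical name; it is an assumption about how such a check can be done, not a copy of what check() does internally.

```php
<?php
// Hypothetical stand-in for check(): preg_match() with the /u modifier
// returns false when the subject is not valid UTF-8, and 1 otherwise.
function sketch_is_utf8($str)
{
    return preg_match('//u', $str) === 1;
}

$valid   = sketch_is_utf8("caf\xC3\xA9"); // "é" encoded as UTF-8
$invalid = sketch_is_utf8("caf\xE9");     // "é" encoded as ISO-8859-1
```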

detect_cyr_charset()

detect_cyr_charset(  $str) : string

Detect if a string is in a Russian charset.

This should be used when mbstring encoding detection fails. For example:

$encoding = mb_detect_encoding($string, mb_detect_order(), true);
if ($encoding === false) {
    $encoding = Pluf_Text_UTF8::detect_cyr_charset($string);
}

Parameters

$str

Returns

string: possible Russian encoding

deaccent()

deaccent(  $string) 

Replace accented UTF-8 characters by unaccented ASCII-7 equivalents.

Parameters

$string
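
The replacement can be pictured as a plain table lookup. The sketch below uses a small, hand-picked table and a hypothetical function name; the real table is the one returned by accents() and is much larger.

```php
<?php
// Hypothetical stand-in for deaccent(), using an illustrative subset of mappings.
function sketch_deaccent($string)
{
    $table = array(
        'é' => 'e',
        'è' => 'e',
        'à' => 'a',
        'ç' => 'c',
        'ñ' => 'n',
    );
    // strtr() replaces every occurrence of each key and leaves other text as is.
    return strtr($string, $table);
}

$plain = sketch_deaccent('déjà vu');
```

Characters that are missing from the table pass through unchanged, so the result is only pure ASCII when the table covers every accented letter in the input.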

romanize()

romanize(  $string) 

Romanize a non-Latin string

FIXME

Parameters

$string

stripspecials()

stripspecials(string  $string, string  $repl = '', string  $additional = '') 

Removes special (non-alphanumeric) characters from a UTF-8 string

This function adds the control characters 0x00 to 0x19 to the array of stripped characters (they are not included in $UTF8_SPECIAL_CHARS)

Parameters

string $string

The UTF8 string to strip of special chars

string $repl

Replace special with this string

string $additional

Additional chars to strip (used in regexp char class)
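
To show how the three parameters interact, here is a minimal sketch built around a single regular-expression character class. The function name and the short list of special characters are assumptions for illustration; the real list comes from special_chars().

```php
<?php
// Hypothetical stand-in for stripspecials(); the special-character list is a
// small illustrative subset, not the table used by the class.
function sketch_strip_specials($string, $repl = '', $additional = '')
{
    $specials = preg_quote('!"#$%&\'()*+,./:;<=>?@[\\]^`{|}~', '/');
    // $additional is placed inside the character class as is, followed by the
    // control characters 0x00 to 0x19 and the quoted special characters.
    $pattern = '/[' . $additional . '\x00-\x19' . $specials . ']/u';
    return preg_replace($pattern, $repl, $string);
}

$removed  = sketch_strip_specials('a/b:c');        // specials dropped
$replaced = sketch_strip_specials('a/b:c', '-');   // specials replaced by "-"
$extra    = sketch_strip_specials('a_b!c', '', '_'); // "_" stripped as well
```

Since $additional ends up inside a character class, any character with a special meaning there (for example a closing bracket) would need to be escaped by the caller.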

accents()

accents() 

UTF-8 lookup table for lower case accented letters

This lookup table defines replacements from the ASCII-7 range for accented characters. These are lower case letters only.

special_chars()

special_chars() 

romanization()

romanization() 

Romanization lookup table

This lookup table provides a way to transform strings written in scripts that are not based on Latin letters into plain ASCII.

Please note: this is not a scientific transliteration table. It only works one way, from non-Latin to ASCII, and it works by simple character replacement only. The particularities of each language are not supported.
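
The "simple character replacement" approach can be sketched as follows. The handful of Cyrillic mappings and the function name are illustrative assumptions, not entries copied from the real table.

```php
<?php
// Hypothetical romanization by table lookup; the mappings are a tiny
// illustrative subset chosen for the examples below.
function sketch_romanize($string)
{
    $table = array(
        'ж' => 'zh', 'у' => 'u', 'к' => 'k',
        'п' => 'p',  'р' => 'r', 'и' => 'i',
        'в' => 'v',  'е' => 'e', 'т' => 't',
    );
    // Each source character maps to one or more ASCII letters; context is ignored.
    return strtr($string, $table);
}

$hello  = sketch_romanize('привет');
$beetle = sketch_romanize('жук');
```

The mapping is lossy in the other direction: an output such as "zh" cannot be told apart from a "z" followed by an "h", which is why the table only works one way.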