\Pluf_Text_Lang

Detect the language of a text.

list($lang, $confid) = Pluf_Text_Lang::detect($string);

Summary

Public methods: detect(), docNgrams(), makeNgrams(), ngramDistance()
Public properties: none
Constants: none
Protected methods and properties: none
Private methods and properties: none

Methods

detect()

detect($string, $is_clean = false) : array

Given a string, returns the detected language together with a confidence value.

Based on the n-gram text categorization algorithm of Cavnar and Trenkle (1994).

Parameters

$string: the text to analyze
$is_clean: optional, defaults to false

Returns

array: the language and the confidence, in that order
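
The selection step of a Cavnar and Trenkle style classifier can be sketched as follows. This is only an illustration under stated assumptions, not the code behind detect(): the function name sketch_detect(), the toy profiles, and the fixed penalty for missing n-grams are all hypothetical, and the sketch returns the raw distance rather than a confidence value, since this page does not say how the confidence is computed.

```php
<?php
// Hypothetical sketch, not Pluf's code: pick the reference profile
// that is closest to the ranked n-gram profile of the document.
function sketch_detect(array $docProfile, array $langProfiles): array
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($langProfiles as $lang => $profile) {
        $rank = array_flip($profile); // n-gram => rank in this profile
        $distance = 0;
        foreach (array_values($docProfile) as $i => $gram) {
            // Assumed penalty for an n-gram the profile lacks: its length.
            $distance += isset($rank[$gram])
                ? abs($i - $rank[$gram])
                : count($profile);
        }
        if ($distance < $bestDistance) {
            $best = $lang;
            $bestDistance = $distance;
        }
    }
    return array($best, $bestDistance);
}

// Toy profiles, invented for illustration only.
$profiles = array(
    'en' => array('th', 'he', 'in', 'er'),
    'fr' => array('es', 'le', 'de', 'en'),
);
list($lang, $distance) = sketch_detect(array('he', 'th', 'er'), $profiles);
```

A real classifier would use reference profiles with a few hundred ranked n-grams per language, built from training text.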

docNgrams()

docNgrams($string, $n = 3) : array

Returns the sorted n-grams of a document.

FIXME: We should detect the proportion of Thai, Chinese, and Japanese characters and switch to unigrams instead of n-grams if that proportion is greater than 50%.

Parameters

$string: the text of the document
$n: optional, defaults to 3

Returns

array: the n-grams
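
A ranked n-gram profile of the kind described above might be built like this. The sketch is hypothetical: sketch_doc_ngrams() is not part of Pluf, and it assumes that words are split on non-letter characters, that each word is padded with one underscore on each side, that n-grams of every size from 1 to $n are counted, and that ties are broken alphabetically. The real method may differ on any of these points.

```php
<?php
// Hypothetical sketch, not Pluf's code: n-grams of a text, most frequent first.
function sketch_doc_ngrams(string $text, int $n = 3): array
{
    // Split on anything that is not a letter (assumption of this sketch).
    $words = preg_split('/[^\p{L}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $counts = array();
    foreach ($words as $word) {
        // Pad the word, then split it into UTF-8 characters.
        $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
        for ($size = 1; $size <= $n; $size++) {
            for ($i = 0; $i + $size <= count($chars); $i++) {
                $gram = implode('', array_slice($chars, $i, $size));
                if ($gram === '_') {
                    continue; // skip n-grams made of padding only
                }
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }
    // Highest count first; equal counts are ordered alphabetically.
    $grams = array_keys($counts);
    usort($grams, function ($a, $b) use ($counts) {
        return $counts[$b] <=> $counts[$a] ?: strcmp($a, $b);
    });
    return $grams;
}

$profile = sketch_doc_ngrams('the theme of the text', 3);
```

Only the order of the n-grams is kept, because the rank-based distance of Cavnar and Trenkle compares positions, not raw counts.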

makeNgrams()

makeNgrams($word, $n = 3) : array

Returns the n-grams of rank n of the word.

Parameters

$word: the word to split into n-grams
$n: rank of the n-grams; optional, defaults to 3

Returns

array: the n-grams
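
As an illustration of what the n-grams of a word look like, here is a minimal sketch. The helper name sketch_make_ngrams() and the single underscore of padding on each side are assumptions of this example, not confirmed behaviour of makeNgrams().

```php
<?php
// Hypothetical helper, not part of Pluf: character n-grams of one word.
function sketch_make_ngrams(string $word, int $n = 3): array
{
    // Pad with one underscore per side (assumption), then split the
    // result into UTF-8 characters so multibyte letters stay intact.
    $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
    $ngrams = array();
    for ($i = 0; $i + $n <= count($chars); $i++) {
        $ngrams[] = implode('', array_slice($chars, $i, $n));
    }
    return $ngrams;
}

// For 'text' and $n = 3 this yields '_te', 'tex', 'ext', 'xt_'.
$trigrams = sketch_make_ngrams('text', 3);
```

The padding marks word boundaries, so that 'xt_' (a word ending) is distinguished from 'xt' in the middle of a word.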

ngramDistance()

ngramDistance($n1, $n2) : integer

Returns the distance between the n-gram lists of two documents.

Parameters

$n1: n-grams of the first document
$n2: n-grams of the second document

Returns

integer: the distance
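
One common way to compare two ranked n-gram lists is the "out-of-place" measure from Cavnar and Trenkle (1994), sketched below. Whether ngramDistance() uses exactly this measure is an assumption; the function name sketch_ngram_distance() and the penalty for n-grams missing from the second list are hypothetical choices made for this example.

```php
<?php
// Hypothetical sketch of an "out-of-place" distance between two
// ranked n-gram lists (most frequent n-gram first in each list).
function sketch_ngram_distance(array $doc, array $ref): int
{
    $refRank = array_flip(array_values($ref)); // n-gram => rank in $ref
    $maxPenalty = count($ref);                 // assumed cost of a missing n-gram
    $distance = 0;
    foreach (array_values($doc) as $rank => $gram) {
        $distance += isset($refRank[$gram])
            ? abs($rank - $refRank[$gram])     // how far out of place it is
            : $maxPenalty;
    }
    return $distance;
}

// 'th' and 'he' are each one rank out of place; 'in' is missing from $ref.
$d = sketch_ngram_distance(array('th', 'he', 'in'), array('he', 'th', 'an'));
```

Note that this measure is not symmetric: swapping the two arguments can give a different result when the lists differ in length or content.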