Unicode::Collate::Locale - Linguistic tailoring for DUCET via Unicode::Collate
use Unicode::Collate::Locale;
#construct
$Collator = Unicode::Collate::Locale->
new(locale => $locale_name, %tailoring);
#sort
@sorted = $Collator->sort(@not_sorted);
#compare
$result = $Collator->cmp($a, $b); # returns 1, 0, or -1.
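Note: Strings in @not_sorted, $a and $b are interpreted according to Perl's Unicode support. See perlunicode, perluniintro, perlunitut, perlunifaq, utf8. If your strings are not yet character strings, either use preprocess (see Unicode::Collate) or decode them beforehand.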
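The note above can be illustrated with a minimal sketch that decodes UTF-8 bytes before collating. The input list is hypothetical, and the choice of the 'sv' locale and of the Encode module for decoding are assumptions made only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Collate::Locale;

# Hypothetical input: UTF-8 encoded bytes, e.g. as read from a file.
my @raw = ("\xC3\xB6l", "zebra", "apa");    # bytes for "öl", "zebra", "apa"

# Decode the bytes into Perl character strings before collating.
my @words = map { decode('UTF-8', $_) } @raw;

my $collator = Unicode::Collate::Locale->new(locale => 'sv');
my @sorted   = $collator->sort(@words);

# Encode again only at the output boundary.
print encode('UTF-8', join(', ', @sorted)), "\n";
```

Swedish orders ö after z, so the decoded list should come out as apa, zebra, öl.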
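This module provides linguistic tailoring for DUCET, taking advantage of Unicode::Collate.
The new method returns a collator object.
The parameter list for the constructor is a hash, which can include a special key locale whose value (case-insensitive) stands for a Unicode base language code (two or three letters). For example, Unicode::Collate::Locale->new(locale => 'ES') returns a collator tailored for Spanish.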
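$locale_name may be suffixed with a Unicode script code (four letters), a Unicode region (territory) code, and a Unicode language variant code. These codes are case-insensitive and are separated with '_' or '-'. For example, en_US stands for English in the USA, az_Cyrl for Azerbaijani in the Cyrillic script, and es_ES_traditional for Spanish in Spain (traditional).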
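As a small sketch of the naming rules above, the following passes the same locale in different letter cases and with different separators, then asks each collator which locale it accepted via getlocale. The hash %accepted is only a helper for this illustration.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Locale names differing only in case and separator.
my @names = ('ES', 'es', 'es_ES_traditional', 'ES-es-TRADITIONAL', 'fr-ca');

my %accepted;    # illustrative helper: requested name => accepted locale
for my $name (@names) {
    my $collator = Unicode::Collate::Locale->new(locale => $name);
    $accepted{$name} = $collator->getlocale;
    printf "%-20s => %s\n", $name, $accepted{$name};
}
```

The accepted name is reported in the canonical spelling used in the locale list (for instance fr_CA or es__traditional), regardless of how the request was written.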
If $locale_name is not available, a fallback is selected in the following order:
1. language with a variant code
2. language with a script code
3. language with a region code
4. language
5. default
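The fallback order can be observed with getlocale. The sketch below requests a few locale names that are not all in the list of supported locales and records what each collator actually uses; the chosen names and the %used hash are assumptions made for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Requested names; only some of them exist as such in the locale list.
my @requests = ('de_AT_phonebook', 'de_CH_phonebook', 'sr_Latn_RS', 'sv_FI', 'en_GB');

my %used;    # illustrative helper: requested name => locale actually used
for my $name (@requests) {
    $used{$name} = Unicode::Collate::Locale->new(locale => $name)->getlocale;
    printf "%-16s => %s\n", $name, $used{$name};
}
```

Here de_CH_phonebook should fall back to the variant-only de__phonebook, sr_Latn_RS to sr_Latn, sv_FI to sv, and en_GB to default, since English follows the default UCA rules.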
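Tailoring tags provided by Unicode::Collate are allowed as long as they are not used for locale support. In particular, the table tag can never be tailored, since it is reserved for DUCET.
However, entry is allowed, even if it is used for locale support, to add or override mappings.
For example, a collator for Spanish that ignores diacritics and case differences (i.e. level 1), with reversed case ordering and no normalization: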
Unicode::Collate::Locale->new(
level => 1,
locale => 'es',
upper_before_lower => 1,
normalization => undef
)
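A brief usage sketch for the constructor call above, assuming the es tailoring covers the precomposed ñ (U+00F1) so that it still applies with normalization turned off. The sample strings are made up for illustration.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $collator = Unicode::Collate::Locale->new(
    level              => 1,
    locale             => 'es',
    upper_before_lower => 1,
    normalization      => undef,
);

# At level 1 only primary differences count.
my $same_case   = $collator->eq("nino", "NINO");        # case difference is ignored
my $same_accent = $collator->eq("cafe", "caf\x{E9}");   # e vs. e with acute
my $n_vs_enye   = $collator->cmp("n", "\x{F1}");        # n vs. n with tilde

print "case ignored\n"       if $same_case;
print "diacritic ignored\n"  if $same_accent;
print "n sorts before n-tilde\n" if $n_vs_enye < 0;
```

In Spanish, ñ is a separate letter, so it remains distinct from n even at level 1, while the acute accent on é is only a secondary difference.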
Overriding a behavior already tailored by locale is disallowed if such a tailoring is passed to new().
Unicode::Collate::Locale->new(
locale => 'da',
upper_before_lower => 0, # causes error as reserved by 'da'
)
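Because such a call raises an error, a caller that accepts user-supplied options may want to trap it. This is a minimal sketch using eval; the exact wording of the error message is not assumed, and 'keys.txt' is a hypothetical file name that is never opened.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# upper_before_lower is reserved by 'da', so new() is expected to die.
my $collator = eval {
    Unicode::Collate::Locale->new(
        locale             => 'da',
        upper_before_lower => 0,
    );
};
my $error = $@;
print defined $collator ? "constructed\n" : "rejected: $error";

# The table tag is reserved for DUCET and is rejected for any locale.
my $with_table = eval {
    Unicode::Collate::Locale->new(locale => 'es', table => 'keys.txt');
};
my $table_error = $@;
print defined $with_table ? "constructed\n" : "rejected: $table_error";
```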
However, change(), inherited from Unicode::Collate, does allow changing a tailoring that is reserved by locale. Examples:
new(locale => 'fr_ca')->change(backwards => undef)
new(locale => 'da')->change(upper_before_lower => 0)
new(locale => 'ja')->change(overrideCJK => undef)
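To make the effect of change() visible, the following sketch compares the same pair of strings before and after switching off the case ordering reserved by the Danish locale. It assumes the default level, so that case differences are significant.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $collator = Unicode::Collate::Locale->new(locale => 'da');

# The 'da' tailoring orders uppercase before lowercase.
my $before = $collator->cmp("A", "a");

# change() may alter a setting that new() would refuse to override.
$collator->change(upper_before_lower => 0);
my $after = $collator->cmp("A", "a");

print "before: $before, after: $after\n";
```

With the Danish setting, "A" should compare less than "a"; after the change, the default lowercase-first order applies and the result flips.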
Unicode::Collate::Locale is a subclass of Unicode::Collate, and all methods other than new are inherited from Unicode::Collate.
Here is a list of additional methods:
$Collator->getlocale
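Returns the language code actually accepted and used for collation. If no linguistic tailoring is provided for the language code you passed (intentionally for some languages, or because the implementation is incomplete), this method returns the string 'default', meaning no special tailoring.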
$Collator->locale_version
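(Since Unicode::Collate::Locale 0.87) Returns the version number (perhaps /\d\.\d\d/) of the locale, as that of Locale/*.pl.
Note: The Locale/*.pl file that a collator uses should be identifiable from the combination of the return values of getlocale and locale_version.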
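A short sketch of how the two methods might be used together, for example when logging which tailoring a collator is using. It assumes that locale_version may be undefined when no locale data is loaded, and the %info hash is only an illustrative helper.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my %info;    # illustrative helper: requested name => [accepted locale, version]
for my $name (qw(sv sv_SE en)) {
    my $collator = Unicode::Collate::Locale->new(locale => $name);
    my $version  = $collator->locale_version;
    $info{$name} = [ $collator->getlocale, $version ];
    printf "%-6s => locale %s, version %s\n",
        $name, $collator->getlocale,
        defined $version ? $version : '(none)';
}
```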
locale name       description
--------------------------------------------------------------
af                Afrikaans
ar                Arabic
as                Assamese
az                Azerbaijani (Azeri)
be                Belarusian
bn                Bengali
bs                Bosnian (tailored as Croatian)
bs_Cyrl           Bosnian in Cyrillic (tailored as Serbian)
ca                Catalan
cs                Czech
cu                Church Slavic
cy                Welsh
da                Danish
de__phonebook     German (umlaut as 'ae', 'oe', 'ue')
de_AT_phonebook   Austrian German (umlaut primary greater)
dsb               Lower Sorbian
ee                Ewe
eo                Esperanto
es                Spanish
es__traditional   Spanish ('ch' and 'll' as a grapheme)
et                Estonian
fa                Persian
fi                Finnish (v and w are primary equal)
fi__phonebook     Finnish (v and w as separate characters)
fil               Filipino
fo                Faroese
fr_CA             Canadian French
gu                Gujarati
ha                Hausa
haw               Hawaiian
he                Hebrew
hi                Hindi
hr                Croatian
hu                Hungarian
hy                Armenian
ig                Igbo
is                Icelandic
ja                Japanese [1]
kk                Kazakh
kl                Kalaallisut
kn                Kannada
ko                Korean [2]
kok               Konkani
lkt               Lakota
ln                Lingala
lt                Lithuanian
lv                Latvian
mk                Macedonian
ml                Malayalam
mr                Marathi
mt                Maltese
nb                Norwegian Bokmal
nn                Norwegian Nynorsk
nso               Northern Sotho
om                Oromo
or                Oriya
pa                Punjabi
pl                Polish
ro                Romanian
sa                Sanskrit
se                Northern Sami
si                Sinhala
si__dictionary    Sinhala (U+0DA5 = U+0DA2,0DCA,0DA4)
sk                Slovak
sl                Slovenian
sq                Albanian
sr                Serbian
sr_Latn           Serbian in Latin (tailored as Croatian)
sv                Swedish (v and w are primary equal)
sv__reformed      Swedish (v and w as separate characters)
ta                Tamil
te                Telugu
th                Thai
tn                Tswana
to                Tonga
tr                Turkish
ug_Cyrl           Uyghur in Cyrillic
uk                Ukrainian
ur                Urdu
vi                Vietnamese
vo                Volapu"k
wae               Walser
wo                Wolof
yo                Yoruba
zh                Chinese
zh__big5han       Chinese (ideographs: big5 order)
zh__gb2312han     Chinese (ideographs: GB-2312 order)
zh__pinyin        Chinese (ideographs: pinyin order) [3]
zh__stroke        Chinese (ideographs: stroke order) [3]
zh__zhuyin        Chinese (ideographs: zhuyin order) [3]
--------------------------------------------------------------
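Locales that follow the default UCA rules include am (Amharic) without [reorder Ethi], bg (Bulgarian) without [reorder Cyrl], chr (Cherokee) without [reorder Cher], de (German), en (English), fr (French), ga (Irish), id (Indonesian), it (Italian), ka (Georgian) without [reorder Geor], mn (Mongolian) without [reorder Cyrl Mong], ms (Malay), nl (Dutch), pt (Portuguese), ru (Russian) without [reorder Cyrl], sw (Swahili), zu (Zulu).
Note
[1] ja: Ideographs are sorted in JIS X 0208 order. Fullwidth and halfwidth forms are identical to their regular forms. The difference between hiragana and katakana is at the 4th level; comparing them also requires (variable => 'Non-ignorable'), and then katakana_before_hiragana has no effect.
[2] ko: Many ideographs are sorted by their reading. Such an ideograph is primary (level 1) equal to, and secondary (level 2) greater than, the corresponding hangul syllable.
[3] zh__pinyin, zh__stroke and zh__zhuyin: these implement alt='short', in which a smaller number of ideographs are tailored.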
variant code       alias
------------------------------------------
dictionary         dict
phonebook          phone     phonebk
reformed           reform
traditional        trad
------------------------------------------
big5han            big5
gb2312han          gb2312
pinyin
stroke
zhuyin
------------------------------------------
Note: 'pinyin' is Han in Latin, 'zhuyin' is Han in Bopomofo.
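The variant aliases listed above can be checked with getlocale. This sketch requests several locales by alias and records the canonical name each collator reports; the %canonical hash is an illustrative helper.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

# Locale names written with the short variant aliases.
my @aliases = ('de_phone', 'de-phonebk', 'es_trad', 'sv_reform',
               'si_dict', 'zh_big5', 'zh_gb2312');

my %canonical;    # illustrative helper: alias form => accepted locale
for my $name (@aliases) {
    $canonical{$name} = Unicode::Collate::Locale->new(locale => $name)->getlocale;
    printf "%-12s => %s\n", $name, $canonical{$name};
}
```

Each alias should resolve to the full variant code, for example de__phonebook for both de_phone and de-phonebk.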
Installation of Unicode::Collate::Locale requires Collate/Locale.pm, Collate/Locale/*.pm, Collate/CJK/*.pm and Collate/allkeys.txt. When building, Unicode::Collate::Locale does not require any of data/*.txt, gendata/* or mklocale. Tests for Unicode::Collate::Locale are named t/loc_*.t.
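Even if a certain letter is tailored, its equivalents are not always tailored in the same way. For example, even though W is tailored, fullwidth W (U+FF37), W with acute (U+1E82), etc. are not tailored. The result may depend on whether the source strings are normalized, and on whether they are decomposed or composed. For this reason, (normalization => undef) is less preferred.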
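The dependence on normalization can be probed with a sketch like the following, which compares the composed and decomposed forms of W with acute under two collators. The 'sv' locale is used only as an example of a locale with tailored letters; whether the second comparison differs depends on the locale data in use, so no result is claimed for it.

```perl
use strict;
use warnings;
use Unicode::Collate::Locale;

my $composed   = "\x{1E82}";     # LATIN CAPITAL LETTER W WITH ACUTE
my $decomposed = "W\x{0301}";    # W followed by COMBINING ACUTE ACCENT

# Default settings: strings are normalized before collation.
my $normalizing = Unicode::Collate::Locale->new(locale => 'sv');

# Normalization switched off: strings are collated as given.
my $raw = Unicode::Collate::Locale->new(locale => 'sv', normalization => undef);

my $equal_when_normalized = $normalizing->eq($composed, $decomposed);
my $equal_when_raw        = $raw->eq($composed, $decomposed);

print "normalized: ", ($equal_when_normalized ? "equal" : "different"), "\n";
print "raw:        ", ($equal_when_raw        ? "equal" : "different"), "\n";
```

With normalization enabled, both forms are reduced to the same sequence before weights are looked up, so they compare equal regardless of the tailoring.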
Collation reordering is not supported: the order of any groups, including scripts, is not changed.
locale            based CLDR or other reference
--------------------------------------------------------------------
af                30 = 1.8.1
ar                30 = 28 ("compat" wo [reorder Arab]) = 1.9.0
as                30 = 28 (without [reorder Beng..]) = 23
az                30 = 24 ("standard" wo [reorder Latn Cyrl])
be                30 = 28 (without [reorder Cyrl])
bn                30 = 28 ("standard" wo [reorder Beng..]) = 2.0.1
bs                30 = 28 (type="standard": [import hr])
bs_Cyrl           30 = 28 (type="standard": [import sr])
ca                30 = 23 (alt="proposed" type="standard")
cs                30 = 1.8.1 (type="standard")
cu                34 = 30 (without [reorder Cyrl])
cy                30 = 1.8.1
da                22.1 = 1.8.1 (type="standard")
de__phonebook     30 = 2.0 (type="phonebook")
de_AT_phonebook   30 = 27 (type="phonebook")
dsb               30 = 26
ee                30 = 21
eo                30 = 1.8.1
es                30 = 1.9.0 (type="standard")
es__traditional   30 = 1.8.1 (type="traditional")
et                30 = 26
fa                22.1 = 1.8.1
fi                22.1 = 1.8.1 (type="standard" alt="proposed")
fi__phonebook     22.1 = 1.8.1 (type="phonebook")
fil               30 = 1.9.0 (type="standard") = 1.8.1
fo                22.1 = 1.8.1 (alt="proposed" type="standard")
fr_CA             30 = 1.9.0
gu                30 = 28 ("standard" wo [reorder Gujr..]) = 1.9.0
ha                30 = 1.9.0
haw               30 = 24
he                30 = 28 (without [reorder Hebr]) = 23
hi                30 = 28 (without [reorder Deva..]) = 1.9.0
hr                30 = 28 ("standard" wo [reorder Latn Cyrl]) = 1.9.0
hu                22.1 = 1.8.1 (alt="proposed" type="standard")
hy                30 = 28 (without [reorder Armn]) = 1.8.1
ig                30 = 1.8.1
is                22.1 = 1.8.1 (type="standard")
ja                22.1 = 1.8.1 (type="standard")
kk                30 = 28 (without [reorder Cyrl])
kl                22.1 = 1.8.1 (type="standard")
kn                30 = 28 ("standard" wo [reorder Knda..]) = 1.9.0
ko                22.1 = 1.8.1 (type="standard")
kok               30 = 28 (without [reorder Deva..]) = 1.8.1
lkt               30 = 25
ln                30 = 2.0 (type="standard") = 1.8.1
lt                22.1 = 1.9.0
lv                22.1 = 1.9.0 (type="standard") = 1.8.1
mk                30 = 28 (without [reorder Cyrl])
ml                22.1 = 1.9.0
mr                30 = 28 (without [reorder Deva..]) = 1.8.1
mt                22.1 = 1.9.0
nb                22.1 = 2.0 (type="standard")
nn                22.1 = 2.0 (type="standard")
nso               [*] 26 = 1.8.1
om                22.1 = 1.8.1
or                30 = 28 (without [reorder Orya..]) = 1.9.0
pa                22.1 = 1.8.1
pl                30 = 1.8.1
ro                30 = 1.9.0 (type="standard")
sa                [*] 1.9.1 = 1.8.1 (type="standard" alt="proposed")
se                22.1 = 1.8.1 (type="standard")
si                30 = 28 ("standard" wo [reorder Sinh..]) = 1.9.0
si__dictionary    30 = 28 ("dictionary" wo [reorder Sinh..]) = 1.9.0
sk                22.1 = 1.9.0 (type="standard")
sl                22.1 = 1.8.1 (type="standard" alt="proposed")
sq                22.1 = 1.8.1 (alt="proposed" type="standard")
sr                30 = 28 (without [reorder Cyrl])
sr_Latn           30 = 28 (type="standard": [import hr])
sv                22.1 = 1.9.0 (type="standard")
sv__reformed      22.1 = 1.8.1 (type="reformed")
ta                22.1 = 1.9.0
te                30 = 28 (without [reorder Telu..]) = 1.9.0
th                22.1 = 22
tn                [*] 26 = 1.8.1
to                22.1 = 22
tr                22.1 = 1.8.1 (type="standard")
uk                30 = 28 (without [reorder Cyrl])
ug_Cyrl           https://en.wikipedia.org/wiki/Uyghur_Cyrillic_alphabet
ur                22.1 = 1.9.0
vi                22.1 = 1.8.1
vo                30 = 25
wae               30 = 2.0
wo                [*] 1.9.1 = 1.8.1
yo                30 = 1.8.1
zh                22.1 = 1.8.1 (type="standard")
zh__big5han       22.1 = 1.8.1 (type="big5han")
zh__gb2312han     22.1 = 1.8.1 (type="gb2312han")
zh__pinyin        22.1 = 2.0 (type='pinyin' alt='short')
zh__stroke        22.1 = 1.9.1 (type='stroke' alt='short')
zh__zhuyin        22.1 = 22 (type='zhuyin' alt='short')
--------------------------------------------------------------------
[*] http://www.unicode.org/repos/cldr/tags/latest/seed/collation/
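The Unicode::Collate::Locale module for perl was written by SADAHIRO Tomoyuki, <SADAHIRO@cpan.org>. This module is Copyright (C) 2004-2020, SADAHIRO Tomoyuki, Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.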