A regex for Chinese characters: [\u2E80-\uFE4F]+
Two patterns currently circulate widely online:

    /^[\u0391-\uFFE5]+$/
    /^[\u4E00-\u9FA5]+$/

The second clearly covers a smaller range. Testing shows that it is incomplete: U+9FA6, just past the end of its range, is also a CJK ideograph (it may display as an empty box where the font lacks the glyph), so the second pattern does not include everything it needs to.

The last character of the first pattern, U+FFE5, is '¥' (fullwidth yen sign), and U+FFE6 is '₩' (fullwidth won sign). So I think the first pattern is broadly right. Its first character, U+0391, is 'Α', however. That is the Greek capital letter Alpha, which is neither the half-width Latin A (U+0041) nor the fullwidth A (U+FF21). So the first pattern's range is probably somewhat too wide, especially at the start.

So I went to the Unicode block table. It turns out that Chinese characters are encoded in an unusual way: they do not sit in one contiguous range the way Hebrew does (U+0590 -- U+05FF). They are split across many smaller blocks, and because many Han characters are shared by China, Japan and Korea, the blocks labelled CJK in Unicode are generally the Han-character blocks.

Going through the table, the first block labelled CJK starts at U+2E80 and the last one ends at U+FE4F. The final conclusion is therefore:

    /^[\u2E80-\uFE4F]+$/

Finally, here is the Unicode block table:

U+0000 -- U+007F: Basic Latin
U+0080 -- U+00FF: Latin-1 Supplement
U+0100 -- U+017F: Latin Extended-A
U+0180 -- U+024F: Latin Extended-B
U+0250 -- U+02AF: IPA Extensions
U+02B0 -- U+02FF: Spacing Modifier Letters
U+0300 -- U+036F: Combining Diacritical Marks
U+0370 -- U+03FF: Greek and Coptic
U+0400 -- U+04FF: Cyrillic
U+0500 -- U+052F: Cyrillic Supplement
U+0530 -- U+058F: Armenian
U+0590 -- U+05FF: Hebrew
U+0600 -- U+06FF: Arabic
U+0700 -- U+074F: Syriac
U+0750 -- U+077F: Arabic Supplement
U+0780 -- U+07BF: Thaana
U+07C0 -- U+07FF: NKo
U+0900 -- U+097F: Devanagari
U+0980 -- U+09FF: Bengali
U+0A00 -- U+0A7F: Gurmukhi
U+0A80 -- U+0AFF: Gujarati
U+0B00 -- U+0B7F: Oriya
U+0B80 -- U+0BFF: Tamil
U+0C00 -- U+0C7F: Telugu
U+0C80 -- U+0CFF: Kannada
U+0D00 -- U+0D7F: Malayalam
U+0D80 -- U+0DFF: Sinhala
U+0E00 -- U+0E7F: Thai
U+0E80 -- U+0EFF: Lao
U+0F00 -- U+0FFF: Tibetan
U+1000 -- U+109F: Myanmar
U+10A0 -- U+10FF: Georgian
U+1100 -- U+11FF: Hangul Jamo
U+1200 -- U+137F: Ethiopic
U+1380 -- U+139F: Ethiopic Supplement
U+13A0 -- U+13FF: Cherokee
U+1400 -- U+167F: Unified Canadian Aboriginal Syllabics
U+1680 -- U+169F: Ogham
U+16A0 -- U+16FF: Runic
U+1700 -- U+171F: Tagalog
U+1720 -- U+173F: Hanunoo
U+1740 -- U+175F: Buhid
U+1760 -- U+177F: Tagbanwa
U+1780 -- U+17FF: Khmer
U+1800 -- U+18AF: Mongolian
U+1900 -- U+194F: Limbu
U+1950 -- U+197F: Tai Le
U+1980 -- U+19DF: New Tai Lue
U+19E0 -- U+19FF: Khmer Symbols
U+1A00 -- U+1A1F: Buginese
U+1B00 -- U+1B7F: Balinese
U+1D00 -- U+1D7F: Phonetic Extensions
U+1D80 -- U+1DBF: Phonetic Extensions Supplement
U+1DC0 -- U+1DFF: Combining Diacritical Marks Supplement
U+1E00 -- U+1EFF: Latin Extended Additional
U+1F00 -- U+1FFF: Greek Extended
U+2000 -- U+206F: General Punctuation
U+2070 -- U+209F: Superscripts and Subscripts
U+20A0 -- U+20CF: Currency Symbols
U+20D0 -- U+20FF: Combining Diacritical Marks for Symbols
U+2100 -- U+214F: Letterlike Symbols
U+2150 -- U+218F: Number Forms
U+2190 -- U+21FF: Arrows
U+2200 -- U+22FF: Mathematical Operators
U+2300 -- U+23FF: Miscellaneous Technical
U+2400 -- U+243F: Control Pictures
U+2440 -- U+245F: Optical Character Recognition
U+2460 -- U+24FF: Enclosed Alphanumerics
U+2500 -- U+257F: Box Drawing
U+2580 -- U+259F: Block Elements
U+25A0 -- U+25FF: Geometric Shapes
U+2600 -- U+26FF: Miscellaneous Symbols
U+2700 -- U+27BF: Dingbats
U+27C0 -- U+27EF: Miscellaneous Mathematical Symbols-A
U+27F0 -- U+27FF: Supplemental Arrows-A
U+2800 -- U+28FF: Braille Patterns
U+2900 -- U+297F: Supplemental Arrows-B
U+2980 -- U+29FF: Miscellaneous Mathematical Symbols-B
U+2A00 -- U+2AFF: Supplemental Mathematical Operators
U+2B00 -- U+2BFF: Miscellaneous Symbols and Arrows
U+2C00 -- U+2C5F: Glagolitic
U+2C60 -- U+2C7F: Latin Extended-C
U+2C80 -- U+2CFF: Coptic
U+2D00 -- U+2D2F: Georgian Supplement
U+2D30 -- U+2D7F: Tifinagh
U+2D80 -- U+2DDF: Ethiopic Extended
U+2E00 -- U+2E7F: Supplemental Punctuation
U+2E80 -- U+2EFF: CJK Radicals Supplement
U+2F00 -- U+2FDF: Kangxi Radicals
U+2FF0 -- U+2FFF: Ideographic Description Characters
U+3000 -- U+303F: CJK Symbols and Punctuation
U+3040 -- U+309F: Hiragana
U+30A0 -- U+30FF: Katakana
U+3100 -- U+312F: Bopomofo
U+3130 -- U+318F: Hangul Compatibility Jamo
U+3190 -- U+319F: Kanbun
U+31A0 -- U+31BF: Bopomofo Extended
U+31C0 -- U+31EF: CJK Strokes
U+31F0 -- U+31FF: Katakana Phonetic Extensions
U+3200 -- U+32FF: Enclosed CJK Letters and Months
U+3300 -- U+33FF: CJK Compatibility
U+3400 -- U+4DBF: CJK Unified Ideographs Extension A
U+4DC0 -- U+4DFF: Yijing Hexagram Symbols
U+4E00 -- U+9FFF: CJK Unified Ideographs
U+A000 -- U+A48F: Yi Syllables
U+A490 -- U+A4CF: Yi Radicals
U+A700 -- U+A71F: Modifier Tone Letters
U+A720 -- U+A7FF: Latin Extended-D
U+A800 -- U+A82F: Syloti Nagri
U+A840 -- U+A87F: Phags-pa
U+AC00 -- U+D7AF: Hangul Syllables
U+D800 -- U+DB7F: High Surrogates
U+DB80 -- U+DBFF: High Private Use Surrogates
U+DC00 -- U+DFFF: Low Surrogates
U+E000 -- U+F8FF: Private Use Area
U+F900 -- U+FAFF: CJK Compatibility Ideographs
U+FB00 -- U+FB4F: Alphabetic Presentation Forms
U+FB50 -- U+FDFF: Arabic Presentation Forms-A
U+FE00 -- U+FE0F: Variation Selectors
U+FE10 -- U+FE1F: Vertical Forms
U+FE20 -- U+FE2F: Combining Half Marks
U+FE30 -- U+FE4F: CJK Compatibility Forms
U+FE50 -- U+FE6F: Small Form Variants
U+FE70 -- U+FEFF: Arabic Presentation Forms-B
U+FF00 -- U+FFEF: Halfwidth and Fullwidth Forms
U+FFF0 -- U+FFFF: Specials

(Editor: 李大同)
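The comparison of the three patterns can be checked with a short script. This is only a minimal sketch, assuming a JavaScript engine (the patterns use JavaScript regex literal syntax). The names `wide`, `narrow`, `cjk` and the sample strings are illustrative and do not come from the original post.

```javascript
// The three candidate patterns, with the \u escapes written out in full.
const wide = /^[\u0391-\uFFE5]+$/;   // starts at Greek capital Alpha
const narrow = /^[\u4E00-\u9FA5]+$/; // stops before the end of CJK Unified Ideographs
const cjk = /^[\u2E80-\uFE4F]+$/;    // first to last block labelled "CJK"

// Illustrative samples (assumed, not from the original post).
const samples = [
  "\u4E2D\u6587", // two common Chinese characters
  "\u9FA6",       // ideograph just past the end of the narrow range
  "\u0391",       // Greek capital Alpha
  "A",            // half-width Latin A
  "\u3042",       // Hiragana letter A
  "\uD55C",       // a Hangul syllable
];

// Print one row per sample: code points, then the result of each pattern.
for (const s of samples) {
  const codes = Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
  console.log(codes, wide.test(s), narrow.test(s), cjk.test(s));
}
```

With these samples, `narrow` rejects U+9FA6 while `cjk` accepts it, and `cjk` rejects both Greek Alpha and Latin A. Because the range is contiguous, `cjk` also accepts the Hiragana and Hangul samples, which matches the blocks listed between U+2E80 and U+FE4F in the table.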