
Revisiting VB6 (32): Issues Specific to the Double-Byte Character Set

Published: 2020-12-16 23:28:32 | Category: Big Data | Source: compiled from the web

From MSDN (October 2001 edition): Visual Tools and Languages / Visual Studio 6.0 Documentation / Visual Basic Documentation / Using Visual Basic / Programmer’s Guide / Part 2: What Can You Do With Visual Basic / International Issues

1. The concept of DBCS

(1) The double-byte character set (DBCS) was created to handle East Asian languages that use ideographic characters, which require more than the 256 characters supported by ANSI. Characters in DBCS are addressed using a 16-bit notation, using 2 bytes. With 16-bit notation you can represent 65,536 characters, although far fewer characters are defined for the East Asian languages.

(2) In locales where DBCS is used (including China, Japan, and Korea), both single-byte and double-byte characters are included in the character set. The single-byte characters used in these locales conform to the 8-bit national standards for each country and correspond closely to the ASCII character set. Certain ranges of codes in these single-byte character sets (SBCS) are designated as lead bytes for DBCS characters. A consecutive pair made of a lead byte and a trail byte represents one double-byte character. The code range used for the lead byte depends on the locale.

(3) DBCS is a different character set from Unicode. Because Visual Basic represents all strings internally in Unicode format, both ANSI characters and DBCS characters are converted to Unicode, and Unicode characters are converted to ANSI or DBCS characters, automatically whenever the conversion is needed. You can also convert between Unicode and ANSI/DBCS characters manually.
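The lead-byte and trail-byte pairing described in (2) can be sketched in a few lines. The sketch below is in Python rather than Visual Basic, and it assumes code page 932 (Japanese Shift-JIS), whose lead-byte ranges are 0x81 to 0x9F and 0xE0 to 0xFC; other DBCS code pages use different ranges. The names `is_lead_byte` and `split_dbcs` are illustrative and do not come from the MSDN text.

```python
# Lead-byte ranges are locale-specific. These two ranges are the ones used
# by code page 932 (Japanese Shift-JIS); other DBCS code pages differ.
CP932_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b):
    """Return True if byte value b starts a double-byte character in cp932."""
    return any(lo <= b <= hi for lo, hi in CP932_LEAD_RANGES)


def split_dbcs(data):
    """Split cp932-encoded bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte consumes the following trail byte as well.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chars.append(data[i:i + width])
        i += width
    return chars


# Three single-byte characters followed by one double-byte character.
encoded = "VB6\u306f".encode("cp932")
pieces = split_dbcs(encoded)
decoded = [p.decode("cp932") for p in pieces]
```

Here `pieces` holds four entries for five bytes: the scan never looks at a trail byte on its own, which is the property any byte-level DBCS code has to preserve.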

2. ANSI, DBCS, and Unicode: Definitions

(1) Because the ANSI standard uses only a single byte to represent each character, it is limited to a maximum of 256 character and punctuation codes. Although this is adequate for English, it doesn't fully support many other languages.

(2) DBCS is used in Microsoft Windows systems that are distributed in most parts of Asia. It provides support for many different East Asian writing systems, such as Chinese, Japanese, and Korean. DBCS uses the numbers 0 to 127 to represent the ASCII character set. Some numbers greater than 127 function as lead bytes, which are not really characters but simply indicators that the next value is a character from a non-Latin character set. In DBCS, ASCII characters are only 1 byte in length, whereas Japanese, Korean, and other East Asian characters are 2 bytes in length.

(3) Unicode is a character-encoding scheme that uses 2 bytes for every character.

The International Organization for Standardization (ISO) defines a number in the range of 0 to 65,535 (2^16 - 1) for just about every character and symbol in every language (plus some empty spaces for future growth).

On all 32-bit versions of Windows, Unicode is used by the Component Object Model (COM), the basis for OLE and ActiveX technologies. Unicode is fully supported by Windows NT. Although both Unicode and DBCS have double-byte characters, the encoding schemes are completely different.

(4) DBCS Sort Order and String Comparison: If you use the Option Compare Text statement, comparisons are made according to the case-insensitive textual sort order determined by the user's system locale. In a DBCS locale such as Chinese, the same character may have two representations, a narrow-width letter and a wide-width letter, and under Option Compare Text the two are treated as equal.
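The effect of a width-insensitive, case-insensitive comparison can be approximated as follows. This is a Python sketch, not the Windows locale sort order that Option Compare Text actually uses: it assumes Unicode NFKC normalization as a stand-in for width folding, and `fold_width_and_case` is a hypothetical helper name.

```python
import unicodedata


def fold_width_and_case(text):
    """Fold wide/narrow variants and letter case before comparing."""
    # NFKC maps full-width Latin letters (U+FF21 and up) to their ASCII
    # forms; casefold() then removes case differences.
    return unicodedata.normalize("NFKC", text).casefold()


narrow = "ABC"
wide = "\uff21\uff22\uff23"   # full-width A, B, C

equal_binary = narrow == wide
equal_text = fold_width_and_case(narrow) == fold_width_and_case(wide)
equal_mixed_case = fold_width_and_case("abc") == fold_width_and_case(wide)
```

A plain binary comparison sees two different strings, while the folded comparison treats the narrow and wide letters as the same text.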

3. DBCS String Manipulation Functions

(1) Although a double-byte character consists of a lead byte and a trail byte and requires two consecutive storage bytes, it must be treated as a single unit in any operation involving characters and strings.

(2) The "B" versions of the string functions (for example, LeftB, MidB, and InputB) are intended especially for use with strings of binary data. The "W" versions (for example, AscW and ChrW) are intended for use with Unicode strings.

(3) The functions without a "B" or "W" suffix (for example, Len, Mid, and InStr) correctly handle DBCS and ANSI characters. In addition to these functions, the String function handles DBCS characters. This means that all these functions consider a DBCS character as one character even if that character consists of 2 bytes.

In locales using DBCS, the number of characters and the number of bytes are not necessarily the same. Mid counts in characters, not bytes: it returns the requested number of characters regardless of how many bytes they occupy.
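The gap between character counts and byte counts is easy to see on a mixed string. The following Python sketch assumes code page 932; the byte counts refer to the cp932-encoded form, so it is an analogy for character-based versus byte-based slicing rather than a description of how Visual Basic stores strings internally.

```python
text = "VB6\u65e5\u672c\u8a9e"      # 3 single-byte + 3 double-byte characters
dbcs = text.encode("cp932")          # assumed code page: 932 (Japanese)

char_count = len(text)               # counts characters
byte_count = len(dbcs)               # counts bytes in the DBCS form

first_four_chars = text[:4]          # character-based slice: whole characters
first_four_bytes = dbcs[:4]          # byte-based slice: ends on a lead byte

# Decoding the byte-based slice cannot recover the fourth character,
# because its trail byte was cut off.
broken = first_four_bytes.decode("cp932", errors="replace")
```

The character-based slice returns four complete characters, while the byte-based slice stops halfway through the first double-byte character.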

(4) In most cases, use the character-based functions when you handle string data because these functions can properly handle ANSI strings, DBCS strings, and Unicode strings.

When you store characters in a String variable or read characters from a String variable, Visual Basic automatically converts between Unicode and ANSI characters. When you handle binary data, use a Byte array instead of a String variable, together with the byte-based string manipulation functions.

(5) Visual Basic provides several string conversion functions that are useful for DBCS characters: StrConv, UCase, and LCase.

For example, you can convert narrow letters to wide letters by specifying vbWide in the second argument of StrConv.

You can also use the StrConv function to convert Unicode characters to ANSI/DBCS characters, and vice versa.
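The two kinds of conversion mentioned in (5) can be sketched as follows. This is Python, not StrConv itself: `to_wide` is a rough analogue of the vbWide conversion that handles printable ASCII only (mapping the space to U+3000 is an assumption about the wide form), and `to_dbcs` / `from_dbcs` are rough analogues of converting from and to Unicode, with code page 932 assumed as the ANSI/DBCS code page. All three names are hypothetical.

```python
def to_wide(text):
    """Map printable ASCII to full-width forms (rough analogue of vbWide)."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")              # ideographic (wide) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))    # U+FF01..U+FF5E
        else:
            out.append(ch)                    # leave everything else alone
    return "".join(out)


def to_dbcs(text, codepage="cp932"):
    """Unicode text -> ANSI/DBCS bytes."""
    return text.encode(codepage)


def from_dbcs(data, codepage="cp932"):
    """ANSI/DBCS bytes -> Unicode text."""
    return data.decode(codepage)


wide = to_wide("VB 6")
as_bytes = to_dbcs("\u6f22\u5b57 ABC")        # two kanji, a space, three letters
round_trip = from_dbcs(as_bytes)
```

The round trip is lossless only for characters that exist in the chosen code page; text outside it would raise an encoding error in this sketch.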

4. Font, Display, and Print Considerations in a DBCS Environment

(1) When you use a font designed only for SBCS characters, DBCS characters may not be displayed correctly in the DBCS version of Windows.

Both the font and the font size need to be adjusted. Usually, the text in your application will be displayed best in a 9-point font on most East Asian platforms, whereas an 8-point font is typical on European platforms.

These considerations apply to printing DBCS characters with your application as well.

(2) How to Avoid Changing Font Settings: One option is to use Font Association, which automatically maps any English fonts in your application to a Korean font (Chinese is also supported; Japanese is not).

Another option is to use the System or FixedSys font.

(3) You can also choose the font programmatically, adapting to the locale of the user's machine.

5. Processing Files That Use Double-Byte Characters

(1) In locales where DBCS is used, a file may include both double-byte and single-byte characters. Because a DBCS character is represented by two bytes, your Visual Basic code must avoid splitting it.

(2) When you read a fixed length of bytes from a binary file, use a Byte array instead of a String variable to prevent the ANSI-to-Unicode conversion in Visual Basic.

(3) When you use a String variable with Input or InputB to read bytes from a binary file, Unicode conversion occurs and the result is incorrect.

(4) Keep in mind that the names of files and directories may also include DBCS characters.
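Points (1) to (3) amount to: read raw bytes, and convert to text only at character boundaries. One way to sketch that, in Python and assuming code page 932, is to read fixed-size byte chunks and pass them through an incremental decoder from the standard `codecs` module, which holds back a trailing lead byte until its trail byte arrives. The function name `read_dbcs_chunks` and the in-memory stream are illustrative only.

```python
import codecs
import io


def read_dbcs_chunks(stream, chunk_size, codepage="cp932"):
    """Yield decoded text from a binary stream read in fixed-size chunks."""
    decoder = codecs.getincrementaldecoder(codepage)()
    while True:
        chunk = stream.read(chunk_size)       # raw bytes, no conversion yet
        if not chunk:
            break
        # A lead byte at the end of a chunk is buffered by the decoder
        # until the next chunk supplies its trail byte.
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)    # raises if data ends mid-character
    if tail:
        yield tail


raw = "abc\u65e5\u672c\u8a9e".encode("cp932")   # 3 single-byte + 3 double-byte
parts = list(read_dbcs_chunks(io.BytesIO(raw), chunk_size=4))
joined = "".join(parts)
```

With a chunk size of 4, the first and second reads both end on a lead byte, yet the joined result is the original text because no character is ever decoded from half of its bytes.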

6. Other Issues

(1) DBCS characters are not supported in any of the following identifiers: Public procedure names, Public variables, Public constants, Project name, Class names.

(2) The KeyPress event can process a double-byte character code as one event. The higher byte of the KeyAscii argument represents the lead byte of a double-byte character, and the lower byte represents the trail byte.
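The high-byte and low-byte layout described in (2) can be unpacked with a shift and a mask. The sketch below is Python with hypothetical helper names, and it assumes code page 932 for the example character. Because a 16-bit signed integer shows codes of 0x8000 and above as negative numbers, the helper masks each byte so that both the signed and unsigned views give the same result.

```python
def split_key_code(key_code):
    """Return the encoded bytes packed into a 16-bit key code."""
    lead = (key_code >> 8) & 0xFF     # high byte: lead byte
    trail = key_code & 0xFF           # low byte: trail byte
    if lead == 0:
        return bytes([trail])         # single-byte character
    return bytes([lead, trail])


def pack_key_code(data):
    """Pack one encoded character (1 or 2 bytes) into an integer code."""
    return int.from_bytes(data, "big")


code = pack_key_code("\u3042".encode("cp932"))   # hiragana A, double-byte
char = split_key_code(code).decode("cp932")

# The same code viewed as a signed 16-bit value unpacks identically.
signed_code = code - 0x10000
same_bytes = split_key_code(signed_code) == split_key_code(code)
```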

(3) Many Windows API and DLL functions return a size in bytes. This return value represents the size of the returned string. Visual Basic converts the returned string into Unicode even though the return value still represents the size of the ANSI or DBCS string. Therefore, you may not be able to use this returned size as the string's size.
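The mismatch in (3) can be reproduced without calling any real API. In the Python sketch below, `fake_ansi_api` is a made-up stand-in for an ANSI function that fills a NUL-padded buffer and returns the number of bytes written; code page 932 and the sample path are assumptions for illustration.

```python
def fake_ansi_api(text, codepage="cp932"):
    """Hypothetical stand-in for an ANSI API: returns (buffer, size in bytes)."""
    data = text.encode(codepage)
    buffer = data + b"\x00" * 8       # NUL-padded output buffer
    return buffer, len(data)


buffer, size_in_bytes = fake_ansi_api("C:\\\u30c7\u30fc\u30bf")

# Correct: apply the byte count to the byte buffer, then convert to text.
path = buffer[:size_in_bytes].decode("cp932")

# Incorrect: convert first, then use the byte count as a character count.
converted = buffer.decode("cp932")
wrong = converted[:size_in_bytes]
```

The returned size is 9 bytes but the path has only 6 characters, so using the byte count after conversion keeps three padding characters that do not belong to the string.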

7. Visual Basic Bidirectional Features

(1) Bidirectional refers to the product's ability to manipulate and display text for both left-to-right and right-to-left languages.

(2) Although RightToLeft is a part of every Microsoft Visual Basic installation, it is operational only when Microsoft Visual Basic is installed in a bidirectional 32-bit Microsoft Windows environment, for example an Arabic system.

(Editor: 李大同)
