tlug: Re: BTW, what is a "BMPstring"?
- To: tlug@example.com
- Subject: tlug: Re: BTW, what is a "BMPstring"?
- From: "Stephen J. Turnbull" <turnbull@example.com>
- Date: Tue, 21 Sep 1999 16:40:57 +0900 (JST)
- Content-Transfer-Encoding: 7bit
- Content-Type: text/plain; charset=us-ascii
- In-Reply-To: <37E72925.9BF18AC7@example.com>
- References: <37E72925.9BF18AC7@example.com>
- Reply-To: tlug@example.com
- Sender: owner-tlug@example.com
>>>>> "Sanjay" == Sanjay Agnani <s.agnani@example.com> writes:

    Sanjay> I think BMP (Basic Multilingual Plane) string is basically
    Sanjay> a Unicode (Universal Coded Character Set-2 -> UCS-2)
    Sanjay> string in 16-bit encoding in native processor endianness.

Well, in that case you need to translate to JIS (either EUC or ISO-2022-7 compatible, depending on your console's capabilities) first. This will have to be table-driven, although on Sloaris you may get lucky and have a system utility call for that. No such luck on Linux, not until glibc 2.2 and possibly later, IIRC. The tables are available for download at ftp.unicode.org.

You'll probably want to be defensive about people using UTF-16 surrogates (non-Unihan Japanese kanji will be up there; people will want to use their proper name and address characters). You may want to strip out private-use characters. One alternative in both cases is to use the geta mark (looks like a fat equals sign; the JIS X 0208 equivalent of U+FFFD) as a substitute. You may also want to strip out or substitute everything that doesn't map directly to JIS.

I'm not sure what happens with JIS Greek, Cyrillic, etc.; be careful there. (I think that since these don't violate the source separation rule, they get unified. But you will want to reverse translate them to JIS, and I don't know if the Unicode tables do that by default.)

Oh, and forget about "printf". \0 is a valid (and extremely common) byte in Unicode (every ISO-8859-1 character has that in the upper byte, right?). You'll need wprintf() and friends; I don't know whether they work in glibc 2.1, and they are implemented idiosyncratically in glibc 2.2. (I.e., you'll probably need several levels of #ifdefs, one for each libc: several flavors of glibc, as glibc developers don't care if they break your programs, plus at least one for Sloaris.) Be very careful to keep the use of widechar output functions extremely localized; use inline functions or macros if efficiency is important.
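For concreteness, here is a rough sketch of the table-driven translation with the defensive substitutions described above. Everything named here is an assumption for illustration: the function names (ucs2_unit_to_jis, ucs2_to_eucjp) are made up, the output is assumed to be EUC, and demo_table is a five-entry stand-in for a real table generated from the mapping files on ftp.unicode.org. Surrogates, private-use characters, and anything missing from the table all become the geta mark. Note that the length is passed explicitly; nothing here relies on a terminating \0.

```c
#include <stddef.h>
#include <stdint.h>

/* Geta mark: U+3013 in Unicode, 0x222E in JIS X 0208. */
#define GETA_JIS 0x222Eu

/* One row of a UCS-2 -> JIS X 0208 table. */
struct ucs_jis { uint16_t ucs; uint16_t jis; };

/* Hypothetical demo table; a real one has thousands of rows.
   Must stay sorted by .ucs for the binary search below. */
static const struct ucs_jis demo_table[] = {
    { 0x3013, 0x222E },  /* GETA MARK */
    { 0x3042, 0x2422 },  /* HIRAGANA LETTER A */
    { 0x30A2, 0x2522 },  /* KATAKANA LETTER A */
    { 0x65E5, 0x467C },  /* kanji "day/sun" */
    { 0x672C, 0x4B5C },  /* kanji "book/origin" */
};

/* Binary search; returns 0 when the character is not in the table. */
static uint16_t lookup_jis(uint16_t u)
{
    size_t lo = 0, hi = sizeof demo_table / sizeof demo_table[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (demo_table[mid].ucs == u)
            return demo_table[mid].jis;
        if (demo_table[mid].ucs < u)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Map one 16-bit unit to ASCII (< 0x80) or a JIS X 0208 code. */
uint16_t ucs2_unit_to_jis(uint16_t u)
{
    uint16_t j;
    if (u < 0x80)
        return u;                       /* plain ASCII passes through */
    if (u >= 0xD800 && u <= 0xDFFF)
        return GETA_JIS;                /* UTF-16 surrogate */
    if (u >= 0xE000 && u <= 0xF8FF)
        return GETA_JIS;                /* private-use area */
    j = lookup_jis(u);
    return j ? j : GETA_JIS;            /* unmapped -> geta */
}

/* Convert n UCS-2 units (native endianness) to EUC bytes.
   Returns the number of bytes written; stops early if out is full. */
size_t ucs2_to_eucjp(const uint16_t *in, size_t n,
                     unsigned char *out, size_t cap)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        uint16_t u = in[i];
        uint16_t j;
        /* Collapse a well-formed surrogate pair into one substitute. */
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            i++;
        j = ucs2_unit_to_jis(u);
        if (j < 0x80) {
            if (o + 1 > cap)
                break;
            out[o++] = (unsigned char)j;
        } else {
            if (o + 2 > cap)
                break;
            /* EUC: set the high bit on both JIS X 0208 bytes. */
            out[o++] = (unsigned char)((j >> 8) | 0x80);
            out[o++] = (unsigned char)((j & 0xFF) | 0x80);
        }
    }
    return o;
}
```

Feeding this the units 0x0041, 0x65E5, 0x672C gives the EUC bytes 41 C6 FC CB DC. An ISO-2022-7 version would emit the escape sequences and leave the high bits clear instead. The result is an ordinary byte string with no embedded \0, so only the input side needs length-counted handling.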
-- 
University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for?  "Free software rules."
-------------------------------------------------------------------
Next Nomikai: September 17 (Fri), 19:30 Tengu TokyoEkiMae 03-3275-3691
*** Linux 8th Birthday Anniversary! ***
Next Technical Meeting: October 9 (Sat), 13:00 place: Temple Univ.
*** Topics: 1) Linux i18n 2) Japanese TrueType fonts
-------------------------------------------------------------------
more info: http://www.tlug.gr.jp        Sponsor: Global Online Japan