tlug.jp Mailing List Archive: tlug
Re: [tlug] OT-Japanese in PHP
- Date: Mon, 23 May 2005 11:31:15 +0900
- From: Yoshihiro Sato <y_satou@example.com>
- Subject: Re: [tlug] OT-Japanese in PHP
- References: <200505220201.j4M21ZnW002503@example.com>
- Organization: Amazon.co.jp
- User-agent: Wanderlust/2.12.0 (Your Wildest Dreams) SEMI/1.14.6 (Maruoka) FLIM/1.14.7 (Sanjō) APEL/10.6 Emacs/21.3 (i386-redhat-linux-gnu) MULE/5.0 (SAKAKI)
On the server side, especially in a web application, I recommend handling data like this: reject every character that is not in JIS X 0208, and reject all half-width katakana.

Handling Japanese is difficult for three reasons:

1. Shift_JIS is not the same as CP932 / Windows-31J / Shift_JIS on Macintosh.
2. There are many mapping tables between Unicode and the legacy Japanese charsets.
3. Unicode CJK characters are unified.

1. Shift_JIS is not the same as CP932.

Shift_JIS was originally not a charset; it is a rule for how to "shift" JIS X 0208 into a byte encoding. Microsoft's Windows-31J (aka CP932), however, has additional characters in the extension area: circled numbers, Roman numerals, symbols, etc. Macintosh (Mac OS), on the other hand, assigns different characters to the same byte values. For example, the Windows (CP932) circled digit one (①) is displayed on a Mac as (日), in a single double-width character. You can find a list of Shift_JIS characters that do not match between Windows and Macintosh here:
http://www.notoinsatu.co.jp/font/omake/S-JIS_check.pdf

The problem is that a Windows PC can enter these characters into a form in the web browser. On the web server it is really difficult to detect which character was entered on the user's side. You may need to check the OS and browser version carefully, but even that does not guarantee a correct result.

2. There are many mapping tables between Unicode and the legacy Japanese charsets.

For example, even within the Microsoft world, Shift_JIS -> Unicode and CP932 -> Unicode differ:
http://www.asahi-net.or.jp/~ez3k-msym/charsets/jis2ucs.htm

The mapping table differs between processing engines, typically between libraries. There are several libraries (iconv, etc.) for converting legacy charsets <--> Unicode, and they typically differ from each other.

3. Unicode CJK characters are unified.

This issue typically appears when people enter personal names and/or place names.
When Unicode was designed, some characters that look "similar" were unified into one character (in the "CJK Unified Ideographs" block), and additional ones were put into the "CJK Compatibility Ideographs" block. This also causes a mapping issue: mapping tables simply convert characters into "CJK Unified Ideographs" characters and do not use the "CJK Compatibility Ideographs" characters. So even if end users have a way to input the correct character in the legacy character set on their UI, it may be mapped to a different character on the server side.

But the actual problem is that, in most cases, end users have no proper way to input such special characters. When they hit this restriction, they enter a "simplified character" or a "similar character" as a compromise.

--
Yoshihiro Satou
y_satou@example.com

On Sun, 22 May 2005 12:01:35 +1000 (EST),
Jim Breen <Jim.Breen@example.com> said:

>
> Evan Monroig <evan.ubuntu@example.com> wrote:
>>> The
>>> generally accepted idea is that since Shift_JIS was created by
>>> Japanese people for Japanese people, then it handles the Japanese
>>> language better than UTF-8, which is not true (^_^)
>
> I'll say it's not true. Shift_JIS was created by Microsoft, as Ken Lunde
> wrote in his UJIP book 12 years ago.
>
> Jim
>
> --
> Jim Breen  http://www.csse.monash.edu.au/~jwb/
> Computer Science & Software Engineering,  Tel: +61 3 9905 9554
> Monash University, VIC 3800, Australia    Fax: +61 3 9905 5146
> (Monash Provider No. 00008C)              ジム・ブリーン@モナシュ大学
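The server-side rule recommended in the message above, and each of the three difficulties it lists, can be illustrated with a minimal Python sketch. It assumes CPython's built-in `shift_jis` codec covers only ASCII, JIS X 0201 katakana and JIS X 0208, and that its `cp932` codec follows Microsoft's table; these codecs are just one particular set of mapping tables, so iconv or other libraries may behave differently (which is exactly difficulty 2). The helper names `encodable` and `rejected_chars` are hypothetical, not part of any library.

```python
import unicodedata

# Half-width katakana block (with half-width Japanese punctuation):
# U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP .. U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
HALFWIDTH_KATAKANA = range(0xFF61, 0xFFA0)


def encodable(ch, codec):
    """Return True if `ch` can be encoded with the given Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def rejected_chars(text):
    """Hypothetical validator: list the characters the rule would reject.

    Assumption: CPython's "shift_jis" codec accepts only ASCII, JIS X 0201
    katakana and JIS X 0208, so anything it cannot encode is outside
    JIS X 0208 (and not ASCII). Half-width katakana DO encode in Shift_JIS,
    so they are rejected by an explicit range check first.
    """
    bad = []
    for ch in text:
        if ord(ch) in HALFWIDTH_KATAKANA or not encodable(ch, "shift_jis"):
            bad.append(ch)
    return bad


if __name__ == "__main__":
    # Difficulty 1: CP932 has vendor extensions such as circled digits.
    print("U+2460 in cp932:    ", encodable("\u2460", "cp932"))
    print("U+2460 in shift_jis:", encodable("\u2460", "shift_jis"))

    # Difficulty 2: the same two bytes decode to different code points
    # depending on which mapping table the codec uses.
    raw = b"\x81\x60"
    for codec in ("shift_jis", "cp932"):
        ch = raw.decode(codec)
        print(f"{codec:9} U+{ord(ch):04X} {unicodedata.name(ch)}")

    # Difficulty 3: a CJK Compatibility Ideograph collapses into its
    # unified counterpart under Unicode normalization (NFC).
    compat = "\uf900"
    unified = unicodedata.normalize("NFC", compat)
    print(unicodedata.name(compat), "->", unicodedata.name(unified))

    # The recommended rule applied to mixed input.
    print(rejected_chars("日本語\uff76\uff80\uff76\uff85\u2460abc"))
```

With this rule a circled digit typed on Windows is rejected up front instead of being stored and later shown differently on another platform; the trade-off is that some characters users may genuinely want, for example in names, are refused as well.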
- Follow-Ups:
- Re: [tlug] OT-Japanese in PHP
- From: Stephen J. Turnbull
- References:
- Re: [tlug-digest] Re: [tlug] OT-Japanese in PHP
- From: Jim Breen