Re: tlug: A couple of questions about Unicode

To: tlug@example.com
Subject: Re: tlug: A couple of questions about Unicode
From: Gaspar Sinai <gsinai@example.com>
Date: Mon, 12 Jan 1998 00:27:26 +0900 (JST)
Content-Type: TEXT/PLAIN; charset=US-ASCII
In-Reply-To: <199801091717.CAA03920@example.com>
Reply-To: tlug@example.com
Sender: owner-tlug@example.com

Hi,
I feel compelled to contribute to this thread. So here are my thoughts: 

o It is very unfortunate that the practical standard for Unicode is becoming
  16-bit (UCS-2) instead of 32-bit (UCS-4). There is no 7-bit transformation
  format for UCS-4 (the 8-bit one is UTF-8). I think the people involved
  in the standard were influenced too much by NT, and they had to make very
  Microsoft-ish hacks:
  o there is a code space where two 16-bit characters are used to map a
    portion of the UCS-4 space into UCS-2;
  o if you want to process some Indian or Arabic scripts, you need to
    combine two 16-bit Unicode characters to form a single glyph.
  I think Linux can only gain if it uses UTF-8 instead of UCS-2.

o When you compare the advantages and disadvantages of sharing codes between
  Japanese and Chinese characters, I think the advantages outweigh the
  disadvantages. The disadvantages go away when you are allowed to change
  fonts within the document.

o Unicode is not consistent with the rules it set for itself. You would
  expect the wide (full-width) ASCII characters to map to the ordinary ASCII
  values, just as the wide Cyrillic and Greek letters were mapped to the
  ordinary Cyrillic and Greek ones, but this is not the case. For some
  strange reason they kept a separate set of wide ASCII characters.

o I know that there are some people in Japan who do not like Unicode.
  I bash Unicode, yet I still like it. And Japan is very lucky when it
  comes to Unicode (the Tamil and Malayalam scripts come to mind...).

o Someone mentioned an inconsistency with SJIS. IMHO SJIS is not sufficient
  and should not be used. It cannot encode a lot of characters that
  JIS and EUC can. (Yes, I know that SJIS is the standard format in Win95.)

o The NT Unicode format is a simple dump of UCS-2 with a magic U+FEFF code
  at the beginning. This code is used to determine endianness (byte order).
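
The first point above mentions the code space where two 16-bit units stand
for one UCS-4 character. Here is a minimal Python sketch of that arithmetic
as UTF-16 defines it, next to the UTF-8 bytes for the same character. The
helper names to_surrogate_pair and from_surrogate_pair are made up for
illustration, and U+20000 is just a sample code point above U+FFFF.

```python
from typing import Tuple

def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair.
    (Hypothetical helper for illustration.)"""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    offset = cp - 0x10000             # 20 significant bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> U+DC00..U+DFFF
    return high, low

def from_surrogate_pair(high: int, low: int) -> int:
    """Inverse of to_surrogate_pair (hypothetical helper)."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = 0x20000  # sample code point outside the 16-bit range
high, low = to_surrogate_pair(cp)
print(f"UTF-16: {high:04X} {low:04X}")                 # UTF-16: D840 DC00
print(f"UTF-8: {chr(cp).encode('utf-8').hex()}")       # UTF-8: f0a08080
```

UTF-8 handles the same code point as one 4-byte sequence, with no reserved
code space needed.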
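
On the second point, sharing codes between Japanese and Chinese: the same
ideograph has different byte values in the Japanese and Chinese national
character sets, but Unicode gives it one code point. A minimal Python sketch
using the standard euc_jp and gb2312 codecs; the helper name compare_han is
made up, and the character 中 is just a convenient sample present in both sets.

```python
def compare_han(ch: str) -> dict:
    """Encode one ideograph in a Japanese and a Chinese legacy charset
    (hypothetical helper for illustration)."""
    ja = ch.encode("euc_jp")   # JIS X 0208 via EUC-JP
    zh = ch.encode("gb2312")   # GB 2312 (EUC-CN)
    return {
        "euc_jp": ja.hex(),
        "gb2312": zh.hex(),
        # Both legacy byte sequences decode back to one shared code point.
        "unicode": {f"U+{ord(s):04X}" for s in (ja.decode("euc_jp"), zh.decode("gb2312"))},
    }

print(compare_han("中"))
```

Which glyph shape appears (Japanese or Chinese style) is then left to the
font, which is why being able to change fonts within a document matters.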
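
To make the third point concrete: in the usual JIS X 0208 mapping (as used
by Python's euc_jp codec), the double-width Greek letters land on the
ordinary Greek code points, while the double-width Latin letters land in a
separate Fullwidth Forms range (U+FF01..U+FF5E), offset by 0xFEE0 from
ASCII. A minimal sketch; the helper fullwidth_to_ascii is hypothetical, and
NFKC normalization from Python's unicodedata module does the same folding
more generally.

```python
import unicodedata

# JIS X 0208 row 3 (double-width Latin) vs row 6 (Greek), as EUC-JP bytes
wide_A = b"\xa3\xc1".decode("euc_jp")   # JIS 0x2341 -> U+FF21 (separate wide code)
alpha = b"\xa6\xa1".decode("euc_jp")    # JIS 0x2621 -> U+0391 (ordinary Greek)
print(f"U+{ord(wide_A):04X}", unicodedata.name(wide_A))
print(f"U+{ord(alpha):04X}", unicodedata.name(alpha))

FULLWIDTH_OFFSET = 0xFF01 - 0x21        # 0xFEE0

def fullwidth_to_ascii(text: str) -> str:
    """Fold U+FF01..U+FF5E back onto ASCII 0x21..0x7E (hypothetical helper)."""
    return "".join(
        chr(ord(c) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(c) <= 0xFF5E else c
        for c in text
    )

print(fullwidth_to_ascii("ＡＢＣ１２３"))
print(unicodedata.normalize("NFKC", "ＡＢＣ１２３"))
```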
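
The SJIS point can be checked with Python's standard codecs. The post does
not say which characters it means; this sketch assumes the JIS X 0212
supplementary set, which EUC-JP reaches through the 0x8F single-shift byte
and which Shift_JIS has no room for. The helper encodable and the sample
character é (assumed here to be in JIS X 0212 but not in JIS X 0208) are
illustrative only.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text can be written in the given codec
    (hypothetical helper for illustration)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

sample = "é"  # U+00E9: assumed to be in JIS X 0212, not in JIS X 0208
for codec in ("euc_jp", "shift_jis"):
    print(codec, encodable(sample, codec))

# EUC-JP marks a JIS X 0212 character with the SS3 byte 0x8F plus two bytes.
print(sample.encode("euc_jp").hex())
```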
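
For the NT format in the last point, a minimal Python sketch of reading such
a file: look at the first two bytes, pick the byte order, and decode the
rest. This works because the byte-swapped mark, U+FFFE, is not a character.
The function name decode_notepad_unicode is made up; Python's
codecs.BOM_UTF16_LE and codecs.BOM_UTF16_BE constants supply the two byte
orders of U+FEFF. The utf-16 codecs also accept surrogate pairs, so this is
slightly more permissive than strict UCS-2.

```python
import codecs

def decode_notepad_unicode(data: bytes) -> str:
    """Decode a UCS-2 dump that starts with U+FEFF (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE: U+FEFF written little-endian
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF: U+FEFF written big-endian
        return data[2:].decode("utf-16-be")
    raise ValueError("no U+FEFF byte order mark at start of data")

# Little-endian, as NT on x86 writes it: the mark, then "Hi" as 16-bit units.
print(decode_notepad_unicode(b"\xff\xfeH\x00i\x00"))
```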

So much for now. Sorry for the short-ish style.

BTW: I released yudit-0.95 yesterday. It now compiles with egcs and fixes
some bugs, and it supports the NT Notepad format. You can get it from
sunsite or:

        http://www2.gol.com/users/gsinai/yudit-0.95..tar.gz

cheers,
gaspar

---------------------------------------------------------------
Next TLUG Nomikai: 14 January 1998 19:15  Tokyo station
Yaesu Chuo ticket gate.  Or go directly to Tengu TokyoEkiMae 19:30
Chuo-ku, Kyobashi 1-1-6, EchiZenYa Bld. B1/B2 03-3275-3691
Next Saturday Meeting: 14 February 1998 12:30 Tokyo Station
Yaesu Chuo ticket gate.
---------------------------------------------------------------
a word from the sponsor:
TWICS - Japan's First Public-Access Internet System
www.twics.com  info@example.com  Tel:03-3351-5977  Fax:03-3353-6096

Follow-Ups:
- Re: tlug: A couple of questions about Unicode
  - From: "J. David Beutel" <jdb@example.com>
- UTF-8 [was: Re: tlug: A couple of questions about Unicode]
  - From: "Stephen J. Turnbull" <turnbull@example.com>

References:
- tlug: A couple of questions about Unicode
  - From: "Jonathan Byrne" <jbyrne@example.com>
